Bug 60842 - Trim extracted URLs when loading embedded resources using the Lagarto based HTML Parser.
Status: RESOLVED FIXED
Alias: None
Product: JMeter - Now in Github
Classification: Unclassified
Component: HTTP
Version: 3.1
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: JMeter issues mailing list
Reported: 2017-03-09 15:35 UTC by George Sakhnovsky
Modified: 2017-04-22 18:30 UTC

Description George Sakhnovsky 2017-03-09 15:35:41 UTC
jmeter 3.1 r1770033
java version "1.8.0_74"
Java(TM) SE Runtime Environment (build 1.8.0_74-b02)
osx sierra 10.12.3

When running a test, JMeter doesn't strip the newline when an HTML tag spans more than a single line. The HTML looks like this:

        <link rel="stylesheet" href="/assets/build/css/app-1488292708
.css">

^ newline precedes .css

The broken request:

[09/Mar/2017:14:57:22 +0000] "GET /assets/build/css/app-1489008734%0A.css HTTP/1.1" 404 234 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) Gecko/20100101 Firefox/51.0" "-"
Comment 1 Felix Schumacher 2017-03-09 20:32:36 UTC
I believe that the html page is broken.

HTML and XML attribute values are allowed to include space characters (https://www.w3.org/TR/REC-xml/#NT-AttValue and https://www.w3.org/TR/html51/syntax.html#attribute-value).

Looking at https://www.w3.org/TR/1998/REC-xml-19980210#AVNormalize, you could argue that we should normalize the space(s) to one space. But I think you would not want that either.

So I tend to close this as "won't fix".
Comment 2 George Sakhnovsky 2017-03-09 22:34:15 UTC
I can see your point, but will note that Chrome, Firefox, Safari, and IE don't subscribe to the same interpretation and correctly (or perhaps incorrectly) strip the newline and load the URI.
Comment 3 Felix Schumacher 2017-03-10 09:21:39 UTC
You are correct. I checked with Chrome and Firefox. The logic seems to be:

Trim whitespace before and after the links.
Remove any newlines.

I haven't checked for other whitespace-like characters.
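
The two steps above can be sketched in a few lines of Java. This is only an illustration of the observed browser behaviour, not JMeter's code: the class name UrlTrimSketch and the method cleanUrl are hypothetical, and only newline and carriage return are handled here.

```java
final class UrlTrimSketch {

    // Hypothetical helper (not part of JMeter): applies the two steps
    // observed in Chrome and Firefox to a raw attribute value.
    static String cleanUrl(String raw) {
        if (raw == null) {
            return null;
        }
        // Step 1: trim whitespace before and after the link.
        String trimmed = raw.trim();
        // Step 2: remove any embedded line breaks.
        return trimmed.replace("\n", "").replace("\r", "");
    }

    public static void main(String[] args) {
        // The href from the report, with a newline before ".css".
        String href = "/assets/build/css/app-1488292708\n.css";
        System.out.println(cleanUrl(href));
    }
}
```

With the href from the description, the newline that was being sent as %0A is dropped before the request path is built.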
Comment 4 Philippe Mouawad 2017-03-10 19:48:19 UTC
(In reply to George Sakhnovsky from comment #0)
> jmeter 3.1 r1770033
> java version "1.8.0_74"
> Java(TM) SE Runtime Environment (build 1.8.0_74-b02)
> osx sierra 10.12.3
> 
> When running a test jmeter doesn't strip newline when an html tag spans more
> than a single line. HTML looks like this:
> 
>         <link rel="stylesheet" href="/assets/build/css/app-1488292708
> .css">
> 
> ^ newline precedes .css
> 
> The broken request:
> 
> [09/Mar/2017:14:57:22 +0000] "GET /assets/build/css/app-1489008734%0A.css
> HTTP/1.1" 404 234 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12;
> rv:51.0) Gecko/20100101 Firefox/51.0" "-"

How is this URL used? Through an extractor or through embedded resource download?
Thanks
Comment 5 Felix Schumacher 2017-03-10 20:23:33 UTC
Could you test with the next nightly build whether the extracted links are correct now?

Date: Fri Mar 10 20:12:49 2017
New Revision: 1786427

URL: http://svn.apache.org/viewvc?rev=1786427&view=rev
Log:
Tests for the Lagarto Html Parser in preparation for bug 60482

Bugzilla Id: 60482

Added:
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestLagartoBasedHtmlParser.java   (with props)

Date: Fri Mar 10 20:19:27 2017
New Revision: 1786434

URL: http://svn.apache.org/viewvc?rev=1786434&view=rev
Log:
Trim extracted URLs and remove whitespace like newline, carriage return, formfeed and backspace.

Bugzilla Id: 60842

Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestLagartoBasedHtmlParser.java

Date: Fri Mar 10 20:21:59 2017
New Revision: 1786435

URL: http://svn.apache.org/viewvc?rev=1786435&view=rev
Log:
Mention bug in changelog.

Bugzilla Id: 60842

Modified:
    jmeter/trunk/xdocs/changes.xml
Comment 6 Philippe Mouawad 2017-03-10 22:19:09 UTC
Author: pmouawad
Date: Fri Mar 10 22:18:55 2017
New Revision: 1786460

URL: http://svn.apache.org/viewvc?rev=1786460&view=rev
Log:
Bug 60842 - jmeter chokes on newline
Optimize by using static Pattern
Factor our code in base class
Use code in JSoup based implementation
Add Junit tests for it
Bugzilla Id: 60842

Added:
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestJSoupBasedHtmlParser.java   (with props)
Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/JsoupBasedHtmlParser.java
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java
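
For readers who want to see what "static Pattern" and "code in base class" could look like, here is a minimal sketch. It is an assumption-laden illustration, not the committed code: the names BaseParserSketch, STRIP_CHARS and normalizeUrl are hypothetical, and the character set is taken from the log message of revision 1786434 (newline, carriage return, formfeed, backspace).

```java
import java.util.regex.Pattern;

// Hypothetical base class that both parser implementations could extend.
abstract class BaseParserSketch {

    // Compiled once and reused for every URL, instead of recompiling
    // the expression on each call. The Java string escapes put the
    // literal control characters into the character class.
    private static final Pattern STRIP_CHARS = Pattern.compile("[\n\r\b\f]+");

    // Returns the cleaned URL, or null if nothing usable is left.
    static String normalizeUrl(String url) {
        if (url == null) {
            return null;
        }
        String cleaned = STRIP_CHARS.matcher(url.trim()).replaceAll("");
        return cleaned.isEmpty() ? null : cleaned;
    }
}
```

Keeping the helper in one shared place means the Lagarto-based and JSoup-based parsers would clean extracted URLs the same way.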
Comment 7 Felix Schumacher 2017-03-11 09:16:50 UTC
Date: Sat Mar 11 09:15:49 2017
New Revision: 1786497

URL: http://svn.apache.org/viewvc?rev=1786497&view=rev
Log:
Combine tests to reduce code duplication. Fix html-pseudo code for frame, so
that JSoup doesn't choke on it.

Bugzilla Id: 60842

Added:
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestBug60842HtmlParser.java   (with props)
Removed:
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestJSoupBasedHtmlParser.java
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestLagartoBasedHtmlParser.java
Comment 8 Felix Schumacher 2017-03-11 09:33:00 UTC
Date: Sat Mar 11 09:32:31 2017
New Revision: 1786498

URL: http://svn.apache.org/viewvc?rev=1786498&view=rev
Log:
Add return tag to javadoc

Bugzilla Id: 60842

Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
Comment 9 Philippe Mouawad 2017-04-19 20:15:57 UTC
(In reply to Felix Schumacher from comment #8)
> Date: Sat Mar 11 09:32:31 2017
> New Revision: 1786498
> 
> URL: http://svn.apache.org/viewvc?rev=1786498&view=rev
> Log:
> Add return tag to javadoc
> 
> Bugzilla Id: 60842
> 
> Modified:
>    
> jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/
> HTMLParser.java

Hi Felix,
Shouldn't this issue be marked as resolved?
Thanks
Comment 10 Felix Schumacher 2017-04-20 18:41:46 UTC
(In reply to Philippe Mouawad from comment #9)
> (In reply to Felix Schumacher from comment #8)
> > Date: Sat Mar 11 09:32:31 2017
> > New Revision: 1786498
> > 
> > URL: http://svn.apache.org/viewvc?rev=1786498&view=rev
> > Log:
> > Add return tag to javadoc
> > 
> > Bugzilla Id: 60842
> > 
> > Modified:
> >    
> > jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/
> > HTMLParser.java
> 
> Hi Felix,
> Shouldn't this issue be marked as resolved ?

I think so, as we released the fix and no one has complained yet :)

I will probably open another bug for the case of url(...) parsing in style attributes when no quotes are used.

> Thanks
Comment 11 Philippe Mouawad 2017-04-20 18:42:53 UTC
Hi,
If we republish docs, then we should update changes.xml to include it.
Regards
Comment 12 The ASF infrastructure team 2022-09-24 20:38:08 UTC
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/4328