Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Status: RESOLVED FIXED
Product: JMeter
Classification: Unclassified
Component: HTTP
Version: Nightly (Please specify date)
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assigned To: JMeter issues mailing list
Depends on:
Blocks:
 
Reported: 2013-10-05 21:50 UTC by Philippe Mouawad
Modified: 2013-12-19 21:54 UTC
CC List: 1 user



Attachments
Test Plan used for comparing performances (4.38 KB, application/xml)
2013-10-05 22:40 UTC, Philippe Mouawad
Monitoring during local test (300.21 KB, application/octet-stream)
2013-10-05 22:42 UTC, Philippe Mouawad
New Performance results (854.16 KB, application/octet-stream)
2013-10-07 21:46 UTC, Philippe Mouawad
ignore...attached to wrong bug (92.01 KB, text/plain)
2013-10-09 18:45 UTC, Matt Kilbride

Comment 1 Philippe Mouawad 2013-10-05 22:33:38 UTC
Date: Sat Oct  5 22:32:38 2013
New Revision: 1529543

URL: http://svn.apache.org/r1529543
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Bugzilla Id: 55632

Added:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java   (with props)
Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestHTMLParser.java
    jmeter/trunk/xdocs/changes.xml
Comment 2 Philippe Mouawad 2013-10-05 22:40:55 UTC
Created attachment 30905 [details]
Test Plan used for comparing performances

Results:

Lagarto
Generate Summary Results = 1183225 in   300s = 3944.1/s Avg:     0 Min:     0 Max:    89 Err:     0 (0.00%)

HtmlParser
Generate Summary Results = 1063893 in   300s = 3546.4/s Avg:     0 Min:     0 Max:   119 Err:     0 (0.00%)

Regex
Generate Summary Results = 941949 in   300s = 3139.9/s Avg:     0 Min:     0 Max:    68 Err:     0 (0.00%)
Comment 3 Philippe Mouawad 2013-10-05 22:42:42 UTC
Created attachment 30906 [details]
Monitoring during local test
Comment 4 Philippe Mouawad 2013-10-05 22:45:15 UTC
Date: Sat Oct  5 22:43:54 2013
New Revision: 1529545

URL: http://svn.apache.org/r1529545
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Update documentation
Bugzilla Id: 55632

Modified:
    jmeter/trunk/bin/jmeter.properties
Comment 5 Philippe Mouawad 2013-10-06 10:12:18 UTC
Date: Sun Oct  6 10:10:35 2013
New Revision: 1529606

URL: http://svn.apache.org/r1529606
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Fixed test failure
Bugzilla Id: 55632

Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java
Comment 6 Philippe Mouawad 2013-10-06 10:17:15 UTC
For the record, I did not go with a JSoup implementation, as initially proposed in the mailing list discussion, because I got no answer to the question I asked on the jsoup Google group on 2 March 2013 (I was not able to find the link):
-----------------------------------------------------------------------------
On Sat, Mar 2, 2013 at 9:44 PM, Philippe Mouawad <philippe.mouawad@gmail.com> wrote:

    Hello,
    First thanks for your great library.

    I have a question regarding the best way to parse a document to extract resource links.

    You show this:

        http://jsoup.org/cookbook/extracting-data/example-list-links


    But what is the most performant way to do it: is it the one shown, or is it better to iterate over doc.getAllElements()?
    Thanks
    Regards
    Philippe
-----------------------------------------------------------------------------

The SAX-like approach of Jodd Lagarto is, as of today, more efficient than what JSoup offers in terms of API.
If this changes, we will update the implementation.
Comment 7 Philippe Mouawad 2013-10-06 13:36:13 UTC
Date: Sun Oct  6 13:34:37 2013
New Revision: 1529618

URL: http://svn.apache.org/r1529618
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Rollback default for now
Comment on performances
Bugzilla Id: 55632

Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
Comment 8 Philippe Mouawad 2013-10-07 21:42:10 UTC
Date: Mon Oct  7 21:36:47 2013
New Revision: 1530074

URL: http://svn.apache.org/r1530074
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Bugzilla Id: 55632

Added:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/JsoupBasedHtmlParser.java   (with props)
Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestHTMLParser.java
    
    
Date: Mon Oct  7 21:37:57 2013
New Revision: 1530076

URL: http://svn.apache.org/r1530076
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Add eol
Bugzilla Id: 55632

Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/JsoupBasedHtmlParser.java   (props changed)
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java   (props changed)
    
Date: Mon Oct  7 21:40:50 2013
New Revision: 1530078

URL: http://svn.apache.org/r1530078
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Switch default to Lagarto Parser implementation
Bugzilla Id: 55632

Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
Comment 9 Philippe Mouawad 2013-10-07 21:45:07 UTC
Hello,
I ran new tests on a more powerful machine (latest-generation MacBook Pro, 16 GB RAM, 2.7 GHz Core i7),
with Xmx set to 1g and Xms equal to Xmx, on JDK 1.7u40:

Lagarto:
Generate Summary Results + 2213206 in   147s = 15009.2/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2359774 in   153s = 15470.1/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 4572980 in   300s = 15243.5/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%)


JSOUP:
Generate Summary Results + 1572874 in   118s = 13351.6/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2448660 in   180s = 13603.7/s Avg:     0 Min:     0 Max:     4 Err:     0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results = 4021534 in   298s = 13504.0/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%)
Generate Summary Results +  29807 in   2.2s = 13610.5/s Avg:     0 Min:     0 Max:     3 Err:     0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 4051341 in   300s = 13504.7/s Avg:     0 Min:     0 Max:    12 Err:     0 (0.00%)


HTMLPARSER:
Generate Summary Results + 1050392 in    82s = 12812.0/s Avg:     0 Min:     0 Max:    19 Err:     0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2296747 in   180s = 12759.7/s Avg:     0 Min:     0 Max:    11 Err:     0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results = 3347139 in   262s = 12776.1/s Avg:     0 Min:     0 Max:    19 Err:     0 (0.00%)
Generate Summary Results + 490001 in    38s = 12891.4/s Avg:     0 Min:     0 Max:     5 Err:     0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 3837140 in   300s = 12790.7/s Avg:     0 Min:     0 Max:    19 Err:     0 (0.00%)


Lagarto has nearly 20% more throughput than HTMLParser (15243.5/s versus 12790.7/s) and also performs better than JSoup (13504.7/s). As I had already developed the JSoup implementation, I committed it; it could be useful for functional testing.

At some point we could maybe drop the old HTMLParser.
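
The "nearly 20%" figure can be checked directly from the final summary lines above. What follows is only a minimal Java sketch, assuming the `= <count> in <secs>s = <rate>/s` layout seen in these logs; the class `ThroughputGain` and its methods are illustrative names and are not part of JMeter.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper (not part of JMeter): compares rates taken from summary total lines. */
public class ThroughputGain {

    // Matches the rate in a line such as "... = 4572980 in   300s = 15243.5/s Avg: ..."
    private static final Pattern RATE = Pattern.compile("=\\s*([0-9]+(?:\\.[0-9]+)?)/s");

    /** Extracts the samples-per-second value from one summary line. */
    public static double rate(String summaryLine) {
        Matcher m = RATE.matcher(summaryLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No rate found in: " + summaryLine);
        }
        return Double.parseDouble(m.group(1));
    }

    /** Percentage by which the candidate rate exceeds the baseline rate. */
    public static double percentGain(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Final "=" lines copied from the three runs in this comment.
        double lagarto = rate("Generate Summary Results = 4572980 in   300s = 15243.5/s Avg:     0");
        double jsoup = rate("Generate Summary Results = 4051341 in   300s = 13504.7/s Avg:     0");
        double htmlParser = rate("Generate Summary Results = 3837140 in   300s = 12790.7/s Avg:     0");

        System.out.printf(Locale.ROOT, "Lagarto vs HTMLParser: +%.1f%%%n", percentGain(lagarto, htmlParser));
        System.out.printf(Locale.ROOT, "Lagarto vs JSoup:      +%.1f%%%n", percentGain(lagarto, jsoup));
    }
}
```

With the totals quoted here, this works out to roughly 19% over HTMLParser and roughly 13% over JSoup.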
Comment 10 Philippe Mouawad 2013-10-07 21:46:09 UTC
Created attachment 30908 [details]
New Performance results
Comment 12 Sebb 2013-10-08 17:33:40 UTC
Comment on attachment 30908 [details]
New Performance results

The attachment appears to show that HtmlParser ends up using the least memory, and JSoup the most.

Is that expected?
Comment 13 Sebb 2013-10-08 17:35:50 UTC
(In reply to Philippe Mouawad from comment #9)
> Hello,
> Made new tests on a more powerful machine (Mac Book Pro last generation, 16
> Go RAM, 2.7 Ghz Core i7).
> With 1g for Xmx and Xmx==Xms, JDK 1.7u40:

What did you use to generate these figures?
Comment 14 Philippe Mouawad 2013-10-08 19:11:12 UTC
(In reply to Sebb from comment #12)
> Comment on attachment 30908 [details]
> New Performance results
> 
> The attachment appears to show that HtmlParser ends up using the least
> memory, and JSoup the most.
> 
> Is that expected?


I ran GC at the end and all 3 implementations dropped to the same figure.
I think it uses more memory before GC because it has higher throughput.
I used JConsole.
Comment 15 Philippe Mouawad 2013-10-08 19:11:37 UTC
I used Generate Summary Results for the figures.
Comment 16 Sebb 2013-10-08 23:56:02 UTC
We really need a test case that can be run in command-line mode from the Ant build script.

The existing test case is quite hard to use, and the throughput value includes overhead from the sampler and JSR223 processor. Not sure why the code iterates the result list.
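
One way to exclude sampler and JSR223 overhead would be to time the parse call directly in a small standalone loop. The following is only a rough Java sketch of that idea under stated assumptions: `ParserBench`, `naiveExtract` and `parsesPerSecond` are hypothetical names, and the regex-based extractor is a stand-in for a real parser, not any of the JMeter implementations.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone harness; none of these names exist in JMeter. */
public class ParserBench {

    // Naive stand-in for a real parser: collects the values of src="..." attributes.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> naiveExtract(byte[] html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC.matcher(new String(html, StandardCharsets.UTF_8));
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    /** Runs the parser after a warm-up phase and returns parses per second. */
    public static double parsesPerSecond(Function<byte[], List<String>> parser,
                                         byte[] html, int warmup, int iterations) {
        long sink = 0; // consume results so the JIT cannot discard the work
        for (int i = 0; i < warmup; i++) {
            sink += parser.apply(html).size();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parser.apply(html).size();
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        if (sink < 0) {
            throw new IllegalStateException("unreachable: sink only grows");
        }
        return iterations * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        byte[] page = "<html><img src=\"a.png\"><script src=\"b.js\"></script></html>"
                .getBytes(StandardCharsets.UTF_8);
        double rate = parsesPerSecond(ParserBench::naiveExtract, page, 10_000, 100_000);
        System.out.printf("%.0f parses/s%n", rate);
    }
}
```

Each real parser would be plugged in through the `Function` argument and fed the same page bytes. A hand-rolled loop like this is still sensitive to JIT and GC effects, so a dedicated benchmark harness such as JMH would likely give more trustworthy numbers.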
Comment 17 Matt Kilbride 2013-10-09 18:45:15 UTC
Created attachment 30913 [details]
ignore...attached to wrong bug
Comment 18 Philippe Mouawad 2013-12-19 21:54:27 UTC
Opened bug 55913 to create a test case.
Closing issue.