As per: http://mail-archives.apache.org/mod_mbox/jmeter-dev/201309.mbox/%3CCAH9fUpYD9XLy1CAZC9hzULmOKFarF8vf2%2BG%3DJu-2KhxVA82-Gw%40mail.gmail.com%3E
Date: Sat Oct 5 22:32:38 2013
New Revision: 1529543
URL: http://svn.apache.org/r1529543
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Bugzilla Id: 55632
Added:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java (with props)
Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestHTMLParser.java
    jmeter/trunk/xdocs/changes.xml
Created attachment 30905
Test Plan used for comparing performances
Results:
Lagarto:
Generate Summary Results = 1183225 in 300s = 3944.1/s Avg: 0 Min: 0 Max: 89 Err: 0 (0.00%)
HtmlParser:
Generate Summary Results = 1063893 in 300s = 3546.4/s Avg: 0 Min: 0 Max: 119 Err: 0 (0.00%)
Regex:
Generate Summary Results = 941949 in 300s = 3139.9/s Avg: 0 Min: 0 Max: 68 Err: 0 (0.00%)
Created attachment 30906
Monitoring during local test
Date: Sat Oct 5 22:43:54 2013
New Revision: 1529545
URL: http://svn.apache.org/r1529545
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Update documentation
Bugzilla Id: 55632
Modified:
    jmeter/trunk/bin/jmeter.properties
Date: Sun Oct 6 10:10:35 2013
New Revision: 1529606
URL: http://svn.apache.org/r1529606
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Fixed test failure
Bugzilla Id: 55632
Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java
For the record, I didn't go for a JSoup implementation as initially proposed in the mailing list discussion, because I got no answer to the question I asked on the jsoup Google group on 2 March 2013 (I was not able to find the link):
-----------------------------------------------------------------------------
On Sat, Mar 2, 2013 at 9:44 PM, Philippe Mouawad <philippe.mouawad@gmail.com> wrote:
Hello,
First, thanks for your great library.
I have a question regarding the best way to parse a document to extract resource links.
You show this:
http://jsoup.org/cookbook/extracting-data/example-list-links
But what is the most performant way to do it: is it the one shown, or is it better to iterate on doc.getAllElements()?
Thanks
Regards
Philippe
-----------------------------------------------------------------------------
As of today, the SAX-like approach of JODD Lagarto is more efficient than what JSoup offers in terms of API. If this were to change, we would update this.
Date: Sun Oct 6 13:34:37 2013
New Revision: 1529618
URL: http://svn.apache.org/r1529618
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Rollback default for now
Comment on performances
Bugzilla Id: 55632
Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
Date: Mon Oct 7 21:36:47 2013
New Revision: 1530074
URL: http://svn.apache.org/r1530074
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Bugzilla Id: 55632
Added:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/JsoupBasedHtmlParser.java (with props)
Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/test/src/org/apache/jmeter/protocol/http/parser/TestHTMLParser.java
Date: Mon Oct 7 21:37:57 2013
New Revision: 1530076
URL: http://svn.apache.org/r1530076
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Add eol
Bugzilla Id: 55632
Modified:
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/JsoupBasedHtmlParser.java (props changed)
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/LagartoBasedHtmlParser.java (props changed)
Date: Mon Oct 7 21:40:50 2013
New Revision: 1530078
URL: http://svn.apache.org/r1530078
Log:
Bug 55632 - Have a new implementation of htmlParser for embedded resources parsing with better performances
Switch default to Lagarto Parser implementation
Bugzilla Id: 55632
Modified:
    jmeter/trunk/bin/jmeter.properties
    jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/parser/HTMLParser.java
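The commits above change the default parser through jmeter.properties. As a rough sketch of how such a setting could be resolved, assuming the property key is `htmlParser.className` (the key is not named anywhere in this thread) and deriving the class names from the file paths in the commit logs; `ParserSelection` and `parserClass` are hypothetical names, not JMeter code:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of JMeter: resolves which parser class a
// properties file selects, falling back to a default when the key is absent.
class ParserSelection {

    // Assumed key name; it does not appear in this thread.
    static final String KEY = "htmlParser.className";

    // Class names derived from the paths in the commit logs above.
    static final String LAGARTO =
            "org.apache.jmeter.protocol.http.parser.LagartoBasedHtmlParser";
    static final String JSOUP =
            "org.apache.jmeter.protocol.http.parser.JsoupBasedHtmlParser";

    static String parserClass(String propertiesText, String fallback) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY, fallback).trim();
    }

    public static void main(String[] args) {
        // An explicit setting selects the JSoup-based parser.
        System.out.println(parserClass(KEY + "=" + JSOUP + "\n", LAGARTO));
        // A commented-out setting leaves the default (Lagarto after r1530078).
        System.out.println(parserClass("#" + KEY + "=" + JSOUP + "\n", LAGARTO));
    }
}
```

With the key commented out, the fallback is returned, which mirrors the idea of a built-in default that the properties file can override.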
Hello,
I ran new tests on a more powerful machine (latest-generation MacBook Pro, 16 GB RAM, 2.7 GHz Core i7), with 1g for Xmx, Xmx==Xms, and JDK 1.7u40:
Lagarto:
Generate Summary Results + 2213206 in 147s = 15009.2/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2359774 in 153s = 15470.1/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 4572980 in 300s = 15243.5/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%)
JSOUP:
Generate Summary Results + 1572874 in 118s = 13351.6/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2448660 in 180s = 13603.7/s Avg: 0 Min: 0 Max: 4 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results = 4021534 in 298s = 13504.0/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%)
Generate Summary Results + 29807 in 2.2s = 13610.5/s Avg: 0 Min: 0 Max: 3 Err: 0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 4051341 in 300s = 13504.7/s Avg: 0 Min: 0 Max: 12 Err: 0 (0.00%)
HTMLPARSER:
Generate Summary Results + 1050392 in 82s = 12812.0/s Avg: 0 Min: 0 Max: 19 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results + 2296747 in 180s = 12759.7/s Avg: 0 Min: 0 Max: 11 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
Generate Summary Results = 3347139 in 262s = 12776.1/s Avg: 0 Min: 0 Max: 19 Err: 0 (0.00%)
Generate Summary Results + 490001 in 38s = 12891.4/s Avg: 0 Min: 0 Max: 5 Err: 0 (0.00%) Active: 0 Started: 3 Finished: 3
Generate Summary Results = 3837140 in 300s = 12790.7/s Avg: 0 Min: 0 Max: 19 Err: 0 (0.00%)
Lagarto has nearly 20% more throughput than HTMLParser (15243.5/s vs 12790.7/s) and also performs better than JSoup (13504.7/s).
As I had already developed the JSoup implementation, I committed it; it could be useful for functional testing.
At some point we could perhaps drop the old htmlparser.
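The relative gains can be rechecked from the overall 300s summary lines quoted above. The following is only an arithmetic sketch; the class and method names are illustrative, and the three rates are copied from the results in this comment:

```java
import java.util.Locale;

// Recomputes relative throughput gains from the overall
// "Generate Summary Results =" rates reported above (samples per second).
class ThroughputGain {

    // Percentage by which `candidate` exceeds `baseline`.
    static double gainPercent(double candidate, double baseline) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double lagarto = 15243.5;    // Lagarto, 300s total
        double jsoup = 13504.7;      // JSoup, 300s total
        double htmlParser = 12790.7; // HTMLParser, 300s total

        // prints "Lagarto vs HTMLParser: +19.2%"
        System.out.printf(Locale.ROOT, "Lagarto vs HTMLParser: +%.1f%%%n",
                gainPercent(lagarto, htmlParser));
        // prints "Lagarto vs JSoup: +12.9%"
        System.out.printf(Locale.ROOT, "Lagarto vs JSoup: +%.1f%%%n",
                gainPercent(lagarto, jsoup));
        // prints "JSoup vs HTMLParser: +5.6%"
        System.out.printf(Locale.ROOT, "JSoup vs HTMLParser: +%.1f%%%n",
                gainPercent(jsoup, htmlParser));
    }
}
```

These percentages describe the whole test plan, so they still include sampler overhead and are not pure parser timings.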
Created attachment 30908
New Performance results
JSoup implementation based on: http://stackoverflow.com/questions/19218305/most-performing-way-to-extract-links-to-embedded-resources-in-jsoup
Comment on attachment 30908
New Performance results
The attachment appears to show that HtmlParser ends up using the least memory, and JSoup the most.
Is that expected?
(In reply to Philippe Mouawad from comment #9)
> Hello,
> Made new tests on a more powerful machine (Mac Book Pro last generation, 16
> Go RAM, 2.7 Ghz Core i7).
> With 1g for Xmx and Xmx==Xms, JDK 1.7u40:
What did you use to generate these figures?
(In reply to Sebb from comment #12)
> Comment on attachment 30908 [details]
> New Performance results
>
> The attachment appears to show that HtmlParser ends up using the least
> memory, and JSoup the most.
>
> Is that expected?
I ran a GC at the end and all 3 implementations dropped to the same figure.
I think it uses more memory before GC because it has higher throughput.
I used JConsole.
I used Generate Summary Results for the figures.
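The figures in this thread were read from JConsole. As a hedged alternative for a scripted run, heap usage before and after a requested GC can be read in-process through the standard `MemoryMXBean`; `HeapProbe` is a hypothetical name, and the allocation loop is only a stand-in for parser work:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Illustrative only: compares used heap before and after requesting a GC,
// so that figures are not skewed by garbage that is merely awaiting collection.
class HeapProbe {

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Heap currently in use by this JVM, in bytes.
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        // Stand-in for parser work: allocate about 32 MB of short-lived garbage.
        byte[][] garbage = new byte[32][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20];
        }
        long before = usedHeapBytes();

        garbage = null; // drop the only reference
        MEMORY.gc();    // a request; collection behaviour depends on the JVM
        long after = usedHeapBytes();

        System.out.println("used before GC: " + before / (1 << 20) + " MB");
        System.out.println("used after GC:  " + after / (1 << 20) + " MB");
    }
}
```

Comparing only the post-GC values is what makes the three implementations comparable; the pre-GC value mostly reflects how much garbage was produced since the last collection.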
We really need a test case that can be run in command-line mode from the Ant build script. The existing test case is quite hard to use, and the throughput value includes overhead from the sampler and JSR223 processor. Not sure why the code iterates the result list.
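A standalone harness along these lines could time a parser directly, without the sampler or JSR223 processor in the loop. This is only a sketch under stated assumptions: `ParserBench`, `extractLinks` and `parsesPerSecond` are hypothetical names, and the regex extractor is a naive stand-in, not any of JMeter's parser implementations:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Command-line micro-benchmark sketch: measures only the parsing function.
class ParserBench {

    // Naive stand-in extractor (double-quoted src on img/script tags only).
    private static final Pattern SRC = Pattern.compile(
            "<(?:img|script)\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Runs `parser` on `html` `iterations` times and returns parses per second.
    static double parsesPerSecond(Function<String, List<String>> parser,
                                  String html, int iterations) {
        long sink = 0; // consume results so the work cannot be optimised away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parser.apply(html).size();
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative count");
        }
        return iterations * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        String html = "<html><body><img src=\"a.png\">"
                + "<script src=\"b.js\"></script></body></html>";
        parsesPerSecond(ParserBench::extractLinks, html, 20_000); // warm-up
        double rate = parsesPerSecond(ParserBench::extractLinks, html, 200_000);
        System.out.printf(Locale.ROOT, "%.0f parses/s%n", rate);
    }
}
```

Swapping in another implementation would only require passing a different `Function<String, List<String>>`, which keeps the timing loop identical across parsers.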
Created attachment 30913 [details] ignore...attached to wrong bug
Opened 55913 to create a Test Case. Closing issue.
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/3249