Bug 64653

Summary: HTTP recorder - Exception parsing javascript file
Product: JMeter
Reporter: doa
Component: HTTP
Assignee: JMeter issues mailing list <issues>
Status: RESOLVED FIXED
Severity: normal
CC: doa
Priority: P2
Keywords: FixedInTrunk
Version: 5.3   
Target Milestone: JMETER_5.4   
Hardware: PC   
OS: All   
Attachments: Skip javascript (and JSON) in charset guessing by forms

Description doa 2020-08-07 10:47:19 UTC
When using the HTTP recorder with our app, JMeter throws an exception when loading one specific minified JS file. Sadly, I can't share the content of the file here since it is a third-party licensed library.

It seems JMeter tries to parse the JS file as HTML for some reason?
I verified that the server sends the correct mime-type (application/javascript) in the header.

Here is the error in the log output:

2020-08-07 12:37:44,467 DEBUG o.a.j.p.h.p.FormCharSetFinder: Parsing html of: %Js content removed%

2020-08-07 12:37:44,682 ERROR o.a.j.p.h.p.Proxy: [63058]  Exception when processing sample
java.lang.NullPointerException: null
	at org.jsoup.parser.HtmlTreeBuilderState$9.process(HtmlTreeBuilderState.java:934) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:141) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilderState$13.anythingElse(HtmlTreeBuilderState.java:1172) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilderState$13.process(HtmlTreeBuilderState.java:1132) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:136) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilderState$10.process(HtmlTreeBuilderState.java:1019) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:136) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:66) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:47) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.parser.Parser.parse(Parser.java:107) ~[jsoup-1.13.1.jar:?]
	at org.jsoup.Jsoup.parse(Jsoup.java:58) ~[jsoup-1.13.1.jar:?]
	at org.apache.jmeter.protocol.http.proxy.FormCharSetFinder.addFormActionsAndCharSet(FormCharSetFinder.java:55) ~[ApacheJMeter_http.jar:5.3]
	at org.apache.jmeter.protocol.http.proxy.Proxy.addFormEncodings(Proxy.java:603) ~[ApacheJMeter_http.jar:5.3]
	at org.apache.jmeter.protocol.http.proxy.Proxy.run(Proxy.java:239) [ApacheJMeter_http.jar:5.3]
Comment 1 Felix Schumacher 2020-08-07 11:18:33 UTC
Which version of JMeter, JDK, OS are you using?
Comment 2 doa 2020-08-07 11:22:11 UTC
JMeter: 5.3 (also tried with the current nightly, same problem)
OS: Windows 10
Java: Adopt OpenJDK 64Bit 8.0.242.08

Btw, I just found out that when I load the unminified version of the JS file, everything works as expected.
Comment 3 Felix Schumacher 2020-08-07 15:25:47 UTC
Does it help if you exclude the JavaScript resources from recording?

In the code it looks like we skip only binary types in o.a.j.protocol.http.proxy.Proxy#addFormEncodings and hope that jsoup will throw an HTMLParseException on any error. That seems not to be the case with this JavaScript code.

So we should probably narrow the parsing down a bit more and exclude javascript from the parsing, too.

Would you be able to patch and compile a JMeter version for yourself?
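
To illustrate the idea of narrowing the parsing down, here is a minimal sketch of a content-type check that could run before the response body is handed to jsoup. It is not the code from the attached patch: the class name FormParsingFilter, the method shouldParseForForms, and the choice to keep parsing responses with an unknown content type are all assumptions made for this example.

```java
import java.util.Locale;

// Hypothetical helper, not part of JMeter: decides whether a response body
// should be parsed as HTML when looking for form charsets.
final class FormParsingFilter {
    private FormParsingFilter() {
    }

    static boolean shouldParseForForms(String contentType) {
        // Assumption: responses without a content type are still parsed.
        if (contentType == null || contentType.trim().isEmpty()) {
            return true;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize the case.
        int semicolon = contentType.indexOf(';');
        String mediaType = (semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType).trim().toLowerCase(Locale.ROOT);
        // Skip JavaScript and JSON variants, e.g. application/javascript,
        // text/javascript, application/json, application/ld+json.
        return !(mediaType.endsWith("javascript") || mediaType.endsWith("json"));
    }

    public static void main(String[] args) {
        System.out.println(shouldParseForForms("text/html; charset=UTF-8"));
        System.out.println(shouldParseForForms("application/javascript"));
        System.out.println(shouldParseForForms("application/json"));
    }
}
```

A caller would only invoke the HTML parser when shouldParseForForms returns true, so script and JSON responses never reach jsoup.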
Comment 4 Felix Schumacher 2020-08-07 15:27:16 UTC
Created attachment 37385 [details]
Skip javascript (and JSON) in charset guessing by forms
Comment 5 doa 2020-08-07 17:13:06 UTC
No, I'm not able to patch and compile my own version.
But it's not a huge problem since my "workaround" is good enough for now.

I'm happy if your patch makes it into a future version.
Thx for the quick help.
Comment 6 Felix Schumacher 2020-08-08 17:38:47 UTC
Would you be able to test the next nightly and report if it solves your problem?

commit 81a6d678d725e98d5325b8d9429345f40d35f845
AuthorDate: Sat Aug 8 19:31:22 2020 +0200

    Exclude Javascript and JSON from parsing for charsets from forms by proxy
    
    JSoup currently has problems parsing some non HTML code - for which
    it was probably never intended. So skip known not HTML resources
    in the proxy recording logic, when character encodings for forms
    are extracted.
    
    Bugzilla Id: 64653
---
 .../apache/jmeter/protocol/http/proxy/Proxy.java    | 21 ++++++++++++++++++++-
 xdocs/changes.xml                                   |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)
Comment 7 doa 2020-08-10 13:09:59 UTC
Works with r1812-a1bc13f1d626fc9d58bb9d5124a72bd07237374d (Nightly from 2020-08-10).

Thank you very much for the fast fix. Great work.