Bug 49374

Summary: Encoding of embedded element URLs depend on the file.encoding property
Product: JMeter
Reporter: Pieter Ennes <apache.org>
Component: HTTP
Assignee: JMeter issues mailing list <issues>
Severity: normal
CC: p.mouawad
Priority: P2
Version: 2.3.4
Target Milestone: ---
Hardware: PC
OS: Linux
Attachments: Example script, Output UTF-8, Output ISO-8859-1, Content ISO-8859-1

Description Pieter Ennes 2010-06-03 05:27:57 UTC
Created attachment 25512
Example script

Running the same script file twice, changing only the character set of the shell environment, leads to different results:

$ LANG=en_GB.utf8 jmeter -n -t embedded-encoding.jmx -l embedded-encoding-utf8.jtl

<httpSample t="338" lt="308" ts="1275556867434" s="true" lb="http://www.rnw.nl/data/files/imagecache/list/images/lead/teaser%20Pygmalion%20©%20Julietta%20Cervantes.jpg" rc="200" rm="OK" tn="Thread Group 1-1" dt="bin" by="6868"/>

$ LANG=en_GB.iso-8859-1 jmeter -n -t embedded-encoding.jmx -l embedded-encoding-iso88591.jtl

<httpSample t="954" lt="0" ts="1275556913357" s="false" lb="http://www.rnw.nl/data/files/imagecache/list/images/lead/teaser%20Pygmalion%20��%20Julietta%20Cervantes.jpg" rc="404" rm="Not Found" tn="Thread Group 1-1" dt="text" by="0"/>

(Note the encoding of the copyright sign)

It seems that the URLs of the downloaded embedded resources are constructed using the character set of the shell environment. I would expect the character set of the containing (parent) page to be used instead.

Scripts and results are attached...
Comment 1 Pieter Ennes 2010-06-03 05:28:43 UTC
Created attachment 25513
Output UTF-8
Comment 2 Pieter Ennes 2010-06-03 05:29:14 UTC
Created attachment 25514
Output ISO-8859-1
Comment 3 Sebb 2010-06-05 18:03:21 UTC
I don't see the same problem, but the page has probably changed.

If you get the same problem with another page, please save the original page contents using the Save Response to File Listener, and attach that here.
Comment 4 Pieter Ennes 2010-06-06 07:36:03 UTC
Created attachment 25532
Content ISO-8859-1

That produces identical files for both charsets, so I'm only attaching one.
Comment 5 Pieter Ennes 2010-06-06 07:37:43 UTC
Sorry, forgot to mention that currently it can be reproduced by changing the path in the script to:


At some point this will move on to page=4, and so on.
Comment 6 Sebb 2010-06-06 08:37:00 UTC
Using page=3, the image loads OK for me when I use file.encoding=iso-8859-1, although the request display and the JTL output are not correct.

If I use file.encoding=iso (invalid) then JMeter 2.3.4 does fail to load the page.

Can you provide the first few lines of the jmeter.log files, up to at least the jmeter.JMeter: JMeterHome line?
Comment 7 Pieter Ennes 2010-06-06 09:47:21 UTC
Running with:

$ LANG=en_GB.UTF-8 jmeter

2010/06/06 14:41:32 INFO  - jmeter.util.JMeterUtils: Setting Locale to en_GB 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: Copyright (c) 1998-2009 The Apache Software Foundation 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: Version 2.3.4 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: java.version=1.6.0_18 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: java.vm.name=OpenJDK 64-Bit Server VM 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: os.name=Linux 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: os.arch=amd64 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: os.version=2.6.32-22-generic 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: file.encoding=UTF-8 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: Default Locale=English (United Kingdom) 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: JMeter  Locale=English (United Kingdom) 
2010/06/06 14:41:32 INFO  - jmeter.JMeter: JMeterHome=/usr/share/jmeter 

And with either of:
$ LANG=en_GB.ISO-8859-1 jmeter
$ LANG=iso-8859-1 jmeter

2010/06/06 14:43:12 INFO  - jmeter.util.JMeterUtils: Setting Locale to en_US 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: Copyright (c) 1998-2009 The Apache Software Foundation 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: Version 2.3.4 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: java.version=1.6.0_18 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: java.vm.name=OpenJDK 64-Bit Server VM 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: os.name=Linux 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: os.arch=amd64 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: os.version=2.6.32-22-generic 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: file.encoding=ANSI_X3.4-1968 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: Default Locale=English (United States) 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: JMeter  Locale=English (United States) 
2010/06/06 14:43:12 INFO  - jmeter.JMeter: JMeterHome=/usr/share/jmeter
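The second log reports file.encoding=ANSI_X3.4-1968, which is the glibc name for US-ASCII; this typically appears when the requested locale is not installed (an assumption here, not confirmed in the report). A minimal sketch, assuming the page served at the time encoded the copyright sign as the UTF-8 bytes 0xC2 0xA9, shows why the failing sample's URL contains two replacement characters: decoding those bytes as US-ASCII turns each one into U+FFFD. The class AsciiDecodeDemo and its decode helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiDecodeDemo {
    // Decodes raw response bytes with an explicitly chosen charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Assumption: the page held the copyright sign as UTF-8 bytes 0xC2 0xA9.
        byte[] html = "teaser Pygmalion \u00A9 Julietta".getBytes(StandardCharsets.UTF_8);

        // US-ASCII (what ANSI_X3.4-1968 denotes) cannot decode bytes >= 0x80,
        // so each of the two bytes becomes U+FFFD, the replacement character.
        System.out.println(decode(html, StandardCharsets.US_ASCII).contains("\uFFFD\uFFFD")); // true

        // Decoding with the page's real charset keeps the copyright sign intact.
        System.out.println(decode(html, StandardCharsets.UTF_8).contains("\u00A9")); // true
    }
}
```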
Comment 8 Philippe Mouawad 2011-12-28 14:02:53 UTC
The issue is in HtmlParserHTMLParser#getEmbeddedResourceURLs, in the following code:
String contents = new String(html); 

With this code, the platform's file.encoding is used instead of the page encoding.
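The difference can be sketched in plain Java. The one-argument String(byte[]) constructor decodes with the JVM default charset, which is derived from file.encoding, so its result depends on how JMeter was launched; passing the page's charset explicitly makes the result independent of the environment. EmbeddedContentDecoder, contentsPlatformDependent and contentsFor are hypothetical names for illustration, not JMeter's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EmbeddedContentDecoder {
    // Buggy pattern: the result depends on the JVM default charset (file.encoding).
    static String contentsPlatformDependent(byte[] html) {
        return new String(html);
    }

    // Fixed pattern (sketch): decode with the charset the page was served in.
    static String contentsFor(byte[] html, String pageEncoding) {
        return new String(html, Charset.forName(pageEncoding));
    }

    public static void main(String[] args) {
        byte[] html = "<img src=\"teaser \u00A9.jpg\">".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("default charset: " + Charset.defaultCharset());
        // Same decoded text on every platform, because the charset is explicit.
        System.out.println(contentsFor(html, "ISO-8859-1"));
    }
}
```

As of Java 18 the default charset is UTF-8 on all platforms (JEP 400), which removes the dependence on LANG but would still misdecode pages served in other charsets, so passing the page encoding explicitly remains the safer choice.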

By the way, to reproduce the issue, I just put:

I can't reproduce it by changing the LANG variable, because that does not give me the same encoding as the bug reporter's.
Comment 9 Philippe Mouawad 2011-12-28 14:07:51 UTC
Should we use SampleResult#getDataEncoding() and, if it returns null, fall back to the "sampleresult.default.encoding" JMeter property?
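A minimal sketch of the proposed fallback: use the sample's declared data encoding when present, otherwise the value of the "sampleresult.default.encoding" property. A java.util.Properties object stands in for JMeter's property store, and the built-in default of ISO-8859-1 used when the property is unset is an assumption for illustration. EncodingFallback, encodingWithDefault and decodeBody are hypothetical names; this is not the actual SampleResult#getDataEncodingWithDefault() implementation.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingFallback {
    // Property name proposed in the comment above.
    static final String DEFAULT_ENCODING_PROP = "sampleresult.default.encoding";
    // Assumed fallback when the property itself is unset.
    static final String BUILTIN_DEFAULT = "ISO-8859-1";

    // Returns the sample's declared encoding, or the configured default if none was declared.
    static String encodingWithDefault(String dataEncoding, Properties props) {
        if (dataEncoding != null && !dataEncoding.isEmpty()) {
            return dataEncoding;
        }
        return props.getProperty(DEFAULT_ENCODING_PROP, BUILTIN_DEFAULT);
    }

    // Decodes the response body without ever consulting file.encoding.
    static String decodeBody(byte[] body, String dataEncoding, Properties props) {
        return new String(body, Charset.forName(encodingWithDefault(dataEncoding, props)));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(encodingWithDefault("UTF-8", props)); // UTF-8
        System.out.println(encodingWithDefault(null, props));    // ISO-8859-1
        props.setProperty(DEFAULT_ENCODING_PROP, "UTF-8");
        System.out.println(encodingWithDefault(null, props));    // UTF-8
    }
}
```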
Comment 10 Philippe Mouawad 2011-12-28 14:34:37 UTC
Date: Wed Dec 28 14:33:41 2011
New Revision: 1225193

URL: http://svn.apache.org/viewvc?rev=1225193&view=rev
Bug 49374 - Encoding of embedded element URLs depend on the file.encoding property
Now using SampleResult#getDataEncodingWithDefault() to avoid relying on file.encoding of the JVM.
Modified HTMLParserTestFile_2.xml to take into account the impact of encoding change.