Bug 36898 - regular expression extractor encode again a String encoded in UTF-8
Status: RESOLVED WORKSFORME
Alias: None
Product: JMeter - Now in Github
Classification: Unclassified
Component: Main
Version: 2.1
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: JMeter issues mailing list
Reported: 2005-10-03 16:50 UTC by Darius Hachimarave
Modified: 2016-09-27 09:06 UTC

Description Darius Hachimarave 2005-10-03 16:50:17 UTC
In my thread group, I use the SOAP/XML-RPC sampler to send requests to my server.
I use the Regular Expression Extractor to build the next request from the
server response.

This works fine when there are no special characters, but when the server
response contains a non-ASCII UTF-8 character, the extraction doesn't return the
string I'm expecting.

For example, suppose the server response contains a string like "<City>Vézelay</City>".
With the Regular Expression Extractor I would like to extract the string
"Vézelay", but it gives me the string "VÃ©zelay".
In hexadecimal, the special character is C3 00 A9 00, and it becomes C3 B3 C2 A9
after extraction.

It seems that the character is being encoded a second time, in UTF-8 or in another format.
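
The suspected double encoding can be reproduced outside JMeter. The following is a minimal sketch, not JMeter code: it assumes the UTF-8 bytes of "é" are decoded with a single-byte charset (ISO-8859-1 here, as a stand-in for a non-UTF-8 platform default) and then encoded as UTF-8 again. The class and method names are hypothetical. Under that assumption the result is C3 83 C2 A9, which is close to, but not identical to, the bytes reported above.

```java
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes UTF-8 bytes with the wrong (single-byte) charset. This mimics
    // new String(byte[]) on a platform whose default charset is not UTF-8
    // (an assumption about the reporter's setup).
    static String decodeWrongly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = "\u00e9".getBytes(StandardCharsets.UTF_8); // "é"
        String mangled = decodeWrongly(original);                    // two chars: "Ã©"
        byte[] reEncoded = mangled.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(original));  // C3 A9
        System.out.println(hex(reEncoded)); // C3 83 C2 A9
    }
}
```

One character turning into two, with the second one being "©" (C2 A9), is the usual signature of UTF-8 bytes being read as a Latin-1-style charset.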

Note: Bug 27032 (which is in the resolved state) has some similarities.
Comment 1 Sebb 2005-10-03 17:19:54 UTC
Does the response get saved correctly if you configure a listener to "Save 
Response Data"?

Or does this also cause the data to be mangled?
Comment 2 peter lin 2005-10-03 17:25:52 UTC
Looking at webserviceSampler, it currently gets a BufferedReader from apache
soap, so it's unlikely the sampler is the problem.

SOAPTransport st = msg.getSOAPTransport();
RESULT.setDataType(SampleResult.TEXT);
BufferedReader br = null;
// check to see if SOAPTransport is not null and receive is
// also not null. hopefully this will improve the error
// reporting. 5/13/05 peter lin
if (st != null && st.receive() != null) {
    br = st.receive();
    if (this.getPropertyAsBoolean(READ_RESPONSE)) {
        StringBuffer buf = new StringBuffer();
        String line;
        while ((line = br.readLine()) != null) {
            buf.append(line);
        }
        RESULT.sampleEnd();
        // set the response
        RESULT.setResponseData(buf.toString().getBytes());

If apache soap doesn't create a reader using the correct encoding, it "could"
cause the problem you see. I don't know apache soap well enough to say with any
certainty that is the case. It could also be a limitation of the assertion.

peter
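
For comparison, here is a minimal sketch of the same read loop with the charset made explicit on both sides. It is not the sampler's code: in the sampler the reader comes from Apache SOAP, whereas this hypothetical helper (readAndReencode) builds the reader itself from a raw stream, and UTF-8 is assumed as the wire encoding. The point is only that String.getBytes() with no argument uses the platform default charset, while passing the charset explicitly keeps the bytes stable.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ExplicitCharsetRead {

    // Reads the stream line by line (line terminators are dropped, as in
    // the loop above) and returns the text re-encoded with the same
    // charset that was used to decode it.
    static byte[] readAndReencode(InputStream in, Charset charset) {
        StringBuilder buf = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                buf.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Explicit charset: no dependence on the platform default.
        return buf.toString().getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] wire = "<City>V\u00e9zelay</City>".getBytes(StandardCharsets.UTF_8);
        byte[] stored = readAndReencode(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(wire, stored)); // true
    }
}
```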
Comment 3 Darius Hachimarave 2005-10-03 18:09:01 UTC
(In reply to comment #1)
> Does the response get saved correctly if you configure a listener to "Save 
> Response Data"?
> 
> Or does this also cause the data to be mangled?

Yes, if I save the response data to a file, the data is not altered. It is
saved correctly.
Comment 4 Darius Hachimarave 2005-10-05 11:09:10 UTC
(In reply to comment #2)
> If apache soap doesn't create a reader using the correct encoding, it "could"
> cause the problem you see. I don't know apache soap well enough to say with any
> certainty that is the case. It could also be a limitation of the assertion.
> 
> peter


I don't think Apache SOAP is creating its reader with the wrong encoding.
If it were, the server response that JMeter writes to a file would be badly
encoded too, just like the string I get from the Regular Expression Extractor.

Maybe there is a way to specify the string encoding when I use the Regular
Expression Extractor.
Comment 5 peter lin 2005-10-05 15:37:53 UTC
It's possible you're right and the assertion isn't handling the encoding
correctly. Perhaps sebb or mike will know better. JMeter uses oro-matcher, so it
could be that we need to set the encoding. If I have time tonight I'll take a look
at the oro-matcher API.

peter
Comment 6 Darius Hachimarave 2005-10-14 15:48:10 UTC
OK, I think I've finally found where the bug is.
It's in the RegexExtractor class of the org.apache.jmeter.extractor package.
In the process() method, a PatternMatcherInput is created from the response
data of the previous result.
At that point, the response data is converted to a String without specifying any encoding.

To correct this bug, I replaced the line
input = new PatternMatcherInput(useHeaders()
        ? context.getPreviousResult().getResponseHeaders()
        : new String(context.getPreviousResult().getResponseData()));

by 
try {
    input = new PatternMatcherInput(useHeaders()
            ? context.getPreviousResult().getResponseHeaders()
            : new String(context.getPreviousResult().getResponseData(),
                    context.getPreviousResult().getDataEncoding()));
} catch (UnsupportedEncodingException e2) {
    input = new PatternMatcherInput(useHeaders()
            ? context.getPreviousResult().getResponseHeaders()
            : new String(context.getPreviousResult().getResponseData()));
}

I don't know if it's THE right way to fix it, but now the strings I extract
are encoded correctly.
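
The same idea, decoding with the declared encoding and falling back to the platform default, can be sketched as a standalone program. This is an illustration under stated assumptions, not the code that went into JMeter: the class and both helpers (decode, extractFirst) are hypothetical, and java.util.regex stands in for the ORO matcher so that the example needs nothing outside the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodingAwareExtract {

    // Decodes response bytes with the declared encoding. Falls back to the
    // platform default when the name is missing, malformed or unsupported.
    static String decode(byte[] data, String encodingName) {
        Charset charset = Charset.defaultCharset();
        if (encodingName != null && !encodingName.isEmpty()) {
            try {
                charset = Charset.forName(encodingName);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Keep the platform default, as the patch above does.
            }
        }
        return new String(data, charset);
    }

    // Returns the first capture group of the first match, or null if the
    // pattern does not match.
    static String extractFirst(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] response = "<City>V\u00e9zelay</City>".getBytes(StandardCharsets.UTF_8);
        String text = decode(response, "UTF-8");
        System.out.println(extractFirst(text, "<City>(.+?)</City>"));
    }
}
```

Resolving the charset once and then decoding avoids repeating the whole expression in a catch block, but the behavior is meant to match the try/catch version above.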

I'm not really familiar with how to modify the source code of a Jakarta
project. Can someone do this? Am I authorized to do this?
Comment 7 Sebb 2005-11-13 03:10:01 UTC
Fixed in 2.1 branch code. Will be in 2.1.2
Comment 8 Matthew 2015-03-17 14:46:04 UTC
I am using JMeter 2.11 and I am seeing this same issue with the regex extractor.
Comment 9 Felix Schumacher 2015-05-10 18:51:10 UTC
Could you tell us exactly what you have done, what you expect and what you have seen?

Maybe your server is sending no encoding, or the wrong one?
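
One way to check what the server declares is to look at the charset parameter of the Content-Type response header. The following is a rough sketch with a hypothetical helper (charsetOf), not JMeter's own parsing; it handles only the common "type/subtype; charset=..." form and assumes the header value has already been obtained.

```java
import java.util.Locale;

final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or null when
    // the header is missing or declares no charset.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/xml"));                // null
    }
}
```

A null result means the client has to guess, and a wrong guess would produce the kind of mangling described in this bug.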
Comment 10 Philippe Mouawad 2016-09-27 09:06:48 UTC
Closed: no feedback from the user, and looking at the current code, the encoding is handled correctly.
Comment 11 The ASF infrastructure team 2022-09-24 20:37:35 UTC
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/1615