Bug 62612 - CSV Data set config is not reading in UTF-8
Summary: CSV Data set config is not reading in UTF-8
Status: RESOLVED INVALID
Alias: None
Product: JMeter - Now in Github
Classification: Unclassified
Component: Main
Version: 4.0
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: JMeter issues mailing list
Reported: 2018-08-09 13:32 UTC by Yaniv Hemi
Modified: 2018-08-13 20:53 UTC

Attachments
Testplan that uses utf-8 encoded csv (6.55 KB, application/xml)
2018-08-10 10:34 UTC, Felix Schumacher
UTF-8 encoded CSV file (11 bytes, text/csv)
2018-08-10 10:35 UTC, Felix Schumacher

Description Yaniv Hemi 2018-08-09 13:32:02 UTC
When using the CSV Data Set Config to read a CSV file and send its contents in REST requests through the HTTP Request sampler, files containing text in various languages such as Portuguese arrive with special characters such as í turned into gibberish.
Setting File encoding to UTF-8 in the CSV Data Set Config and the content encoding of the HTTP Request to UTF-8 makes no difference.

When I create an HTTP Request in the same JMeter and put the same file content directly in the body of the request, everything works. So the HTTP request is good; it is just the CSV reader that is not using UTF-8.
Comment 1 Yaniv Hemi 2018-08-09 13:34:35 UTC
The source files are saved as UTF-8.
Comment 2 Felix Schumacher 2018-08-10 09:33:50 UTC
This has nothing to do with CSV Data Set. Try to add a Debug Sampler and look at the results in a View Results Tree.

Most likely your web app is expecting another character encoding in the HTTP parameters. It is probably better to discuss your problem on the users mailing list.
Comment 3 Yaniv Hemi 2018-08-10 10:16:25 UTC
That's not correct.
The application is fine and doesn't expect any field.
When I use Postman everything works with no special headers.
Also, in JMeter, when I copy the content of the file and send it using the HTTP Request it works.
When JMeter reads it using the CSV Data Set, it even shows incorrect characters in the JMeter UI.
Comment 4 Yaniv Hemi 2018-08-10 10:20:08 UTC
This is exactly the issue
https://stackoverflow.com/questions/4514433/jmeter-csv-data-set-is-corrupting-japanese-strings-stored-as-proper-utf-8-i-get

Notice that even the answers have replies that it doesn't work.
I've tried all of the replies but nothing worked, just like other people commented.
Comment 5 Yaniv Hemi 2018-08-10 10:23:33 UTC
View Results Tree also shows incorrect characters.
When I send the JSON body in the HTTP Request directly, without using CSV Data Set, the results view is correct.
Comment 6 Felix Schumacher 2018-08-10 10:34:29 UTC
Created attachment 36084
Testplan that uses utf-8 encoded csv

This testplan uses a UTF-8 encoded CSV file (that I will add next) and sends an HTTP request with a parameter read from that file to a local mirror server (that has to be started manually before executing the test).

I see no strange behavior here that I would call a bug. In my opinion the parameter gets encoded correctly. Note that HTTP parameters carry no encoding information, so the webapp has to decide for itself which decoding to use. It might help to set an HTTP header, but that depends on the receiving site.

Have a look at the result of the Debug Sampler to validate that the value is read in correctly.
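
To illustrate the point about parameter encoding: the same character is percent-encoded to different bytes depending on the charset the client picks, and the receiving side has to decode with the same one. The following is only a minimal Java sketch, assuming Java 10 or later for the URLEncoder.encode(String, Charset) overload. The class and method names are hypothetical and are not part of JMeter or of the attached testplan.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of JMeter.
class ParamEncodingDemo {

    // Percent-encode a parameter value using the given charset (Java 10+ overload).
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "\u00ed"; // U+00ED, LATIN SMALL LETTER I WITH ACUTE
        // UTF-8 represents the character as two bytes, ISO-8859-1 as one.
        System.out.println(encode(value, StandardCharsets.UTF_8));      // %C3%AD
        System.out.println(encode(value, StandardCharsets.ISO_8859_1)); // %ED
    }
}
```

If the server decodes %C3%AD as ISO-8859-1 it will see two characters instead of one, which is why comparing the raw request that each client actually sends is a useful first step.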
Comment 7 Felix Schumacher 2018-08-10 10:35:13 UTC
Created attachment 36085
UTF-8 encoded CSV file

Simple CSV file with Umlauts encoded in UTF-8
Comment 8 Felix Schumacher 2018-08-10 10:38:21 UTC
I still believe that there is no bug in CSV Data Set, that the behavior you describe is user-induced (probably caused by bad documentation and UI ;), and that this is probably best handled on the users mailing list.

If you still think this is a bug, please provide more information. It would be helpful to see the headers and body that Postman sends and those that JMeter sends.
Comment 9 Yaniv Hemi 2018-08-10 14:09:13 UTC
I will try your example.
Did you try with Portuguese special characters?
I've tried mine with several languages; some, like German, worked, but Portuguese didn't.
Thanks
Comment 10 Yaniv Hemi 2018-08-13 07:13:48 UTC
Felix,
I looked at your example and I now know what the issue is.
I'm using the CSV to pass a full file path to the HTTP Request.

The HTTP Request body is as follows:
${__FileToString(${__eval(${JSON_FILE})},,)}

I noticed that in your example, which is a GET request, you pass the text from the CSV directly, and it works.

Mine is a little different, as it passes a file path and the HTTP Request body is then loaded from that file.
I read about the __FileToString function and noticed that it has a second parameter, which is the encoding.
I've rewritten it to be:
${__FileToString(${__eval(${JSON_FILE})},UTF-8,)}

On the first test with several languages it worked.
I'll try it on more languages.

Thanks for the assistance.
I will update once I'm done testing it.
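
A likely explanation for the original symptom, stated here as an assumption rather than something confirmed in this thread: when no encoding is passed, the file is decoded with the platform default charset, and if that default is a single-byte Latin charset, every multi-byte UTF-8 character turns into two wrong characters. The Java sketch below reproduces that effect on a byte array. The class and method names are hypothetical, and ISO-8859-1 merely stands in for whatever the platform default happened to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of JMeter.
class FileCharsetDemo {

    // Decode raw file bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u00ed"; // U+00ED, LATIN SMALL LETTER I WITH ACUTE
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // bytes 0xC3 0xAD

        // Decoding with the charset the file was saved in restores the text.
        String right = decode(utf8, StandardCharsets.UTF_8);
        // Decoding with a single-byte charset yields one character per byte.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 2
    }
}
```

Passing UTF-8 as the second argument of __FileToString corresponds to the first decode call: the charset is named explicitly instead of being left to the environment.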
Comment 11 Yaniv Hemi 2018-08-13 11:41:34 UTC
Felix,
It worked. Passing UTF-8 to __FileToString fixed it:
${__FileToString(${__eval(${JSON_FILE})},UTF-8,)}

Thanks
Comment 12 Felix Schumacher 2018-08-13 20:53:40 UTC
Great that you solved your problem. But note that the use of the __FileToString function has nothing to do with CSV Data Set.
Comment 13 The ASF infrastructure team 2022-09-24 20:38:14 UTC
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/4838