Bug 58247 - Some UTF-16 characters are not handled correctly (likely surrogate pair related)
Status: RESOLVED DUPLICATE of bug 59268
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.12-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 54084 59268
Blocks:
Reported: 2015-08-16 15:03 UTC by raveufo
Modified: 2017-09-21 16:11 UTC



Attachments
Sample input (7.98 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-08-17 01:09 UTC, raveufo
Sample output (5.97 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-08-17 01:10 UTC, raveufo
Reproduce source code (1.43 KB, text/plain)
2015-08-17 01:13 UTC, raveufo

Description raveufo 2015-08-16 15:03:00 UTC
After reading an .xlsx file, the UTF-16 characters are still displayed correctly. But after the workbook is written back to disk, they become "??" at the positions where those characters were.

I tried converting the .xlsx file to .xls with Excel, then opened and saved it with HSSF, and the UTF-16 characters were still displayed properly. (In this case, I used UnicodeString to set the UTF-16 characters in the cell.)

I've checked with the latest 3.12 source and the same behavior can be reproduced. So I think it only happens in XSSF or SXSSF.
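
For readers unfamiliar with why surrogate pairs are suspected, here is a minimal sketch using only the JDK. It does not call POI or XMLBeans, and it is not a confirmed diagnosis of this bug. It only illustrates the general failure mode: a character outside the Basic Multilingual Plane occupies two Java chars, and code that encodes each char on its own turns both halves into "?". The class and method names (SurrogateDemo, roundTripWhole, roundTripPerChar) are hypothetical and chosen for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from POI or XMLBeans.
final class SurrogateDemo {

    // Encodes the whole string in one call, so surrogate pairs stay together.
    static String roundTripWhole(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encodes one Java char at a time, which splits every surrogate pair.
    // A lone surrogate cannot be encoded, so String.getBytes substitutes
    // the charset's replacement byte, which is '?' for UTF-8.
    static String roundTripPerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            byte[] utf8 = String.valueOf(s.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.append(new String(utf8, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+20BB7 written as its UTF-16 surrogate pair, between two ASCII letters.
        String s = "A\uD842\uDFB7B";
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("whole string intact: " + roundTripWhole(s).equals(s));
        System.out.println("per char result: " + roundTripPerChar(s));
    }
}
```

The two question marks produced by roundTripPerChar match the "??" symptom described above, which is consistent with (but does not prove) a writer that handles the two halves of a pair separately.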
Comment 1 Dominik Stadler 2015-08-16 17:18:13 UTC
Please provide some more details so other people can reproduce the problem, i.e. please attach sample files and a self-sufficient piece of code that reproduces the problem, ideally as a unit test so we can add it to the test suite for POI.
Comment 2 raveufo 2015-08-17 01:09:55 UTC
Created attachment 33002
Sample input
Comment 3 raveufo 2015-08-17 01:10:27 UTC
Created attachment 33003
Sample output
Comment 4 raveufo 2015-08-17 01:13:10 UTC
Created attachment 33004
Reproduce source code
Comment 5 raveufo 2015-08-17 01:19:11 UTC
(In reply to Dominik Stadler from comment #1)
> Please provide some more details so other people can reproduce the problem,
> i.e. please attach sample files and a self-sufficient piece of code that
> reproduces the problem, ideally as a unit-test so we can add it to the
> test-suite for poi.

I've attached the sample input, the sample output, and the source code I used to reproduce this problem.

If I convert the sample input above to an .xls file and read/write it with HSSF, the characters in the output file are the same as in the input file.
Comment 6 raveufo 2015-08-17 01:20:14 UTC
(In reply to Dominik Stadler from comment #1)
> Please provide some more details so other people can reproduce the problem,
> i.e. please attach sample files and a self-sufficient piece of code that
> reproduces the problem, ideally as a unit-test so we can add it to the
> test-suite for poi.

I've attached the sample input, the sample output, and the source code I used to reproduce this problem.

If I convert the sample input above to an .xls file and read/write it with HSSF, the characters in the output file are the same as in the input file.
Comment 7 Dominik Stadler 2015-08-17 20:41:56 UTC
This is likely a similar issue to bug 54084, where we debugged the problem to some degree, and it seems the third-party XMLBeans library is involved here.
Comment 8 Dominik Stadler 2017-09-21 16:11:47 UTC
As far as I can see, this will only be fixed by an updated XMLBeans; the related discussion is at bug 59268.

*** This bug has been marked as a duplicate of bug 59268 ***