Bug 58247

Summary: Some UTF-16 characters are not handled correctly (likely surrogate pair related)
Product: POI
Reporter: raveufo <sakurotawa>
Component: XSSF
Assignee: POI Developers List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P2    
Version: 3.12-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Bug Depends on: 54084, 59268    
Bug Blocks:    
Attachments: Sample input
Sample output
Reproduce source code

Description raveufo 2015-08-16 15:03:00 UTC
After reading an .xlsx file, the UTF-16 characters are still displayed correctly. But after the workbook is written back to disk, they become "??" at the positions where they appeared before.

I tried converting the .xlsx file to .xls with Excel, then opened and saved it with HSSF, and the UTF-16 characters were still displayed properly. (In this case, I used UnicodeString to set the UTF-16 characters in the cell.)

I've checked with the latest 3.12 source and the same behavior can be reproduced. So I think it only happens in XSSF or SXSSF.
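
For context, the "??" symptom is what Java produces when the two halves of a surrogate pair are encoded separately. The following is a minimal sketch of that effect using only the JDK. It is not POI or XMLBeans code and does not claim to show the actual root cause; the class name SurrogatePairDemo and both method names are hypothetical, and the character U+20BB7 is just an arbitrary example of a code point outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairDemo {

    // Encodes the whole string in one call, so each surrogate pair
    // is seen together and becomes one 4-byte UTF-8 sequence.
    public static String roundTripWhole(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Hypothetical faulty writer: encodes one UTF-16 code unit at a time.
    // A lone surrogate cannot be encoded, and String.getBytes substitutes
    // the charset's replacement byte ('?') for it.
    public static String roundTripPerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            byte[] utf8 = String.valueOf(s.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.append(new String(utf8, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+20BB7 lies outside the BMP, so UTF-16 stores it as two chars.
        String text = "A" + new String(Character.toChars(0x20BB7)) + "B";

        System.out.println(text.length());                          // 4 UTF-16 code units
        System.out.println(text.codePointCount(0, text.length()));  // 3 code points
        System.out.println(roundTripWhole(text).equals(text));      // true
        System.out.println(roundTripPerChar(text));                 // A??B
    }
}
```

Comparing text.length() with text.codePointCount(0, text.length()) is a quick way to check whether a cell value contains such characters at all.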
Comment 1 Dominik Stadler 2015-08-16 17:18:13 UTC
Please provide some more details so other people can reproduce the problem, i.e. please attach sample files and a self-sufficient piece of code that reproduces the problem, ideally as a unit-test so we can add it to the test-suite for POI.
Comment 2 raveufo 2015-08-17 01:09:55 UTC
Created attachment 33002 [details]
Sample input
Comment 3 raveufo 2015-08-17 01:10:27 UTC
Created attachment 33003 [details]
Sample output
Comment 4 raveufo 2015-08-17 01:13:10 UTC
Created attachment 33004 [details]
Reproduce source code
Comment 5 raveufo 2015-08-17 01:19:11 UTC
(In reply to Dominik Stadler from comment #1)
> Please provide some more details so other people can reproduce the problem,
> i.e. please attach sample files and a self-sufficient piece of code that
> reproduces the problem, ideally as a unit-test so we can add it to the
> test-suite for poi.

I've attached sample input, sample output and the source code I used to reproduce this problem.

If I convert the sample input above to an .xls file and read/write it with HSSF, the characters in the output file are the same as in the input file.
Comment 6 raveufo 2015-08-17 01:20:14 UTC
(In reply to Dominik Stadler from comment #1)
> Please provide some more details so other people can reproduce the problem,
> i.e. please attach sample files and a self-sufficient piece of code that
> reproduces the problem, ideally as a unit-test so we can add it to the
> test-suite for poi.

I've attached sample input, sample output and the source code I used to reproduce this problem.

If I convert the sample input above to an .xls file and read/write it with HSSF, the characters in the output file are the same as in the input file.
Comment 7 Dominik Stadler 2015-08-17 20:41:56 UTC
This is likely a similar issue to bug 54084, where we debugged the problem to some degree, and it seems the XMLBeans third-party library is involved here.
Comment 8 Dominik Stadler 2017-09-21 16:11:47 UTC
As far as I can see, this will only be fixed by an updated XMLBeans. The related discussion is at bug 59268.

*** This bug has been marked as a duplicate of bug 59268 ***