| Summary: | Comment.setAuthor does not encode multi-byte characters (Chinese) well | | |
|---|---|---|---|
| Product: | POI | Reporter: | LiuYan 刘研 <lovetide> |
| Component: | HSSF | Assignee: | POI Developers List <dev> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | | |
| Priority: | P2 | | |
| Version: | 3.7-dev | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |

Attachments:

- Test class to show this issue
- Patch test for unicode on setAuthor()
- Patch for defaulting to multi-byte
Description

LiuYan 刘研 2010-07-16 04:19:00 UTC

Just to sum up the thread: the serialize() method in org.apache.poi.hssf.record.NoteRecord does not call StringUtil.putUnicodeLE(), because the field_5_hasMultibyte instance variable is false even when the author field contains double-byte characters. In fact, field_5_hasMultibyte is never set to true except when a file is read.

Two possible solutions:

- add logic to work out whether the author contains non-Latin characters, since the issue is not limited to double-byte characters (sketched below);
- set the field_5_hasMultibyte variable to true and always write out unicode characters, unless there is a usage scenario this could break.

I tested on Mac OS X 10.6.4 with Excel 2008: changing the variable to true made the Chinese author text appear correctly. BTW, we should probably also extend the unit tests to ensure non-Latin characters are stored properly.
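The first solution could look roughly like the following minimal sketch. It assumes the author string lives in a field named field_6_author (only field_5_hasMultibyte is named in this report), and it elides the rest of the record's serialize() layout down to the author-related branch:

```java
import org.apache.poi.util.LittleEndianOutput;
import org.apache.poi.util.StringUtil;

// Sketch of the flag-update approach, written outside the real NoteRecord
// class. field_6_author is an assumed field name; the surrounding record
// layout (row, column, flags, shape id, length prefix) is omitted.
public final class NoteAuthorSketch {
    private boolean field_5_hasMultibyte;
    private String field_6_author = "";

    public void setAuthor(String author) {
        field_6_author = author;
        // Recompute the flag from the string content, so serialization
        // switches to 16-bit unicode whenever any character falls outside
        // the 8-bit range (covers Chinese and other non-Latin scripts).
        field_5_hasMultibyte = StringUtil.hasMultibyte(author);
    }

    // The branch serialize() should take: the flag selects between the
    // compressed (1 byte per char) and unicode LE (2 bytes per char) forms.
    void serializeAuthor(LittleEndianOutput out) {
        out.writeByte(field_5_hasMultibyte ? 1 : 0);
        if (field_5_hasMultibyte) {
            StringUtil.putUnicodeLE(field_6_author, out);
        } else {
            StringUtil.putCompressedUnicode(field_6_author, out);
        }
    }
}
```

Recomputing the flag in the setter keeps single-byte authors in the compact encoding while fixing the multi-byte case, whereas the second solution (always writing unicode) would double the storage for every author string.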
Created attachment 25768 [details]
Patch test for unicode on setAuthor()

Added a patch that tests for unicode on setAuthor().
Created attachment 25769 [details]
Patch for defaulting to multi-byte

Patch for defaulting to multi-byte.
Thanks for investigating this. The usual way in most records is to update the multibyte flag when updating the string. I'll make this change, and write a unit test for it shortly.
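A round-trip unit test along those lines might look like the sketch below (JUnit 4; the class and method names are illustrative, not the test that was actually committed to POI):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.poi.hssf.usermodel.*;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical test class: writes a workbook whose cell comment has a
// Chinese author, reads it back, and checks the author survived intact.
public class TestCommentAuthorRoundTrip {

    @Test
    public void multiByteAuthorSurvivesWriteAndReadBack() throws Exception {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("Sheet1");
        HSSFCell cell = sheet.createRow(0).createCell(0);

        // Attach a comment whose author contains multi-byte characters.
        HSSFPatriarch patriarch = sheet.createDrawingPatriarch();
        HSSFComment comment = patriarch.createComment(
                new HSSFClientAnchor(0, 0, 0, 0, (short) 1, 1, (short) 3, 3));
        comment.setString(new HSSFRichTextString("comment body"));
        comment.setAuthor("刘研");
        cell.setCellComment(comment);

        // Serialize and re-read. Before the fix, NoteRecord wrote the
        // author as compressed (8-bit) unicode, mangling these characters.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        wb.write(out);
        HSSFWorkbook wb2 =
                new HSSFWorkbook(new ByteArrayInputStream(out.toByteArray()));

        HSSFComment readBack =
                wb2.getSheetAt(0).getRow(0).getCell(0).getCellComment();
        assertEquals("刘研", readBack.getAuthor());
    }
}
```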