Bug 53109 - NameCommentRecord can not handle multibyte characters
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.8-FINAL
Hardware: PC All
Importance: P2 normal with 1 vote
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2012-04-20 01:44 UTC by Shunji Konishi
Modified: 2015-06-01 20:51 UTC

Testdata (19.00 KB, application/vnd.ms-excel)
2012-04-20 01:44 UTC, Shunji Konishi

Description Shunji Konishi 2012-04-20 01:44:18 UTC
Created attachment 28643

The following exception occurs when opening the attached file.


org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x894 left 10 bytes remaining still to be read.
	at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:156)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:231)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:443)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:285)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:88)

The failing file meets the following conditions.

- It has a named cell.
- The named cell has a comment.
- The name of the named cell, or its comment, contains multibyte characters.
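
One way to see why multibyte text matters here: a length field counts characters, but 16-bit text occupies two bytes per character. The sketch below is plain Java, not POI code (the class and method names are made up for illustration), and it assumes a reader that consumes exactly one byte per character. Under that assumption, part of the string's bytes stay unread in the record, which is the kind of condition LeftoverDataException reports.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this class is not part of POI.
class LeftoverBytesSketch {

    // Number of bytes left unread when text stored as UTF-16LE
    // is consumed as if each character took a single byte.
    static int leftoverWhenReadAsSingleByte(String text) {
        int charCount = text.length(); // what a length field would hold
        int storedBytes = text.getBytes(StandardCharsets.UTF_16LE).length; // 2 bytes per UTF-16 code unit
        return storedBytes - charCount;
    }

    public static void main(String[] args) {
        // Two Japanese characters: 4 bytes stored, 2 consumed, 2 left over.
        System.out.println(leftoverWhenReadAsSingleByte("\u540D\u524D"));
    }
}
```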

I fixed NameCommentRecord as follows.

// In the constructor NameCommentRecord(final RecordInputStream ris):
    // A flag byte of 0 means 8-bit (compressed) characters; anything else means 16-bit little-endian.
    if (in.readByte() == 0) {
        field_6_name_text = StringUtil.readCompressedUnicode(in, field_4_name_length);
    } else {
        field_6_name_text = StringUtil.readUnicodeLE(in, field_4_name_length);
    }
    // The comment text is preceded by its own flag byte.
    if (in.readByte() == 0) {
        field_7_comment_text = StringUtil.readCompressedUnicode(in, field_5_comment_length);
    } else {
        field_7_comment_text = StringUtil.readUnicodeLE(in, field_5_comment_length);
    }

With this change I can read the file.
I think the serialize method has to change too.
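
Since the writer has to emit the same flag byte that the reader expects, a round trip is a useful check. The following is a minimal stand-alone sketch in plain Java, not the actual POI change. The helpers encode and decode are hypothetical, the flag value 1 for 16-bit text and the rule "use 16-bit only if some character exceeds 0xFF" are assumptions made for the illustration, and the character count is passed in directly instead of being read from a length field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; it does not use POI's RecordInputStream or StringUtil.
class UnicodeFlagSketch {

    // True if at least one character does not fit in 8 bits.
    static boolean needsSixteenBit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Writes one flag byte (0 = 8-bit, 1 = 16-bit LE) followed by the characters.
    static byte[] encode(String s) {
        boolean wide = needsSixteenBit(s);
        Charset cs = wide ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1;
        byte[] payload = s.getBytes(cs);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(wide ? 1 : 0);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Reads the flag byte, then charCount characters in the width the flag indicates.
    static String decode(byte[] data, int charCount) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            boolean wide = in.readByte() != 0;
            byte[] payload = new byte[wide ? charCount * 2 : charCount];
            in.readFully(payload);
            if (in.available() != 0) {
                // Mirrors the idea of a leftover-data check: nothing may remain unread.
                throw new IllegalStateException(in.available() + " bytes left unread");
            }
            return new String(payload, wide ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String roundTrip(String s) {
        return decode(encode(s), s.length());
    }

    public static void main(String[] args) {
        String name = "\u540D\u524D"; // two Japanese characters
        System.out.println(name.equals(roundTrip(name)));
    }
}
```

If the writer skipped the flag byte or always wrote 8-bit text, decode would either misread the first character as the flag or fail the leftover check, which is why the reader and the serializer need to change together.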

Please check my changes.


Shunji Konishi
Comment 1 Dominik Stadler 2015-06-01 20:51:28 UTC
This is finally fixed now via r1682999 by correctly reading/writing the unicode-flag of the NameCommentRecord as per the spec.