53109 – NameCommentRecord can not handle multibyte characters

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HSSF
Version:	3.8-FINAL
Hardware:	PC All

Importance:	P2 normal with 1 vote
Target Milestone:	---
Assignee:	POI Developers List

Reported:	2012-04-20 01:44 UTC by Shunji Konishi
Modified:	2015-06-01 20:51 UTC
CC List:	0 users

Attachments
Testdata (19.00 KB, application/vnd.ms-excel) 2012-04-20 01:44 UTC, Shunji Konishi

Description Shunji Konishi 2012-04-20 01:44:18 UTC

Created attachment 28643
Testdata

The following exception occurs when opening the attached file.

----

org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x894 left 10 bytes remaining still to be read.
	at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:156)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:231)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:443)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:285)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:88)

---
The file that triggers the error has the following characteristics:

- It contains a named cell.
- The named cell has a comment.
- The name of the named cell, or its comment, contains multibyte characters.
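
To illustrate why multibyte text can leave bytes unread, here is a minimal sketch in plain Java with no POI classes (the `LeftoverDemo` class and its `leftoverBytes` helper are hypothetical names). It assumes the failing reader consumes one byte per character while the text is actually stored as UTF-16LE, two bytes per character; the report does not state this explicitly, but it is consistent with the fix that follows.

```java
import java.nio.charset.StandardCharsets;

final class LeftoverDemo {
    // Bytes left unread when a reader consumes one byte per character
    // from text that was actually stored as UTF-16LE (two bytes per character).
    // Assumption: this mirrors the pre-fix reader; it is not POI code.
    static int leftoverBytes(String text) {
        int stored = text.getBytes(StandardCharsets.UTF_16LE).length;
        int consumed = text.length(); // one byte per UTF-16 code unit
        return stored - consumed;
    }

    public static void main(String[] args) {
        System.out.println(leftoverBytes("\u540d\u524d")); // prints 2
    }
}
```

Under that assumption, every 16-bit character in the name or comment leaves one byte behind, which is the kind of mismatch a LeftoverDataException reports.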

I fixed the NameCommentRecord constructor as follows.

----
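// In the constructor NameCommentRecord(final RecordInputStream ris);
// "in" is assumed to refer to that same stream.
    // The byte before each string is a flag: 0 means 8-bit (compressed) characters,
    // anything else means 16-bit little-endian characters.
    if (in.readByte() == 0) {
        field_6_name_text = StringUtil.readCompressedUnicode(in, field_4_name_length);
    } else {
        field_6_name_text = StringUtil.readUnicodeLE(in, field_4_name_length);
    }
    // Same flag handling for the comment text.
    if (in.readByte() == 0) {
        field_7_comment_text = StringUtil.readCompressedUnicode(in, field_5_comment_length);
    } else {
        field_7_comment_text = StringUtil.readUnicodeLE(in, field_5_comment_length);
    }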
---
With this change I can read the file.
I think the serialize method has to change too.
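
As a rough sketch of what a matching write side could look like, the following plain-Java example writes the flag byte before each string and reads it back the same way. It is not the POI implementation: `FlaggedStringCodec` and its methods are hypothetical names, and ISO-8859-1 stands in for the 8-bit "compressed" form. POI's `StringUtil` has helpers such as `hasMultibyte`, `putCompressedUnicode` and `putUnicodeLE` that could play the same role in the real serialize method.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FlaggedStringCodec {
    // True if any character does not fit in a single byte.
    static boolean hasMultibyte(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }

    // Writes the flag byte (0 = 8-bit, 1 = 16-bit LE), then the text in that encoding.
    static void write(ByteArrayOutputStream out, String s) {
        boolean wide = hasMultibyte(s);
        out.write(wide ? 1 : 0);
        byte[] body = s.getBytes(wide ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(body, 0, body.length);
    }

    // Reads the flag byte, then charCount characters in the encoding it selects.
    static String read(ByteBuffer in, int charCount) {
        boolean wide = in.get() != 0;
        byte[] body = new byte[wide ? charCount * 2 : charCount];
        in.get(body);
        return new String(body, wide ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u540d\u524d"; // multibyte name
        String comment = "note";      // single-byte comment
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, name);
        write(out, comment);

        ByteBuffer in = ByteBuffer.wrap(out.toByteArray());
        System.out.println(read(in, name.length()).equals(name));
        System.out.println(read(in, comment.length()).equals(comment));
        System.out.println(in.remaining()); // bytes left unread after both strings
    }
}
```

The point of the sketch is that the writer chooses the flag from the text itself, so the reader and writer cannot disagree about how many bytes each string occupies.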

Please check my changes.

regards.

Shunji Konishi

Comment 1 Dominik Stadler 2015-06-01 20:51:28 UTC

This is finally fixed now via r1682999 by correctly reading/writing the unicode-flag of the NameCommentRecord as per the spec.