Bug 42652 - HSSF cannot read excel file, Record size problems
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2007-06-13 04:49 UTC by Rainer Schwarze
Modified: 2008-05-10 18:05 UTC
CC List: 0 users

Excel file with unexpected record sizes (232.15 KB, application/vnd.ms-excel)
2007-06-13 04:51 UTC, Rainer Schwarze
Test case (583 bytes, application/octet-stream)
2007-06-13 04:52 UTC, Rainer Schwarze

Description Rainer Schwarze 2007-06-13 04:49:08 UTC
The attached Excel file has several problems related to HSSF. First, it is not in
OLE2 format, but that can be worked around by wrapping it in a POIFSFileSystem.
The second, more relevant problem is that several records in the Excel file are
shorter than HSSF expects them to be. As far as I understand, the Excel file is
somewhat "non-standard". (Maybe the issues with this file are related to bug #42564.)
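
To see exactly which records are shorter than HSSF expects, it can help to list the raw record headers. The sketch below is a minimal, stdlib-only illustration; it is not part of POI or of the attached test case, and the class name `BiffRecordWalker` is hypothetical. It assumes the input is the bare workbook byte stream (not an OLE2 container) and relies on the BIFF record layout: a 2-byte little-endian record type (sid), a 2-byte little-endian body length, then the body.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: lists the record headers of a raw
// BIFF stream (assumed to be the bare workbook bytes, not an OLE2 file).
public final class BiffRecordWalker {

    /** Record type id (sid) and declared body length in bytes. */
    public static final class Header {
        public final int sid;
        public final int length;

        Header(int sid, int length) {
            this.sid = sid;
            this.length = length;
        }

        @Override
        public String toString() {
            return String.format("sid=0x%04X length=%d", sid, length);
        }
    }

    private BiffRecordWalker() {}

    public static List<Header> walk(byte[] stream) {
        // BIFF headers are little-endian: 2-byte sid, then 2-byte body length.
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<Header> headers = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            if (length > buf.remaining()) {
                throw new IllegalArgumentException(String.format(
                        "record sid=0x%04X declares %d bytes but only %d remain",
                        sid, length, buf.remaining()));
            }
            headers.add(new Header(sid, length));
            buf.position(buf.position() + length); // skip the record body
        }
        return headers;
    }
}
```

Printing each `Header` and comparing, for example, the DIMENSIONS record (sid 0x0200) against the 14 bytes that BIFF8 uses would make a short record visible without going through HSSF at all.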

Attached are the Excel file and test code that demonstrate the problem.

I posted a message to the mailing list with more details about what I have found
so far. If that should be included or attached here, please let me know.

Best wishes,
Rainer Schwarze
Comment 1 Rainer Schwarze 2007-06-13 04:51:52 UTC
Created attachment 20338
Excel file with unexpected record sizes

This Excel file is not in OLE2 format; to read it with HSSF, one needs to wrap
it inside a POIFSFileSystem.
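
Whether a file is an OLE2 compound document can be checked from its first eight bytes, which for OLE2 files are the signature D0 CF 11 E0 A1 B1 1A E1. The following is a minimal stdlib-only sketch of that check; the class `Ole2Sniffer` is hypothetical and is not a POI API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of POI: checks for the OLE2 magic number.
public final class Ole2Sniffer {
    // Signature found at offset 0 of every OLE2 compound document.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private Ole2Sniffer() {}

    /** Returns true if the given leading bytes start with the OLE2 signature. */
    public static boolean isOle2(byte[] head) {
        return head.length >= OLE2_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }
}
```

For a bare BIFF stream like the attached file, `isOle2` would return false, since such a stream starts with a BOF record header instead of the OLE2 signature.
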
Comment 2 Rainer Schwarze 2007-06-13 04:52:39 UTC
Created attachment 20339
Test case
Comment 3 Josh Micich 2008-05-10 18:05:44 UTC
I tried the example and test file with POI 3.1-beta1. The first crash is in DimensionsRecord, where POI expects to read 14 bytes but gets only 10. That's a strong hint that the document is actually in BIFF3-BIFF5 format. You suggested that several records are shorter than expected, which tends to support this conclusion.
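
The 14-versus-10-byte difference comes from the DIMENSIONS record layout. In BIFF8 the body holds a 32-bit first row, a 32-bit last row + 1, a 16-bit first column, a 16-bit last column + 1 and 2 reserved bytes (14 bytes). BIFF3-BIFF5 use 16-bit row fields instead (10 bytes). A minimal stdlib-only sketch of a decoder that accepts both layouts follows; `DimensionsDecoder` is hypothetical and is not how POI's `DimensionsRecord` is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical decoder, not POI's DimensionsRecord: accepts both layouts.
public final class DimensionsDecoder {

    /** Used range of a sheet; the "PlusOne" values are exclusive bounds. */
    public static final class Dimensions {
        public final int firstRow;
        public final int lastRowPlusOne;
        public final int firstCol;
        public final int lastColPlusOne;

        Dimensions(int firstRow, int lastRowPlusOne, int firstCol, int lastColPlusOne) {
            this.firstRow = firstRow;
            this.lastRowPlusOne = lastRowPlusOne;
            this.firstCol = firstCol;
            this.lastColPlusOne = lastColPlusOne;
        }
    }

    private DimensionsDecoder() {}

    /** Decodes a DIMENSIONS record body (header already stripped). */
    public static Dimensions decode(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        if (body.length == 14) {
            // BIFF8: 32-bit row fields, 16-bit column fields, 2 reserved bytes.
            int firstRow = buf.getInt();
            int lastRow = buf.getInt();
            int firstCol = buf.getShort() & 0xFFFF;
            int lastCol = buf.getShort() & 0xFFFF;
            return new Dimensions(firstRow, lastRow, firstCol, lastCol);
        }
        if (body.length == 10) {
            // BIFF3-BIFF5: all four fields are 16-bit, then 2 reserved bytes.
            int firstRow = buf.getShort() & 0xFFFF;
            int lastRow = buf.getShort() & 0xFFFF;
            int firstCol = buf.getShort() & 0xFFFF;
            int lastCol = buf.getShort() & 0xFFFF;
            return new Dimensions(firstRow, lastRow, firstCol, lastCol);
        }
        throw new IllegalArgumentException("unexpected DIMENSIONS length: " + body.length);
    }
}
```

A reader that only knows the 14-byte BIFF8 form, as described in the comment above, has no valid way to interpret a 10-byte body.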

Setting the workbook stream name to "Workbook" (which would indicate BIFF8) is not enough.  POI can only read spreadsheets that *fully* meet the BIFF8 spec.
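
Since the stream name alone proves nothing, the BOF record at the start of the stream is a more reliable indicator. To my knowledge, the BIFF2, BIFF3 and BIFF4 BOF records use sids 0x0009, 0x0209 and 0x0409, while BIFF5/BIFF7 and BIFF8 share sid 0x0809 and differ in the first body field (0x0500 versus 0x0600); these values should be checked against the published file format specification. The sketch below (hypothetical `BiffVersionSniffer`, stdlib only, assuming the bare workbook stream as input) maps them to a version label.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: guesses the BIFF version from the
// first record of a bare workbook stream.
public final class BiffVersionSniffer {

    private BiffVersionSniffer() {}

    public static String detect(byte[] stream) {
        if (stream.length < 4) {
            throw new IllegalArgumentException("stream too short for a record header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int sid = buf.getShort() & 0xFFFF;
        int length = buf.getShort() & 0xFFFF;
        // Older BIFF versions are identifiable from the BOF sid alone.
        if (sid == 0x0009) return "BIFF2";
        if (sid == 0x0209) return "BIFF3";
        if (sid == 0x0409) return "BIFF4";
        if (sid != 0x0809) return "unknown (first record is not a BOF)";
        if (length < 2 || buf.remaining() < 2) {
            throw new IllegalArgumentException("truncated BOF record");
        }
        // BIFF5/7 and BIFF8 share sid 0x0809; the version word tells them apart.
        int version = buf.getShort() & 0xFFFF;
        if (version == 0x0600) return "BIFF8";
        if (version == 0x0500) return "BIFF5/BIFF7";
        return String.format("unknown BOF version 0x%04X", version);
    }
}
```

Any result other than "BIFF8" here would mean the stream falls outside what POI can read, regardless of what the stream is named.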

(No, bug 42564 was unrelated. It was predominantly about ArrayPtg encoding issues.)