Bug 44375 - [Regression in 3.0.2] Unable to read an Excel file
Summary: [Regression in 3.0.2] Unable to read an Excel file
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: unspecified
Hardware: All, OS: All
Importance: P2 regression
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2008-02-07 08:46 UTC by Laurent Poublan
Modified: 2008-02-08 04:01 UTC

xls file not readable with POI HSSF 3.0.2 (ok with 3.0.1) (16.50 KB, application/vnd.ms-excel)
2008-02-07 08:51 UTC, Laurent Poublan

Description Laurent Poublan 2008-02-07 08:46:06 UTC
It is impossible to create an HSSFWorkbook from an Excel file.
POIDocument.readProperties() throws a StringIndexOutOfBoundsException.
This worked with POI 3.0.1.

Here is the full stack trace:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 541934449
	at java.lang.String.checkBounds(String.java:372)
	at java.lang.String.<init>(String.java:404)
	at org.apache.poi.hpsf.Property.readDictionary(Property.java:257)
	at org.apache.poi.hpsf.Property.<init>(Property.java:153)
	at org.apache.poi.hpsf.Section.<init>(Section.java:291)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:454)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
	at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:61)
	at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:97)
	at org.apache.poi.POIDocument.readProperties(POIDocument.java:77)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:171)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:148)
	at Test.<init>(Test.java:18)
	at Test.main(Test.java:38)
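The exception type follows from how java.lang.String's byte-array constructors validate their arguments: if the offset and length would run past the end of the array, they throw StringIndexOutOfBoundsException instead of decoding anything. The minimal sketch below is a hypothetical demo, not POI code; the class name BogusLengthDemo and the ISO-8859-1 decoding are assumptions, and 541934449 (the out-of-range index from the trace) merely stands in for a bogus length field read from corrupt data.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo (not POI code): trusting a corrupt length field. */
public class BogusLengthDemo {

    // Decodes 'length' bytes starting at 'offset', trusting a length value
    // taken from the file without checking it against the array size.
    static String decodeTrusting(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] stream = new byte[16 * 1024]; // stand-in for a small property stream
        try {
            decodeTrusting(stream, 0, 541934449); // value taken from the stack trace
            System.out.println("decoded");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException");
        }
    }
}
```

Running main prints "StringIndexOutOfBoundsException": the constructor rejects the arguments before touching the data.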
Comment 1 Laurent Poublan 2008-02-07 08:51:29 UTC
Created attachment 21493
xls file not readable with POI HSSF 3.0.2 (ok with 3.0.1)

To reproduce, simply try:
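POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(xlsPath)); // xlsPath: placeholder for the path to attachment 21493 (java.io.FileInputStream)
new HSSFWorkbook(fs); // this line throws a StringIndexOutOfBoundsException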
Comment 2 Nick Burch 2008-02-07 09:03:18 UTC
Hmm, no changes to org.apache.poi.hpsf.Property have been made since 2006, so
the cause is nothing obvious there.
Comment 3 Nick Burch 2008-02-07 09:13:57 UTC
I don't know if your document has a corrupt SummaryInformation stream, or if
there's a bug in the SummaryInformation stream parsing.

I've added a disabled failing test case for it to svn trunk, which can serve as
a starting point for anyone investigating why the SummaryInformation isn't being
read correctly.

(3.0.1 didn't read document metadata by default, but 3.0.2 does.)
Comment 4 Josh Micich 2008-02-07 14:56:31 UTC
It seems that the method
org.apache.poi.hpsf.Property.readDictionary(byte[], long, int, int)
is not exercised by any of the existing JUnit tests.

When comparing the execution flow of this bug with the successful test cases,
the paths diverge at line 151 of the constructor
org.apache.poi.hpsf.Property.Property(long, byte[], long, int, int).
For the sample spreadsheet, the Property constructor is invoked successfully
19 times before it reaches a property with this.id == 0, at which point
readDictionary() gets invoked.
Comment 5 Rainer Klute 2008-02-08 02:19:57 UTC
The properties are broken. Neither Windows XP Explorer nor Excel is able to
display them, but at least they don't fail. I am going to implement the same
behaviour in HPSF.
Comment 6 Rainer Klute 2008-02-08 04:01:52 UTC
Fixed with revision 619765. HPSF now copes with a broken dictionary in Document
Summary Information streams. RuntimeExceptions that occurred when trying to read
bogus data are now caught. Dictionary entries up to but not including the bogus
one are preserved; the rest are ignored.
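The recovery strategy described above (catch the RuntimeException, keep the entries read so far, ignore the rest) can be sketched as follows. This is a hypothetical illustration, not the code from revision 619765; the class name TolerantDictionary is invented, and the byte layout is an assumption based on our reading of the OLE property set dictionary format: a 4-byte entry count, then per entry a 4-byte property ID, a 4-byte character count including the NUL terminator, and the name as single-byte characters (non-Unicode code page assumed, so no padding), all little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not POI's code) of a dictionary reader that tolerates
 * bogus entries: entries before the first bad one are kept, the rest ignored.
 * Assumed layout: count, then (id, charCount, 8-bit chars incl. NUL) per entry.
 */
public class TolerantDictionary {

    static Map<Long, String> readDictionary(byte[] src, int offset) {
        Map<Long, String> dict = new LinkedHashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
        try {
            buf.position(offset);
            long count = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit entry count
            for (long i = 0; i < count; i++) {
                long id = buf.getInt() & 0xFFFFFFFFL;
                int length = buf.getInt(); // a bogus value can be huge or negative
                int start = buf.position();
                // Throws StringIndexOutOfBoundsException if the claimed length
                // runs past the end of src, the failure seen in this bug.
                String name = new String(src, start, length, StandardCharsets.ISO_8859_1);
                int nul = name.indexOf('\0'); // drop the NUL terminator, if present
                dict.put(id, nul >= 0 ? name.substring(0, nul) : name);
                buf.position(start + length);
            }
        } catch (RuntimeException e) {
            // Bogus data (bad length, truncated stream, ...): stop here and
            // keep the entries read so far.
        }
        return dict;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(2);                   // dictionary claims two entries
        b.putInt(2).putInt(4).put("Foo\0".getBytes(StandardCharsets.ISO_8859_1));
        b.putInt(3).putInt(541934449); // second entry: bogus length
        System.out.println(readDictionary(b.array(), 0)); // prints {2=Foo}
    }
}
```

Catching RuntimeException broadly mirrors the comment above and guards against every kind of malformed input at once (buffer underflow, bad positions, bad lengths); an explicit check such as `length < 0 || length > buf.remaining()` before decoding would be a narrower alternative.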