Bug 44375

Summary: [Regression in 3.0.2] Unable to read an Excel file
Product: POI
Reporter: Laurent Poublan <lpoublan>
Component: HPSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: regression    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: xls file not readable with POI HSSF 3.0.2 (ok with 3.0.1)

Description Laurent Poublan 2008-02-07 08:46:06 UTC
It is impossible to create an HSSFWorkbook from an Excel file.
There is a StringIndexOutOfBoundsException in POIDocument.readProperties().
It worked with POI 3.0.1.

Here is the full stack trace:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 541934449
	at java.lang.String.checkBounds(String.java:372)
	at java.lang.String.<init>(String.java:404)
	at org.apache.poi.hpsf.Property.readDictionary(Property.java:257)
	at org.apache.poi.hpsf.Property.<init>(Property.java:153)
	at org.apache.poi.hpsf.Section.<init>(Section.java:291)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:454)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
	at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:61)
	at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:97)
	at org.apache.poi.POIDocument.readProperties(POIDocument.java:77)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:171)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:148)
	at Test.<init>(Test.java:18)
	at Test.main(Test.java:38)
Comment 1 Laurent Poublan 2008-02-07 08:51:29 UTC
Created attachment 21493 [details]
xls file not readable with POI HSSF 3.0.2 (ok with 3.0.1)

To reproduce, simply try:
POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:/temp/test.xls"));
new HSSFWorkbook(fs); // this line throws a StringIndexOutOfBoundsException
Comment 2 Nick Burch 2008-02-07 09:03:18 UTC
Hmm, no changes to org.apache.poi.hpsf.Property have been made since 2006, so
it's not anything obvious there.
Comment 3 Nick Burch 2008-02-07 09:13:57 UTC
I don't know if your document has a corrupt SummaryInformation stream, or if
there's a bug in the SummaryInformation stream parsing.

I've added a disabled failing testcase for it to svn trunk, which can be a start
for someone to take a look at why the SummaryInformation isn't working.

(3.0.1 didn't do document metadata by default, but 3.0.2 does)
Comment 4 Josh Micich 2008-02-07 14:56:31 UTC
It seems that the method:
org.apache.poi.hpsf.Property.readDictionary(byte[], long, int, int)
is not exercised by any of the existing junits.

When comparing the execution flow of this bug with the successful test cases, 
divergence can be seen at line 151 of the constructor - 
org.apache.poi.hpsf.Property.Property(long, byte[], long, int, int)
For the sample spreadsheet, the Property constructor is invoked successfully 
19 times before this.id==0 and readDictionary() gets invoked.
Comment 5 Rainer Klute 2008-02-08 02:19:57 UTC
The properties are broken. Neither the Windows XP Explorer nor Excel is able to
show them, but at least they don't fail. I am going to implement the same
behaviour in HPSF.
Comment 6 Rainer Klute 2008-02-08 04:01:52 UTC
Fixed with revision 619765. HPSF now copes with a broken dictionary in Document
Summary Information streams. RuntimeExceptions that occurred when trying to read
bogus data are now caught. Dictionary entries up to but not including the bogus
one are preserved; the rest are ignored.
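
For readers who want to see the described behaviour in isolation, here is a minimal sketch. It is not the code from revision 619765 and does not use any POI classes. The class name DictionarySketch, the helper sampleData(), and the entry layout (4-byte little-endian id, 4-byte little-endian length, then that many bytes of text) are all assumptions made for illustration. It only shows the idea of catching the RuntimeException raised by a bogus length and keeping the entries read before it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: NOT the HPSF source. The entry layout used here
// (id, length, text bytes) is a simplifying assumption.
class DictionarySketch {

    // Reads up to nrEntries entries starting at offset. On bogus data the
    // entries read so far are returned and the remainder is ignored.
    static Map<Long, String> readDictionary(byte[] src, int offset, int nrEntries) {
        Map<Long, String> entries = new LinkedHashMap<>();
        try {
            ByteBuffer buf = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
            buf.position(offset);
            for (int i = 0; i < nrEntries; i++) {
                long id = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit id
                int length = buf.getInt();
                // A bogus length makes this constructor throw an
                // IndexOutOfBoundsException subtype, as in the stack trace.
                String value = new String(src, buf.position(), length,
                        StandardCharsets.ISO_8859_1);
                buf.position(buf.position() + length);
                entries.put(id, value);
            }
        } catch (RuntimeException e) {
            // Bogus data: keep what was read before it, ignore the rest.
        }
        return entries;
    }

    // Hypothetical test data: two well-formed entries followed by an
    // entry whose declared length runs far past the end of the array.
    static byte[] sampleData() {
        ByteBuffer data = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        putEntry(data, 2, "Owner");
        putEntry(data, 3, "Dept");
        data.putInt(4).putInt(541934449); // bogus length, no text follows
        return Arrays.copyOf(data.array(), data.position());
    }

    private static void putEntry(ByteBuffer data, int id, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        data.putInt(id).putInt(bytes.length).put(bytes);
    }

    public static void main(String[] args) {
        // The third entry is dropped; the first two survive.
        System.out.println(readDictionary(sampleData(), 0, 3)); // {2=Owner, 3=Dept}
    }
}
```

Catching RuntimeException rather than one specific exception type mirrors the wording of the comment above: depending on where the bogus bytes fall, the failure could surface as a bounds error or as a buffer underflow.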