Bug 46269 - BIFF2 XLS file not reading. "Invalid header signature"
Summary: BIFF2 XLS file not reading. "Invalid header signature"
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-dev
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2008-11-23 00:30 UTC by Syam Pillai
Modified: 2011-12-01 12:47 UTC

The problematic file. (862.45 KB, application/vnd.ms-excel)
2008-11-23 00:30 UTC, Syam Pillai
zip two java files (2.96 KB, application/zip)
2008-11-29 00:48 UTC, Josh Micich

Description Syam Pillai 2008-11-23 00:30:45 UTC
Created attachment 22916
The problematic file.

Java command executed:

java org.apache.poi.poifs.filesystem.POIFSFileSystem test.xls out.xls

While reading an Excel file, I'm getting the following exception:

Exception in thread "main" java.io.IOException: Invalid header signature; read 4503608217567241, expected -2226271756974174256
        at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:112)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.main(POIFSFileSystem.java:415)

The test.xls file is attached. It was received from a government organization, and there is no way to verify which version of MS Excel they use. The file opens fine in Excel and OpenOffice.
Comment 1 Josh Micich 2008-11-28 19:18:19 UTC
The attached file is a BIFF2 file.  The only BIFF version POI supports is BIFF8.

When you open and re-save with Excel or OO, the file is silently converted to BIFF8.

Improved error message added in svn r721620.  I am marking this bug as 'WONTFIX' because it would be difficult to extend POI to handle previous BIFF versions.
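
For reference, the "read" value in the original exception can be decoded by hand to confirm the BIFF2 diagnosis. The sketch below is only an illustration and is not the code added in r721620; the class and method names (SignatureSniffer, describe) are hypothetical. It assumes the first 8 bytes of the file have been read as a little-endian long, which is how the number in the exception message was produced. The low 16 bits of that long are then the id of the first BIFF record, and 0x0009 is the BIFF2 BOF record.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SignatureSniffer {
    // OLE2 magic bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    // As a signed decimal this is the "expected" value in the exception.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    // Helper (hypothetical): turn the first 8 bytes of a file into the long
    // that the header check compares against.
    static long readLittleEndianLong(byte[] header) {
        return ByteBuffer.wrap(header, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Classify a file from the first 8 bytes.
    static String describe(long firstEightBytes) {
        if (firstEightBytes == OLE2_SIGNATURE) {
            return "OLE2";
        }
        // A raw BIFF stream starts with a BOF record: 2-byte id, 2-byte length.
        int recordId = (int) (firstEightBytes & 0xFFFF);
        switch (recordId) {
            case 0x0009: return "BIFF2 BOF";
            case 0x0209: return "BIFF3 BOF";
            case 0x0409: return "BIFF4 BOF";
            case 0x0809: return "BIFF5-8 BOF (raw stream)";
            default:     return "unknown";
        }
    }

    public static void main(String[] args) {
        // Value reported as "read" in this bug
        System.out.println(describe(4503608217567241L));      // prints "BIFF2 BOF"
        // Value reported as "expected"
        System.out.println(describe(-2226271756974174256L));  // prints "OLE2"
    }
}
```

The value 4503608217567241 is 0x0010000200040009, i.e. the bytes 09 00 04 00 02 00 10 00: record id 0x0009 with a 4-byte payload, which is a BIFF2 BOF record rather than an OLE2 header.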
Comment 2 Josh Micich 2008-11-29 00:48:30 UTC
Created attachment 22963
zip two java files

I took a look at the example file (attachment 22916) and saw that only 5 BIFF2 record types were present.  It was relatively easy to write a BIFF2 stream reader that would handle just these records.  This might be a viable solution path if your input stays relatively simple.  Here is some sample code to call the attached converter:

// BIFF2To8Converter comes from the attached zip (attachment 22963)
InputStream is = new FileInputStream("ex46269-22916.xls");
HSSFWorkbook wb = BIFF2To8Converter.convert(is, "Sheet1");
is.close();
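
To check whether another input file "stays relatively simple", the set of record types it contains can be surveyed first. The following is a minimal sketch and not the code in the attached zip; the class and method names (BiffRecordSurvey, countRecordTypes) are hypothetical. It assumes a raw BIFF stream with no OLE2 wrapper, in which every record is a 2-byte id, a 2-byte payload length, and then the payload, all little-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

public class BiffRecordSurvey {
    // Counts how often each record id occurs in a raw BIFF stream.
    static Map<Integer, Integer> countRecordTypes(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        Map<Integer, Integer> counts = new TreeMap<>();
        while (true) {
            int lo = din.read();
            if (lo < 0) {
                break; // clean end of stream between records
            }
            // Record header is little-endian: id, then payload length
            int id = lo | (din.readUnsignedByte() << 8);
            int len = din.readUnsignedByte() | (din.readUnsignedByte() << 8);
            din.readFully(new byte[len]); // skip the payload; throws EOFException if truncated
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Tiny hand-made stream: a BIFF2 BOF record followed by an EOF record
        byte[] sample = {
            0x09, 0x00, 0x04, 0x00, 0x02, 0x00, 0x10, 0x00, // BOF (id 0x0009, 4-byte payload)
            0x0A, 0x00, 0x00, 0x00                          // EOF (id 0x000A, empty payload)
        };
        System.out.println(countRecordTypes(new ByteArrayInputStream(sample))); // prints {9=1, 10=1}
    }
}
```

Run against a real file by passing a FileInputStream instead of the sample bytes. If the resulting map contains ids the converter does not handle, the file is probably outside what this approach covers.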