Bug 55692 - POI crashes with ....a BIFF8 'Workbook' entry. Is it really an excel file?
Status: RESOLVED INVALID
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.9-FINAL
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
 
Reported: 2013-10-23 02:21 UTC by bearbalu
Modified: 2014-03-27 22:43 UTC

Attachments
This is the xlsx that crashes (639.00 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-10-23 02:21 UTC, bearbalu
This Excel crashes too.... (609.00 KB, application/vnd.openxmlformats-officedocument.spre)
2014-03-27 22:14 UTC, bearbalu

Description bearbalu 2013-10-23 02:21:53 UTC
Created attachment 30955
This is the xlsx that crashes

I have an xlsx file that opens perfectly fine in Excel 2010. However, POI crashes with the message below. If I open the file in Excel and re-save it, the crash goes away. See the attached xlsx file. Interestingly, WorkbookFactory invokes HSSF (not XSSF).

Additional clues to reproducing the issue: 

1. Started with an xls file -> NO crash. (The file is 1.7 MB, so I can't upload it, and I can't upload multiple attachments, but I can e-mail it to someone.)
2. I saved it once as xls (no content changed) -> NO crash
3. I "Save As" xlsx (no content changed) -> CRASHES (attached)
4. If I open the xlsx and save it again (no content changed) -> NO crash

If I skip step 2, the crash does NOT happen. 


==========================================================================

java.lang.IllegalArgumentException: The supplied POIFSFileSystem does not contain a BIFF8 'Workbook' entry. Is it really an excel file?
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.getWorkbookDirEntryName(HSSFWorkbook.java:222)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:263)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:243)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:187)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:322)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:303)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:70)
Comment 1 Nick Burch 2013-10-23 09:44:41 UTC
Your file looks to be a password protected xlsx file, which somewhat confusingly gets stored within an OLE2 structure (I'm sure Microsoft had their reasons....)

See http://poi.apache.org/encryption.html for how to read them

In r1534967 I've added a more helpful error message from HSSF if you give it an encrypted .xlsx file by mistake
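
For anyone wondering why HSSF gets picked for a file named .xlsx: as far as I can tell, the choice is made from the first bytes of the stream rather than from the extension. A plain .xlsx is a ZIP archive, while both .xls and password protected .xlsx files start with the OLE2 signature. The following is only a rough sketch of that kind of check using nothing but the JDK. The class and method names (ContainerSniffer, sniff) are made up for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not a POI class: guesses the container type from magic bytes.
final class ContainerSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound document signature (used by .xls and by encrypted .xlsx)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (used by unencrypted .xlsx)
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Classify a buffer holding the first bytes of a file.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // Read up to 8 bytes from the file and classify them.
    static Kind sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < header.length
                    && (r = in.read(header, n, header.length - n)) > 0) {
                n += r;
            }
        }
        return sniff(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(sniff(Paths.get(args[0])));
        }
    }
}
```

An OLE2 result for a file with an .xlsx extension is a strong hint that the file is encrypted (or is really an .xls that was renamed), so it needs the decryption route from the page linked above rather than a plain WorkbookFactory.create call.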
Comment 2 bearbalu 2014-03-27 22:14:18 UTC
Created attachment 31452
This Excel crashes too....
Comment 3 bearbalu 2014-03-27 22:16:09 UTC
As per the original issue, when I call workbook = WorkbookFactory.create(fileInputStream), I get the exception java.lang.IllegalArgumentException: The supplied POIFSFileSystem does not contain a BIFF8 'Workbook' entry. Is it really an excel file?

In most cases I was able to get to the underlying Excel file with the following code.

// Open the OLE2 container and read its encryption header
NPOIFSFileSystem fs = new NPOIFSFileSystem(stream);
EncryptionInfo info = new EncryptionInfo(fs);
Decryptor d = Decryptor.getInstance(info);
String password = Decryptor.DEFAULT_PASSWORD;
// getEncryptedFileInputStream(...) is a helper that is not shown in this report
InputStream fInputStream = getEncryptedFileInputStream(xlsFile, errorMessages);
if (fInputStream != null) {
    Workbook workbook = WorkbookFactory.create(fInputStream);
    fInputStream.close();
    return workbook;
}

However, when I do this for the attached Excel file, it fails in EncryptionInfo with the exception org.apache.poi.EncryptedDocumentException: Unsupported hash algorithm. So I am not even sure if this is an encrypted file.

If I open the Excel and just re-save it, WorkbookFactory.create(fileInputStream) works fine.
Comment 4 Nick Burch 2014-03-27 22:43:21 UTC
Attachment 31452 is not a regular .xls file. It appears to be a password protected .xlsx file, which must be opened as per http://poi.apache.org/encryption.html#XML-based+formats+-+Decryption (along with the password, of course!)

Also, don't forget that Apache Tika is very good at working out what files are. If you ask Apache Tika to detect the file with no file extension, it correctly identifies the type as application/x-tika-ooxml-protected

As for the hash algorithm problem, either try with the latest trunk, or raise a new bug if you really have got a file that uses a format we don't support