Bug 60284 - OldExcelExtractor should throw an EncryptedDocumentException
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-10-20 15:02 UTC by Tim Allison
Modified: 2016-10-20 15:13 UTC

Description Tim Allison 2016-10-20 15:02:40 UTC
On TIKA-2118, Seva Alekseyev shared a document that causes an UnsupportedCodePage exception. The file is an old XLS file (BIFF5) that is encrypted.

After looking through https://www.openoffice.org/sc/excelfileformat.pdf and experimenting with some files that cause similar exceptions in Tika's regression corpus, it appears that all records after a FILEPASS record are encrypted, even the contents of the CODEPAGE record. That would explain the exception: the extractor reads the encrypted bytes as if they were a code page value.

Let's throw an EncryptedDocumentException (Encryption not supported for old excel files).
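
As a rough illustration of the check proposed above (not the actual change committed to POI), the sketch below walks a raw BIFF record stream and fails as soon as it sees a FILEPASS record. It assumes the stream is already available as a byte array of back-to-back records (2-byte little-endian record id, 2-byte little-endian length, then the body) and that FILEPASS has record id 0x002F as listed in the OpenOffice.org file format document. The class FilePassScan, its methods, and the nested EncryptedStreamException are hypothetical names used only here; real code would throw POI's EncryptedDocumentException instead.

```java
public class FilePassScan {

    /** Hypothetical stand-in for POI's EncryptedDocumentException. */
    public static class EncryptedStreamException extends RuntimeException {
        public EncryptedStreamException(String message) {
            super(message);
        }
    }

    /** Record id of FILEPASS (assumed from the OpenOffice.org format document). */
    static final int SID_FILEPASS = 0x002F;

    /** Reads an unsigned 16-bit little-endian value at the given offset. */
    static int ushortLE(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /**
     * Walks the record headers and reports whether a FILEPASS record occurs.
     * Record bodies are skipped, never interpreted, because everything after
     * FILEPASS may be encrypted.
     */
    public static boolean hasFilePass(byte[] stream) {
        int pos = 0;
        while (pos + 4 <= stream.length) {
            int sid = ushortLE(stream, pos);
            int len = ushortLE(stream, pos + 2);
            if (sid == SID_FILEPASS) {
                return true;
            }
            pos += 4 + len; // jump over the header and the body
        }
        return false;
    }

    /** Throws if the stream contains a FILEPASS record; otherwise does nothing. */
    public static void checkNotEncrypted(byte[] stream) {
        if (hasFilePass(stream)) {
            throw new EncryptedStreamException(
                    "Encryption not supported for old excel files");
        }
    }
}
```

Running a check like this before any other record is interpreted means the caller gets a clear encryption error rather than a misleading failure from a garbled CODEPAGE value.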

I did find one file that doesn't appear to be encrypted at first glance (attached to TIKA-2118). I can open it and copy and paste contents out of it, but it is write-protected, so the inner contents are still encrypted.
Comment 1 Tim Allison 2016-10-20 15:13:42 UTC
Fixed in r1765829.