Bug 64261 - Parse Errors for application/vnd.ms-excel
Summary: Parse Errors for application/vnd.ms-excel
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: All All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-24 18:46 UTC by Javier
Modified: 2020-03-31 17:13 UTC

Attachments
Example test_dropbox_selected.xls (26.00 KB, application/vnd.ms-excel)
2020-03-24 18:46 UTC, Javier
Example test_dropbox_NO_selected.xls (26.00 KB, application/vnd.ms-excel)
2020-03-24 18:46 UTC, Javier

Description Javier 2020-03-24 18:46:03 UTC
Created attachment 37119
Example test_dropbox_selected.xls

We are trying to extract content from old Excel (.xls) files using Apache Tika and have run into the error below. If the Excel file contains a drop-down box WITH an element selected, Tika throws this exception; if we deselect the element and save the file again, Tika extracts the content without any problem:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@37ddb69a
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at testTika.ExtractContent(testTika.java:183)
    at testTika.main(testTika.java:170)
Caused by: org.apache.poi.util.RecordFormatException: Leftover 7 bytes in subrecord data [15, 00, 12, 00, 12, 00, 01, 00, 11, 20, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 0C, 00, 14, 00, 00, 00, 00, 00, 00, 00, 00, 00, 01, 00, 01, 00, 06, 00, 00, 00, 10, 00, 01, 00, 13, 00, EE, 1F, 10, 00, 09, 00, 00, 00, 00, 00, 25, 04, 00, 0A, 00, 05, 00, 05, 00, 05, 07, 00, 00, 00, 18, 00, 00, 00, 00, 00, 00, 01, 00, 00, 00]
    at org.apache.poi.hssf.record.ObjRecord.<init>(ObjRecord.java:112)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
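
For anyone triaging this, the byte dump in the exception message can be inspected without POI. The following is only a rough sketch using the plain JDK: it walks the dump as a sequence of sub-record headers, each a little-endian 16-bit type followed by a 16-bit length, and stops where a declared length no longer fits. The class and method names (SubRecordWalk, walk, available) are made up for illustration and this is not POI's parsing code. The type names in the comments (ftCmo, ftSbs, ftLbsData) are taken from the [MS-XLS] specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI or Tika.
class SubRecordWalk {
    // Bytes copied verbatim from the RecordFormatException message in this report.
    static final int[] DUMP = {
        0x15, 0x00, 0x12, 0x00, 0x12, 0x00, 0x01, 0x00, 0x11, 0x20,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x0C, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x01, 0x00, 0x01, 0x00, 0x06, 0x00, 0x00, 0x00, 0x10, 0x00, 0x01, 0x00,
        0x13, 0x00, 0xEE, 0x1F, 0x10, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x25, 0x04, 0x00, 0x0A, 0x00, 0x05, 0x00, 0x05, 0x00, 0x05, 0x07,
        0x00, 0x00, 0x00, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    // One sub-record header: where it starts, its type, and the length it declares.
    static final class Header {
        final int offset;
        final int type;
        final int declaredLength;

        Header(int offset, int type, int declaredLength) {
            this.offset = offset;
            this.type = type;
            this.declaredLength = declaredLength;
        }
    }

    // Reads an unsigned little-endian 16-bit value.
    static int u16(int[] b, int off) {
        return b[off] | (b[off + 1] << 8);
    }

    // Number of bytes actually present after the 4-byte header of h.
    static int available(int[] data, Header h) {
        return data.length - h.offset - 4;
    }

    // Collects headers until the data ends or a declared length overruns the buffer.
    static List<Header> walk(int[] data) {
        List<Header> out = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= data.length) {
            Header h = new Header(pos, u16(data, pos), u16(data, pos + 2));
            out.add(h);
            if (h.declaredLength > available(data, h)) {
                break; // cannot be skipped as a plain byte count
            }
            pos += 4 + h.declaredLength;
        }
        return out;
    }

    public static void main(String[] args) {
        // Expected types per [MS-XLS]: 0x0015 ftCmo, 0x000C ftSbs, 0x0013 ftLbsData.
        for (Header h : walk(DUMP)) {
            System.out.printf("offset=%d type=0x%04X declared=%d available=%d%n",
                    h.offset, h.type, h.declaredLength, available(DUMP, h));
        }
    }
}
```

On this dump the walk finds three headers: 0x0015 with 18 bytes, 0x000C with 20 bytes, and 0x0013 at offset 46, which declares 8174 bytes while only 33 remain. So the last sub-record cannot be sized from its length field alone and has to be decoded field by field. The sketch does not show which 7 bytes POI ends up leaving over.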

I've attached 2 documents to test.
Comment 1 Javier 2020-03-24 18:46:32 UTC
Created attachment 37120
Example test_dropbox_NO_selected.xls
Comment 3 Tim Allison 2020-03-31 17:13:46 UTC
Found the problem.  Will fix shortly.