Summary: | Exception threw when filtering the attached Excel using tika-app-0.4.jar | ||
---|---|---|---|
Product: | POI | Reporter: | leon800219 |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Windows Server 2003 | ||
Attachments: | xls file containing several embedded object that will cause the exception |

The problem is fixed in r890871. To test the fix you need to update Tika to use the latest POI jars built from trunk. Daily builds can be downloaded from http://encore.torchbox.com/poi-svn-build/

Yegor

Glad to see this is a known issue and that it is fixed :))). My app was having issues reading spreadsheets with Linked Notes after upgrading to 3.6. I've tested the fix with the 3.7 beta1 libraries and it works like a charm for reading Linked Notes. Appreciate the quick turnaround. Any ETA on the release of 3.7?

Next beta will hopefully be in a week or so (voting will start shortly); the final release will depend on bug reports!

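When swapping in newer POI jars to test the fix, it can help to confirm which jar the failing class is actually loaded from, since an older bundled copy may still come first on the classpath. The following is only a minimal sketch using standard JDK reflection; the class `JarLocator` and its `locate` method are hypothetical names introduced for illustration and are not part of Tika or POI. The class name it looks up is taken from the stack trace in this report.

```java
import java.security.CodeSource;

public class JarLocator {

    /**
     * Returns the location (typically a jar URL) that the named class would be
     * loaded from, or a short note when it cannot be determined.
     * Hypothetical helper, not a Tika or POI API.
     */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only need the origin.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source available)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class name copied from the stack trace in this report.
        System.out.println(locate("org.apache.poi.hssf.record.ExternalNameRecord"));
    }
}
```

Run it with the same classpath as the application under test; if the printed location is not the freshly built POI jar, the old classes are still being picked up.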

Created attachment 24670: xls file containing several embedded object that will cause the exception

I got an exception when filtering the attached Excel file using "type bugs.xls | java -jar tika-app-0.4.jar -". The embedded object seemed to cause the problem; the exception stack trace follows.

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@651dba45
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:64)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:263)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:122)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:85)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:145)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:114)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    ... 3 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Ran out of record data trying to read formula.
fields: (option=-12 index=11540 not_used=353 name=''')
    at org.apache.poi.hssf.record.ExternalNameRecord.readFail(ExternalNameRecord.java:177)
    at org.apache.poi.hssf.record.ExternalNameRecord.<init>(ExternalNameRecord.java:164)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:56)
    ... 11 more
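
The trace above nests three exceptions, and the informative message ("Ran out of record data trying to read formula.") sits on the innermost one. For callers that wrap the parser and only log the outer exception, a small helper that walks the cause chain can surface that message. This is a sketch under stated assumptions: `RootCause` and `rootCause` are hypothetical names, and the exceptions built in `main` are plain JDK stand-ins that mimic the nesting in this report rather than the real Tika or POI exception classes.

```java
public class RootCause {

    /**
     * Follows getCause() links until the innermost throwable is reached.
     * Hypothetical helper for illustration; not part of Tika or POI.
     */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain, and guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins mirroring the three levels in the reported stack trace.
        Exception inner = new RuntimeException("Ran out of record data trying to read formula.");
        Exception middle = new RuntimeException("Unable to construct record instance", inner);
        Exception outer = new Exception("Unexpected RuntimeException from parser", middle);

        // Prints the innermost message: Ran out of record data trying to read formula.
        System.out.println(rootCause(outer).getMessage());
    }
}
```

In a real application the same helper could be applied to the exception caught around the parse call, so that the log shows the record-level failure rather than only the generic wrapper message.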