Created attachment 24670
xls file containing several embedded objects that will cause the exception

I got an exception when filtering the attached Excel file using "type bugs.xls | java -jar tika-app-0.4.jar -". The embedded objects seem to cause the problem; the exception stack trace follows.

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@651dba45
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:64)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:263)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:122)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:85)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:145)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:114)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    ... 3 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Ran out of record data trying to read formula.
fields: (option=-12 index=11540 not_used=353 name=''')
    at org.apache.poi.hssf.record.ExternalNameRecord.readFail(ExternalNameRecord.java:177)
    at org.apache.poi.hssf.record.ExternalNameRecord.<init>(ExternalNameRecord.java:164)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:56)
    ... 11 more
The problem is fixed in r890871. To test the fix, you need to update Tika to use the latest POI jars built from trunk. Daily builds can be downloaded from http://encore.torchbox.com/poi-svn-build/

Yegor
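When swapping in newer POI jars, it can help to confirm which jar a class is actually loaded from, since an older copy earlier on the classpath would hide the fix. The following is only a minimal sketch using standard JDK reflection; the class name `JarLocator` and the method `locate` are hypothetical names introduced for illustration, and the default class name is simply the one that appears in the stack trace.

```java
import java.security.CodeSource;

public class JarLocator {
    /** Returns where the named class was loaded from, or a short note if that is unknown. */
    public static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader usually report no code source.
            if (src == null || src.getLocation() == null) {
                return "(no code source reported)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the POI class named in the stack trace; any class name can be passed instead.
        String name = args.length > 0 ? args[0] : "org.apache.poi.hssf.record.ExternalNameRecord";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Run it with the same classpath you use for Tika. If the printed location is not the newly downloaded POI jar, the old classes are still being picked up.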
Glad to see this is a known issue and that it is fixed :))). My app was having issues reading spreadsheets with Linked Notes after upgrading to 3.6. I've tested the fix with the 3.7 beta1 libraries and it works like a charm for reading Linked Notes. Appreciate the quick turnaround. Any ETA on the release of 3.7?
The next beta will hopefully be out in a week or so (voting will start shortly); the final release will depend on bug reports!