Bug 48339 - Exception thrown when filtering the attached Excel file using tika-app-0.4.jar
Summary: Exception thrown when filtering the attached Excel file using tika-app-0.4.jar
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-12-04 00:46 UTC by leon800219
Modified: 2010-08-03 12:59 UTC
CC List: 0 users

XLS file containing several embedded objects that cause the exception (29.50 KB, application/vnd.ms-excel)
2009-12-04 00:46 UTC, leon800219

Description leon800219 2009-12-04 00:46:46 UTC
Created attachment 24670
XLS file containing several embedded objects that cause the exception

I got an exception when filtering the attached Excel file using "type bugs.xls | java -jar tika-app-0.4.jar -".

The embedded objects seem to cause the problem. The exception stack trace follows.

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
       at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:64)
       at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:263)
       at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
       at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:122)
       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:85)
       at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:145)
       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:114)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
       ... 3 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Ran out of record data trying to read formula. fields: (option=-12 index=11540 not_used=353 name=''')
       at org.apache.poi.hssf.record.ExternalNameRecord.readFail(ExternalNameRecord.java:177)
       at org.apache.poi.hssf.record.ExternalNameRecord.<init>(ExternalNameRecord.java:164)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
       at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:56)
       ... 11 more
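
The trace above nests three exceptions, and the innermost "Caused by" (the ExternalNameRecord failure) is the one that points at the actual defect. For callers that only see the outer TikaException, a minimal sketch of walking the cause chain with the standard Throwable API might look like the following. The class name CauseChain and the helper rootCause are hypothetical names introduced for illustration, and the exceptions built in main are plain stand-ins carrying messages copied from this report, not the real Tika or POI types.

```java
class CauseChain {

    /** Returns the innermost cause of t, or t itself when it has no cause. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Follow getCause() until the chain ends; getCause() returns null
        // when no cause was recorded.
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the three nested exceptions in the report (assumption:
        // plain JDK types are used here so the sketch needs no Tika or POI jars).
        RuntimeException inner =
                new RuntimeException("Ran out of record data trying to read formula.");
        RuntimeException middle =
                new RuntimeException("Unable to construct record instance", inner);
        Exception outer = new Exception("Unexpected RuntimeException from", middle);

        // Report only the message of the innermost exception.
        System.out.println(rootCause(outer).getMessage());
    }
}
```

Logging the root cause alongside the outer exception makes it easier to tell a record-parsing failure like this one from an unrelated wrapper error.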
Comment 1 Yegor Kozlov 2009-12-15 08:33:20 UTC
The problem is fixed in r890871.

To test the fix you need to update Tika to use the latest POI jars built from trunk. Daily builds can be downloaded from http://encore.torchbox.com/poi-svn-build/

Comment 2 willpower1024 2010-08-03 12:54:32 UTC
Glad to see this is a known issue and that it is fixed :))). My app was having issues reading spreadsheets with Linked Notes after upgrading to 3.6. I've tested the fix with the 3.7 beta1 libraries and it works like a charm for reading Linked Notes. Appreciate the quick turnaround. Any ETA on the release of 3.7?
Comment 3 Nick Burch 2010-08-03 12:59:45 UTC
The next beta will hopefully be in a week or so (voting will start shortly); the final release will depend on bug reports!