Bug 48339

Summary: Exception thrown when filtering the attached Excel file using tika-app-0.4.jar
Product: POI
Reporter: leon800219
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows Server 2003   
Attachments: xls file containing several embedded objects that cause the exception

Description leon800219 2009-12-04 00:46:46 UTC
Created attachment 24670
xls file containing several embedded objects that cause the exception

I got an exception when filtering the attached Excel file with "type bugs.xls | java -jar tika-app-0.4.jar -".

The embedded objects seem to cause the problem; the exception stack trace follows.




Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@651dba45
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:175)
       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
       at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:64)
       at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:263)
       at org.apache.poi.hssf.record.RecordFactoryInputStream.readNextRecord(RecordFactoryInputStream.java:270)
       at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:236)
       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:122)
       at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:85)
       at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:145)
       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:114)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
       ... 3 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Ran out of record data trying to read formula. fields: (option=-12 index=11540 not_used=353 name=''')
       at org.apache.poi.hssf.record.ExternalNameRecord.readFail(ExternalNameRecord.java:177)
       at org.apache.poi.hssf.record.ExternalNameRecord.<init>(ExternalNameRecord.java:164)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
       at org.apache.poi.hssf.record.RecordFactory$ReflectionRecordCreator.create(RecordFactory.java:56)
       ... 11 more
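
The nesting in the trace above (an "Unable to construct record instance" error wrapping a "Ran out of record data" error, with java.lang.reflect frames in between) is what one would expect when a factory builds records reflectively and rewraps constructor failures. The following is only a minimal sketch of that pattern, not POI's actual code: the class names ReflectiveFactorySketch, FormatException and NameRecord, and the 4-byte length check, are hypothetical stand-ins chosen for illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class ReflectiveFactorySketch {

    // Hypothetical stand-in for a record-format error; not POI's class.
    static class FormatException extends RuntimeException {
        FormatException(String message) { super(message); }
        FormatException(String message, Throwable cause) { super(message, cause); }
    }

    // Hypothetical record whose constructor needs more bytes than it may receive.
    static class NameRecord {
        public NameRecord(byte[] data) {
            if (data.length < 4) {
                throw new FormatException("Ran out of record data");
            }
        }
    }

    // Builds a record through reflection and wraps any constructor failure,
    // which yields the outer/inner "Caused by" pair seen in a stack trace.
    static Object create(Class<?> type, byte[] data) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor(byte[].class);
            // Cast to Object so the array is passed as one argument, not as varargs.
            return ctor.newInstance((Object) data);
        } catch (InvocationTargetException e) {
            // The constructor itself threw; keep its exception as the cause.
            throw new FormatException("Unable to construct record instance", e.getCause());
        } catch (ReflectiveOperationException e) {
            // Missing constructor, access problem, or abstract class.
            throw new FormatException("Unable to construct record instance", e);
        }
    }

    // Returns "ok" on success, otherwise "outer message <- inner message".
    static String describeFailure(byte[] data) {
        try {
            create(NameRecord.class, data);
            return "ok";
        } catch (FormatException e) {
            return e.getMessage() + " <- " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(new byte[2]));
    }
}
```

Read this way, the innermost "Caused by" (ExternalNameRecord running out of data while reading a formula) is the real failure; the outer two exceptions are wrappers added on the way up.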
Comment 1 Yegor Kozlov 2009-12-15 08:33:20 UTC
The problem was fixed in r890871.

To test the fix you need to update Tika to use the latest POI jars built from trunk. Daily builds can be downloaded from http://encore.torchbox.com/poi-svn-build/


Yegor
Comment 2 willpower1024 2010-08-03 12:54:32 UTC
Glad to see this is a known issue and that it is fixed :))).  My app was having issues reading spreadsheets with Linked Notes after upgrading to 3.6.  I've tested the fix with the 3.7 beta1 libraries and it works like a charm for reading Linked Notes.  Appreciate the quick turnaround.  Any ETA on the release of 3.7?
Comment 3 Nick Burch 2010-08-03 12:59:45 UTC
The next beta will hopefully be out in a week or so (voting will start shortly); the final release will depend on bug reports!