Bug 61257

Summary: Unable to parse doc file. IOOBE thrown while reading PropertyTable from NPOIFSFileSystem
Product: POI
Reporter: gaurav.chd3
Component: POI Overall
Assignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME
Severity: major
CC: gaurav.chd3
Priority: P2
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: word file

Description gaurav.chd3 2017-07-07 06:37:13 UTC
Created attachment 35100
word file

Apache Tika was unable to parse the document
at C:\Users\skumar\Desktop\tikaError_28_Apr\tikaError_28_Apr\tika_error_files\ebs\documents\www.3gpp.org\ftp\tsg_ran\WG1_RL1\TSGR1_86b\Docs\R1-1608674 Discussion on measurement related reference signals.doc.

The full exception stack trace is included below:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5a9a579f
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
	at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:74)
	at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:357)
	at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:308)
	at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
	at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
	at javax.swing.TransferHandler.importData(Unknown Source)
	at javax.swing.TransferHandler$DropHandler.drop(Unknown Source)
	at java.awt.dnd.DropTarget.drop(Unknown Source)
	at javax.swing.TransferHandler$SwingDropTarget.drop(Unknown Source)
	at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(Unknown Source)
	at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(Unknown Source)
	at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(Unknown Source)
	at sun.awt.dnd.SunDropTargetEvent.dispatch(Unknown Source)
	at java.awt.Component.dispatchEventImpl(Unknown Source)
	at java.awt.Container.dispatchEventImpl(Unknown Source)
	at java.awt.Component.dispatchEvent(Unknown Source)
	at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
	at java.awt.LightweightDispatcher.processDropTargetEvent(Unknown Source)
	at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
	at java.awt.Container.dispatchEventImpl(Unknown Source)
	at java.awt.Window.dispatchEventImpl(Unknown Source)
	at java.awt.Component.dispatchEvent(Unknown Source)
	at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
	at java.awt.EventQueue.access$500(Unknown Source)
	at java.awt.EventQueue$3.run(Unknown Source)
	at java.awt.EventQueue$3.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.awt.EventQueue$4.run(Unknown Source)
	at java.awt.EventQueue$4.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
	at java.awt.EventQueue.dispatchEvent(Unknown Source)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
	at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
	at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
	at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: java.lang.IndexOutOfBoundsException: Index: 20, Size: 20
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.apache.poi.poifs.property.PropertyTableBase.populatePropertyTree(PropertyTableBase.java:128)
	at org.apache.poi.poifs.property.PropertyTableBase.<init>(PropertyTableBase.java:63)
	at org.apache.poi.poifs.property.NPropertyTable.<init>(NPropertyTable.java:66)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:440)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:235)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:168)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:120)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 43 more
Comment 1 PJ Fanning 2017-07-07 18:32:52 UTC
This Word document loads fine for me with tika-parsers 1.15: https://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.15
Comment 2 Javen O'Neal 2017-07-07 18:38:00 UTC
Which versions of Tika and POI were you using when you got the reported exception?
Comment 3 gaurav.chd3 2017-07-07 19:16:08 UTC
I was using Tika 1.14, which threw the exception. With 1.15, it works fine.
Comment 4 PJ Fanning 2017-07-07 19:23:09 UTC
I was able to reproduce the issue with Tika 1.14 (which depends on POI 3.15). Tika 1.15 depends on POI 3.16, so it looks like the fix is in POI 3.16.
Comment 5 Javen O'Neal 2017-07-07 19:54:56 UTC
Resolving as WORKSFORME since a fix was already applied between POI 3.15 and POI 3.16.
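
For anyone who wants to guard against the affected versions in their own code, here is a minimal sketch of a version check based only on what this thread established: POI 3.15 fails on the attached file and POI 3.16 does not. The class and method names (VersionCheck, compareVersions, hasFix) are hypothetical and not part of POI or Tika. The sketch assumes plain dotted numeric version strings such as "3.15"; qualifiers like "-beta1" are not handled, and how the version string is obtained is left out.

```java
final class VersionCheck {

    /**
     * Compares two dotted numeric versions, e.g. "3.15" and "3.16".
     * Missing trailing parts count as 0, so "3.16" equals "3.16.0".
     * Returns a negative number, zero, or a positive number.
     */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Compare numerically so that 3.9 sorts before 3.16.
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /**
     * Assumption taken from this thread: the fix is present in POI 3.16
     * and later, and absent in 3.15.
     */
    static boolean hasFix(String poiVersion) {
        return compareVersions(poiVersion, "3.16") >= 0;
    }

    public static void main(String[] args) {
        System.out.println("POI 3.15 has fix: " + hasFix("3.15"));
        System.out.println("POI 3.16 has fix: " + hasFix("3.16"));
    }
}
```

In practice, upgrading to Tika 1.15 (which pulls in POI 3.16) is the simpler route; a check like this only helps when the dependency version cannot be controlled directly.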