Bug 55732 - PPT can't open, fails with "Couldn't instantiate .... StyleTextProp9Atom : java.lang.ArrayIndexOutOfBoundsException: 56"
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: 3.9-FINAL
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
Reported: 2013-11-01 07:33 UTC by aimee dev
Modified: 2014-02-20 00:05 UTC



Attachments
erroneous PPT file - just textarea (97.50 KB, application/vnd.ms-powerpoint)
2014-02-19 10:02 UTC, Marcel Pokrandt
suggested patch (4.56 KB, text/plain)
2014-02-19 10:04 UTC, Marcel Pokrandt
suggested testcase (1.63 KB, text/plain)
2014-02-19 10:05 UTC, Marcel Pokrandt

Description aimee dev 2013-11-01 07:33:56 UTC
The file is too large to upload (2.5 MB); it can be found here:

http://www.cdt.org/files/CDT_Data_Retention-PPT.ppt

It is a Microsoft PowerPoint 97 presentation.
   [ https://issues.apache.org/jira/browse/TIKA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810577#comment-13810577 ] 

Nick Burch commented on TIKA-1189:
----------------------------------

This looks to be a bug in Apache POI, one of the upstream libraries that Tika uses. Any chance you could open a bug in the POI Bugzilla - http://issues.apache.org/bugzilla/buglist.cgi?product=POI - and attach the file there?

Fails to parse PPT file
-----------------------

               Key: TIKA-1189
               URL: https://issues.apache.org/jira/browse/TIKA-1189
           Project: Tika
        Issue Type: Bug
        Components: cli, gui
       Environment: OSX 10.9, OSX 10.6
          Reporter: Aimee Dev
       Attachments: CDT_Data_Retention-PPT.ppt


The out-of-the-box Tika application, when presented with the file, reports:
Apache Tika was unable to parse the document
at /Volumes/FREECOM_HDD/Test/CDT_Data_Retention-PPT.ppt.
The full exception stack trace is included below:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@224f9db
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
	at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
	at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
	at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
	at javax.swing.TransferHandler.importData(TransferHandler.java:826)
	at javax.swing.TransferHandler$DropHandler.drop(TransferHandler.java:1536)
	at java.awt.dnd.DropTarget.drop(DropTarget.java:450)
	at javax.swing.TransferHandler$SwingDropTarget.drop(TransferHandler.java:1274)
	at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(SunDropTargetContextPeer.java:537)
	at sun.lwawt.macosx.CDropTargetContextPeer.processDropMessage(CDropTargetContextPeer.java:127)
	at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(SunDropTargetContextPeer.java:851)
	at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(SunDropTargetContextPeer.java:775)
	at sun.awt.dnd.SunDropTargetEvent.dispatch(SunDropTargetEvent.java:48)
	at java.awt.Component.dispatchEventImpl(Component.java:4716)
	at java.awt.Container.dispatchEventImpl(Container.java:2287)
	at java.awt.Component.dispatchEvent(Component.java:4687)
	at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
	at java.awt.LightweightDispatcher.processDropTargetEvent(Container.java:4566)
	at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4417)
	at java.awt.Container.dispatchEventImpl(Container.java:2273)
	at java.awt.Window.dispatchEventImpl(Window.java:2719)
	at java.awt.Component.dispatchEvent(Component.java:4687)
	at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
	at java.awt.EventQueue.access$200(EventQueue.java:103)
	at java.awt.EventQueue$3.run(EventQueue.java:694)
	at java.awt.EventQueue$3.run(EventQueue.java:692)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
	at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
	at java.awt.EventQueue$4.run(EventQueue.java:708)
	at java.awt.EventQueue$4.run(EventQueue.java:706)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Caused by: java.lang.RuntimeException: Couldn't instantiate the class for type with id 5000 on class class org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 5002 on class class org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 5003 on class class org.apache.poi.hslf.record.BinaryTagDataBlob : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 4012 on class class org.apache.poi.hslf.record.StyleTextProp9Atom : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException: 56
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:185)
	at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
	at org.apache.poi.hslf.model.SimpleShape.getClientRecords(SimpleShape.java:347)
	at org.apache.poi.hslf.model.SimpleShape.getClientDataRecord(SimpleShape.java:319)
	at org.apache.poi.hslf.model.TextShape.getPlaceholderAtom(TextShape.java:591)
	at org.apache.poi.hslf.model.Sheet.getPlaceholder(Sheet.java:438)
	at org.apache.poi.hslf.model.HeadersFooters.isVisible(HeadersFooters.java:244)
	at org.apache.poi.hslf.model.HeadersFooters.isHeaderVisible(HeadersFooters.java:148)
	at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:62)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:189)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	... 42 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
	... 53 more
Caused by: java.lang.RuntimeException: Couldn't instantiate the class for type with id 5002 on class class org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 5003 on class class org.apache.poi.hslf.record.BinaryTagDataBlob : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 4012 on class class org.apache.poi.hslf.record.StyleTextProp9Atom : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException: 56
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:185)
	at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
	at org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren.<init>(DummyPositionSensitiveRecordWithChildren.java:52)
	... 57 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
	... 59 more
Caused by: java.lang.RuntimeException: Couldn't instantiate the class for type with id 5003 on class class org.apache.poi.hslf.record.BinaryTagDataBlob : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 4012 on class class org.apache.poi.hslf.record.StyleTextProp9Atom : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException: 56
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:185)
	at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
	at org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren.<init>(DummyPositionSensitiveRecordWithChildren.java:52)
	... 63 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
	... 65 more
Caused by: java.lang.RuntimeException: Couldn't instantiate the class for type with id 4012 on class class org.apache.poi.hslf.record.StyleTextProp9Atom : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException: 56
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:185)
	at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:128)
	at org.apache.poi.hslf.record.BinaryTagDataBlob.<init>(BinaryTagDataBlob.java:52)
	... 69 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
	... 71 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 56
	at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
	at org.apache.poi.hslf.record.StyleTextProp9Atom.<init>(StyleTextProp9Atom.java:70)
	... 76 more



Comment 1 Marcel Pokrandt 2014-02-19 10:02:44 UTC
Created attachment 31329 [details]
erroneous PPT file - just textarea

store in 
test-data\document
Comment 2 Marcel Pokrandt 2014-02-19 10:04:14 UTC
Created attachment 31330 [details]
suggested patch

store in
src\scratchpad\src\org\apache\poi\hslf\record
Comment 3 Marcel Pokrandt 2014-02-19 10:05:01 UTC
Created attachment 31331 [details]
suggested testcase

store in
src\testcases\org\apache\poi\hslf\model
Comment 4 Marcel Pokrandt 2014-02-19 10:07:23 UTC
I can confirm this bug with my own old '97 PPT, which contains nothing more than an empty text area.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 20
	at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:161)
	at org.apache.poi.hslf.record.StyleTextProp9Atom.<init>(StyleTextProp9Atom.java:70)
	... 65 more


I made a small test case (attached) and a suggested fix (also attached) as a patch to the class org.apache.poi.hslf.record.StyleTextProp9Atom. Before reading the unused fields textCfException9 and textSiException, I check whether the offset is already past the end of the array:

if (i >= data.length) {
      break;
}
        	
Since neither field is used anywhere, I think it should be safe to skip reading them in this case. With my patch, two of the files I checked that had the same error now parse successfully, and I could extract their text.
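
For readers without the patch at hand, the idea of the guard can be sketched outside POI roughly as follows. This is only an illustration under assumptions: the class GuardedRecordRead, the helpers getIntLE and hasInt, and the three-int entry layout are all hypothetical and are not the real StyleTextProp9Atom format or the committed code. The check used here (i + 4 <= data.length) is a slightly stricter variant of the i >= data.length test quoted above, so that one to three stray trailing bytes are also tolerated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not POI code and not the actual patch.
class GuardedRecordRead {

    // Reads a little-endian 32-bit int starting at offset.
    // Throws ArrayIndexOutOfBoundsException if fewer than 4 bytes remain.
    public static int getIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    // True if a full 4-byte int can be read at offset i.
    public static boolean hasInt(byte[] data, int i) {
        return i + 4 <= data.length;
    }

    // Assumed layout: each entry is one int that is kept, followed by two
    // optional ints that are read but never used. Old files may omit the
    // optional ints at the end of the data, so parsing stops there quietly.
    public static List<Integer> parseEntries(byte[] data) {
        List<Integer> values = new ArrayList<>();
        int i = 0;
        while (hasInt(data, i)) {
            values.add(getIntLE(data, i));
            i += 4;

            // Guard before the first unused field.
            if (!hasInt(data, i)) {
                break;
            }
            int unusedFirst = getIntLE(data, i);
            i += 4;

            // Guard before the second unused field.
            if (!hasInt(data, i)) {
                break;
            }
            int unusedSecond = getIntLE(data, i);
            i += 4;
        }
        return values;
    }

    public static void main(String[] args) {
        // One complete entry (value 1) plus a second entry (value 2)
        // whose two optional trailing ints are missing.
        byte[] truncated = {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0};
        System.out.println(parseEntries(truncated));

        try {
            // Without the guard, the next read runs past the end of the array.
            getIntLE(truncated, truncated.length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the guarded parser returns both values from the truncated array, while the unguarded read at the same offset throws the kind of ArrayIndexOutOfBoundsException shown in the traces above.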


I would really appreciate it if you could integrate this patch, because I'm using POI/Tika to index a large number of Office files, and many of them seem to fail with the same error.
Comment 5 Andreas Beeker 2014-02-20 00:05:52 UTC
Thank you for the patch.
Committed in SVN revision r1569984.
Because the original file still failed, another check was necessary; see r1569999.