Bug 56471 - Error creating StyleTextProp9Atom - ArrayIndexOutOfBoundsException: 20 - when reading a PPT file
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: 3.10-FINAL
Hardware: PC All
Importance: P1 major with 1 vote
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-29 13:06 UTC by chetan.laddha@gmail.com
Modified: 2015-11-01 20:02 UTC

Attachments
This PPT is not getting extracted with "poi-3.10.jar" (634.00 KB, application/vnd.ms-powerpoint)
2014-04-29 13:06 UTC, chetan.laddha@gmail.com

Description chetan.laddha@gmail.com 2014-04-29 13:06:36 UTC
Created attachment 31572
This PPT is not getting extracted with "poi-3.10.jar"

Text cannot be extracted from the attached PPT file. Extraction fails with the following exception:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2d536558
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:142)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:418)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:112)
Caused by: java.lang.RuntimeException: Couldn't instantiate the class for type with id 5000 on class class org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 5002 on class class org.apache.poi.hslf.record.DummyPositionSensitiveRecordWithChildren : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 5003 on class class org.apache.poi.hslf.record.BinaryTagDataBlob : java.lang.reflect.InvocationTargetException
Cause was : java.lang.RuntimeException: Couldn't instantiate the class for type with id 4012 on class class org.apache.poi.hslf.record.StyleTextProp9Atom : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException: 20
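
For anyone reading the trace above: each "Couldn't instantiate the class" line wraps a java.lang.reflect.InvocationTargetException, which is what reflection throws when a constructor itself fails, so the real error is the innermost ArrayIndexOutOfBoundsException. The following is a minimal stdlib-only sketch of that wrapping and of walking the cause chain back to the root. BrokenRecord, create and rootCause are hypothetical names made up for illustration; this is not POI's actual record-creation code.

```java
import java.lang.reflect.InvocationTargetException;

public class RootCauseDemo {
    /** Hypothetical record type whose constructor reads past the end of its data. */
    public static class BrokenRecord {
        public BrokenRecord(byte[] data) {
            // Index 20 is out of range when the array holds only 20 bytes (0..19).
            byte unused = data[20];
        }
    }

    /** Instantiates a type reflectively and wraps any constructor failure in a RuntimeException. */
    public static Object create(Class<?> type, byte[] data) {
        try {
            return type.getDeclaredConstructor(byte[].class).newInstance((Object) data);
        } catch (InvocationTargetException e) {
            // The constructor threw; its exception is available via e.getCause().
            throw new RuntimeException("Couldn't instantiate " + type.getName(), e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    /** Follows getCause() links until the innermost exception is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            create(BrokenRecord.class, new byte[20]);
        } catch (RuntimeException e) {
            // Prints the class name of the innermost exception in the chain.
            System.out.println(rootCause(e).getClass().getName());
        }
    }
}
```

Applied to the trace in this report, the same walk ends at the ArrayIndexOutOfBoundsException raised while constructing StyleTextProp9Atom (record type 4012); the outer records (5003, 5002, 5000) only fail because their child could not be built.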
Comment 1 Nick Burch 2014-08-27 11:52:09 UTC
It would be great if someone could:
 * Run it through the Microsoft Binary File Format validator, and see if that reports it as valid or invalid?
 * Load it in PowerPoint, do a save-as, and see if that fixes it?
 * Load it in Open Office, and see if that is happy with it?
Comment 2 Nick Burch 2014-08-27 11:53:11 UTC
Oh, and try it with POI 3.11 beta 2, just to see if we've already fixed it!
Comment 3 Felix Schwarz 2014-08-27 11:53:38 UTC
I think this bug is a duplicate of bug 55732. In any case, with Tika 1.6 (a dev build, using POI 3.11b2 or so) I can now extract text from this PPT, which I couldn't do before.
Comment 4 Andreas Beeker 2015-11-01 20:02:46 UTC
I've run the PowerPointExtractor on the file, and it now works.