On the latest version in SVN, when constructing an HSLFSlideShow from a certain slide show, I get the following error:

java.lang.RuntimeException: Couldn't instantiate the class for type with id 0 on class class org.apache.poi.hslf.record.UnknownRecordPlaceholder : java.lang.reflect.InvocationTargetException
Cause was : java.lang.ArrayIndexOutOfBoundsException
	at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:158)
	at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:109)
	at org.apache.poi.hslf.HSLFSlideShow.readFIB(HSLFSlideShow.java:174)
	at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:103)
Created attachment 17600 (PPT file exhibiting the problem). Here's the file in question. It opens in Office and OpenOffice and looks pretty innocent, but it breaks in HSLF for some reason.
Following the normal slide records, that PPT file then has 63 records of type 0. The first 62 are of zero length (just the 8-byte record header), and the final one is 251658240 bytes big! I'm happy to code up something to ignore the empty records of type 0. I'm less happy about the whacking great record. How did you go about creating this file, and is there anything special about it?
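For reference, each 8-byte record header holds a 2-byte options field (4-bit record version plus 12-bit instance), a 2-byte record type and a 4-byte body length, all little-endian. The minimal Java sketch below decodes such a header; the class and method names are hypothetical and are not HSLF's API. It shows that an empty type-0 record is just eight zero bytes, and that the length 251658240 is 0x0F000000, which a little-endian file stores as the bytes 00 00 00 0F.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: decodes the 8-byte header that precedes every record
// in a PowerPoint document stream. Names are hypothetical, not HSLF's API.
final class RecordHeaderDemo {
    static final int HEADER_SIZE = 8;

    static final class Header {
        final int options; // 4-bit version + 12-bit instance
        final int type;    // record type, e.g. 0 for the suspicious records
        final long length; // size of the record body, excluding the header

        Header(int options, int type, long length) {
            this.options = options;
            this.type = type;
            this.length = length;
        }
    }

    // Reads the header starting at 'offset'; all fields are little-endian.
    static Header readHeader(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, HEADER_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        int options = Short.toUnsignedInt(buf.getShort());
        int type = Short.toUnsignedInt(buf.getShort());
        long length = Integer.toUnsignedLong(buf.getInt());
        return new Header(options, type, length);
    }

    public static void main(String[] args) {
        // An empty type-0 record: eight zero bytes.
        Header empty = readHeader(new byte[8], 0);
        System.out.println(empty.type + " " + empty.length); // 0 0

        // Length bytes 00 00 00 0F decode to 0x0F000000.
        byte[] huge = {0, 0, 0, 0, 0, 0, 0, 0x0F};
        System.out.println(readHeader(huge, 0).length);      // 251658240
    }
}
```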
On closer inspection, that last record claims to be ~250 MB in size, as part of a 62 KB file! Something is corrupt somewhere in that PPT file, but I'll see if I can come up with a sensible way to handle documents with corruption like that.
OK, SVN now has a fix. If a record claims to be larger than the available data, it will be silently ignored. I can now process that file fine with HSLF. Note: if you open a corrupt file like this and write it back out again, it will be changed! (The corrupt record will be discarded.)
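The rule described above, ignoring a record whose claimed size exceeds the remaining data, can be sketched as follows. This is an illustrative Java sketch with hypothetical names, not the code committed to SVN. Because an oversized record's claimed extent runs past the end of the data, the sketch stops walking there rather than guessing where a following record might start.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "ignore a record that claims more data than exists".
// Class and method names are hypothetical; this is not the HSLF source.
final class RecordWalker {
    static final int HEADER_SIZE = 8;

    // Walks back-to-back records and returns the offset of each one that fits.
    static List<Integer> findRecordOffsets(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos + HEADER_SIZE <= data.length) {
            // Body length is the unsigned 32-bit value at header bytes 4..7.
            long length = Integer.toUnsignedLong(buf.getInt(pos + 4));
            long end = (long) pos + HEADER_SIZE + length;
            if (end > data.length) {
                // Corrupt: the claimed body runs past the end of the data.
                // Ignore the record silently and stop walking.
                break;
            }
            offsets.add(pos);
            pos = (int) end;
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] data = new byte[26];
        data[2] = (byte) 0xE8; data[3] = 0x03; // offset 0: type 1000, ...
        data[4] = 2;                           // ... with a 2-byte body
        // offset 10: an empty type-0 record (all zero bytes)
        data[25] = 0x0F;                       // offset 18: type 0 claiming 0x0F000000 bytes
        System.out.println(findRecordOffsets(data)); // [0, 10]
    }
}
```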
Ah, so it was corrupt... I was wondering about that, because even OpenOffice opened it silently. I actually got the PPT file out of a large set of PPT files created by other people, so it's impossible to know exactly which versions they used and what sort of shenanigans went on. :-)
Darn. This does prevent the error from being thrown, but now getTextRuns() returns no text. OpenOffice can show the text, so there must be something that can be done to retrieve it. Feel free to re-close this if it looks like it's completely impossible to get the text out because of that corrupt record.
Fixed, though it wasn't trivial. In the attached file, root-level records were not contiguous, and some of them were skipped with the "Skipping record of type 0 at position ..." message on System.err. As a result, the wrong Slide record was selected, and it had no text runs. Should work fine now.
Yegor
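The comment does not describe how the fix works. Purely to illustrate why non-contiguous root-level records cause trouble, the sketch below reads records only at offsets supplied by an index, instead of assuming each record starts where the previous one ended. In a real PowerPoint file such offsets would have to come from the file's own bookkeeping (for example its persist directory); that part is not modelled here, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: read records at known offsets rather than walking the
// stream sequentially, so gaps between root-level records are never parsed.
// Names are hypothetical; where the offsets come from is not modelled here.
final class OffsetRecordReader {
    static final int HEADER_SIZE = 8;

    // Returns the record type found at each offset, skipping offsets whose
    // header or claimed body does not fit inside the data.
    static List<Integer> typesAtOffsets(byte[] data, int[] offsets) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> types = new ArrayList<>();
        for (int off : offsets) {
            if (off < 0 || (long) off + HEADER_SIZE > data.length) {
                continue; // offset points outside the data
            }
            long length = Integer.toUnsignedLong(buf.getInt(off + 4));
            if ((long) off + HEADER_SIZE + length > data.length) {
                continue; // record claims more data than exists
            }
            types.add(Short.toUnsignedInt(buf.getShort(off + 2)));
        }
        return types;
    }

    public static void main(String[] args) {
        byte[] data = new byte[19];
        data[2] = (byte) 0xE8; data[3] = 0x03;   // offset 0: type 1000, empty body
        data[8] = 1; data[9] = 2; data[10] = 3;  // offsets 8-10: unrelated bytes (a gap)
        data[13] = (byte) 0xEE; data[14] = 0x03; // offset 11: type 1006, empty body
        System.out.println(typesAtOffsets(data, new int[] {0, 11})); // [1000, 1006]
    }
}
```

A sequential walk over the same bytes would instead read a bogus header at offset 8 (type 3, with a length far larger than the data) and never reach the record at offset 11.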