I get an ArrayIndexOutOfBoundsException when doing:

    PowerPointExtractor extractor = new PowerPointExtractor("file.ppt");

I use:

    poi-3.0-alpha1-20050830.jar
    poi-contrib-3.0-alpha1-20050830.jar
    poi-scratchpad-3.0-alpha1-20050830.jar

The files I am trying to extract are definitely not corrupt!

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.poi.util.HexDump.toHex(HexDump.java:406)
    at org.apache.poi.util.HexDump.toHex(HexDump.java:397)
    at org.apache.poi.hpsf.UnsupportedVariantTypeException.<init>(UnsupportedVariantTypeException.java:46)
    at org.apache.poi.hpsf.ReadingNotSupportedException.<init>(ReadingNotSupportedException.java:45)
    at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:267)
    at org.apache.poi.hpsf.Property.<init>(Property.java:146)
    at org.apache.poi.hpsf.Section.<init>(Section.java:281)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
    at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:59)
    at org.apache.poi.hslf.HSLFSlideShow.getPropertySet(HSLFSlideShow.java:214)
    at org.apache.poi.hslf.HSLFSlideShow.readProperties(HSLFSlideShow.java:182)
    at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:105)
    at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:85)
    at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:72)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.<init>(PowerPointExtractor.java:73)
    at TestIndex.Powerpoint.main(Powerpoint.java:23)
Can you please attach the powerpoint file that causes this? (The stack trace isn't enough to go on, because it's breaking deep in the bowels of HPSF, where things shouldn't be breaking)
Created attachment 16253: a test file

This is one of the files I am trying to parse. It's a test file!
I've just tried to open your test file, and it worked flawlessly. No problems, no exceptions. A call to extractor.getText() correctly returns your test strings.

I think you must have some older POI jar files kicking around on your classpath, and that's what's causing your issues. Try running the following program - it'll tell you exactly where you are picking up POI from:

    import org.apache.poi.hslf.extractor.*;

    public class TestWherePPT {
        public static void main(String[] args) throws Exception {
            Class extractorClass = PowerPointExtractor.class;
            ClassLoader loader = extractorClass.getClassLoader();

            // Where the scratchpad (HSLF) classes are loaded from
            String extractorFrom = loader.getResource("org/apache/poi/hslf/extractor/PowerPointExtractor.class").toString();
            // Where the core (HPSF) classes are loaded from
            String hpsfFrom = loader.getResource("org/apache/poi/hpsf/PropertySet.class").toString();

            System.out.println(extractorFrom);
            System.out.println(hpsfFrom);
        }
    }

Until the output of both is just the new JAR files you downloaded, you'll have problems!
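The program above reports only the first match for each class, because ClassLoader.getResource stops at the first hit. As a rough complement, the sketch below uses ClassLoader.getResources to list every location on the classpath that provides the same classes, so a second, older copy would show up as an extra line. The class name FindAllCopies and the helper locate are made up for illustration, HexDump is included only because it sits at the top of the stack trace, and the code assumes Java 5 or later.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of POI: lists every classpath location
// that provides a given class file, in the order the loader reports them.
public class FindAllCopies {

    /** Returns all URLs at which the class loader can see the resource. */
    static List<URL> locate(String resourcePath) throws IOException {
        ClassLoader loader = FindAllCopies.class.getClassLoader();
        if (loader == null) {
            // Only happens if this class was loaded by the bootstrap loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourcePath));
    }

    public static void main(String[] args) throws IOException {
        String[] resources = {
            "org/apache/poi/hslf/extractor/PowerPointExtractor.class",
            "org/apache/poi/hpsf/PropertySet.class",
            "org/apache/poi/util/HexDump.class"
        };
        for (String resource : resources) {
            List<URL> hits = locate(resource);
            System.out.println(resource + ": " + hits.size() + " location(s)");
            for (URL hit : hits) {
                System.out.println("    " + hit);
            }
        }
    }
}
```

If any class is reported from more than one location, the classpath contains more than one copy of it, and which copy wins may depend on the classpath order.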
I have the same problem with the 3.0-alpha1-20050704 jars.

Here's where the classes are loaded from:

jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-scratchpad-3.0-alpha1-20050704.jar!/org/apache/poi/hslf/extractor/PowerPointExtractor.class
jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-3.0-alpha1-20050704.jar!/org/apache/poi/hpsf/PropertySet.class
> jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-scratchpad-3.0-alpha1-20050704.jar!/org/apache/poi/hslf/extractor/PowerPointExtractor.class
> jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-3.0-alpha1-20050704.jar!/org/apache/poi/hpsf/PropertySet.class

That's very odd, since you do have the right jar files to hand (unlike the original reporter, who was probably using 3.0-alpha1 scratchpad with 2.5 core).

I think it must be a different problem you're facing. Could you please post the stack trace you get, along with the file that's giving you grief?
I just got the latest HEAD from CVS, and have the same exception:

jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-scratchpad-3.0-alpha1-20051019.jar!/org/apache/poi/hslf/extractor/PowerPointExtractor.class
jar:file:/C:/opt/eclipse-rc/eclipse/workspace/project/lib/textmining/poi-3.0-alpha1-20051019.jar!/org/apache/poi/hpsf/PropertySet.class
Here's the stack trace (building from the HEAD just minutes ago).

java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.poi.util.HexDump.toHex(HexDump.java:406)
    at org.apache.poi.util.HexDump.toHex(HexDump.java:397)
    at org.apache.poi.hpsf.UnsupportedVariantTypeException.<init>(UnsupportedVariantTypeException.java:46)
    at org.apache.poi.hpsf.ReadingNotSupportedException.<init>(ReadingNotSupportedException.java:45)
    at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:267)
    at org.apache.poi.hpsf.Property.<init>(Property.java:146)
    at org.apache.poi.hpsf.Section.<init>(Section.java:281)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
    at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:59)
    at org.apache.poi.hslf.HSLFSlideShow.getPropertySet(HSLFSlideShow.java:214)
    at org.apache.poi.hslf.HSLFSlideShow.readProperties(HSLFSlideShow.java:182)
    at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:105)
    at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:85)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.<init>(PowerPointExtractor.java:84)
    at com.lek.extraction.PowerPointTextExtractor.getText(PowerPointTextExtractor.java:21)
    at com.lek.extraction.TextExtractionTestCase.testPowerPointExtraction(TextExtractionTestCase.java:69)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:324)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:478)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:344)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

Let me verify whether the original poster's file causes this same exception for me.
> Let me verify whether the original poster's file causes this same exception for me.

Can you also try with the sample files under /jakarta-poi/src/scratchpad/testcases/org/apache/poi/hslf/data/ ?

(Those files work just fine for me on my workstation, and also on the nightly build box)
I resolved the problem - it was the order of the jars. Everything worked fine when I ran my test with Ant, but not in eclipse, so I moved the POI jars to the head of the classpath. I'll attach my full classpath - perhaps that will give an indication of what other jar POI is interacting with.
Created attachment 16754: Eclipse project class path

If you put the POI jars at the end of the classpath, then the error will occur.
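Since the behaviour depends on where the POI jars sit in the classpath, it can help to print the effective order from inside the failing run (Ant and Eclipse may well build it differently). The following is only a sketch: the class ClasspathOrder and its helpers are hypothetical names, it assumes Java 5 or later, and it assumes the launcher exposes its classpath through the standard java.class.path system property.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of POI: prints classpath entries in
// lookup order and marks the ones whose file name mentions "poi".
public class ClasspathOrder {

    /** Splits a classpath string into its non-empty entries, preserving order. */
    static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<String>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (entry.length() > 0) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Returns the zero-based positions of entries whose file name contains "poi". */
    static List<Integer> poiPositions(List<String> entries) {
        List<Integer> positions = new ArrayList<Integer>();
        for (int i = 0; i < entries.size(); i++) {
            String name = new File(entries.get(i)).getName().toLowerCase();
            if (name.contains("poi")) {
                positions.add(Integer.valueOf(i));
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> all = entries(System.getProperty("java.class.path"), File.pathSeparator);
        List<Integer> poi = poiPositions(all);
        for (int i = 0; i < all.size(); i++) {
            String marker = poi.contains(Integer.valueOf(i)) ? " <-- POI" : "";
            System.out.println(i + ": " + all.get(i) + marker);
        }
    }
}
```

Running this from both Ant and Eclipse and comparing the two listings should show which jars come ahead of POI in the failing case. A jar that bundles old POI classes under a different file name would not be flagged, so the marker is only a hint.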
> If you put the POI jars at the end of the classpath, then the error will occur.

None of those scream "I contain old bits of POI", but hey.

Quite why Eclipse was claiming it was going to get hpsf from the right jar, and then fail to, I don't know. Isn't Eclipse fun...

Nick