Summary: | ArrayIndexOutOfBoundsException while opening PPT file | ||
---|---|---|---|
Product: | POI | Reporter: | Tim Riemann <triemann> |
Component: | HSLF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 3.0-dev | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
Attachments: | Simplest possible testcase showing the ArrayIndexOutOfBoundsError |
Description
Tim Riemann
2006-10-20 15:17:04 UTC
Created attachment 19197
Simplest possible testcase showing the ArrayIndexOutOfBoundsError
I use POI through Nutch for parsing Office documents.
Note: No exception is thrown when I do my tests, but a lot of ERROR messages
are logged, indicating that something is wrong:
ERROR - ContentReaderListener.extractTextBoxes(322) | extractClientTextBoxes
java.lang.ArrayIndexOutOfBoundsException: -353698944
    at org.apache.poi.util.LittleEndian.getNumber(LittleEndian.java:491)
    at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:64)
    at org.apache.nutch.parse.mspowerpoint.ContentReaderListener.extractTextBoxes(ContentReaderListener.java:200)
    at org.apache.nutch.parse.mspowerpoint.ContentReaderListener.processPOIFSReaderEvent(ContentReaderListener.java:110)
    at org.apache.poi.poifs.eventfilesystem.POIFSReader.processProperties(POIFSReader.java:260)
    at org.apache.poi.poifs.eventfilesystem.POIFSReader.read(POIFSReader.java:96)
    at org.apache.nutch.parse.mspowerpoint.PPTExtractor.extractText(PPTExtractor.java:49)
    at org.apache.nutch.parse.ms.MSExtractor.extract(MSExtractor.java:77)
    at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:81)
    at org.apache.nutch.parse.mspowerpoint.MSPowerPointParser.getParse(MSPowerPointParser.java:44)
    at no.creuna.documentparser.DocumentParser.parseDocument(DocumentParser.java:156)
    at test.no.creuna.documentparser.DocumentParserErrorsTest.testArrayIndexOutOfBoundsExceptionErrors(DocumentParserErrorsTest.java:186)
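For readers unfamiliar with this kind of failure: the negative index in the trace suggests that an offset computed from unexpected record data was handed to a little-endian 16-bit read. The sketch below is only an illustration of that mechanism and is not the actual POI or Nutch code. The class `UShortReadSketch` and the helper `canRead` are hypothetical names introduced here, and `getUShort` is a simplified stand-in for the library method of the same name.

```java
public class UShortReadSketch {

    // Simplified stand-in for a little-endian unsigned 16-bit read.
    // Like any plain array access, it throws ArrayIndexOutOfBoundsException
    // when the offset lies outside the array.
    static int getUShort(byte[] data, int offset) {
        int lo = data[offset] & 0xFF;      // low byte comes first
        int hi = data[offset + 1] & 0xFF;  // high byte comes second
        return (hi << 8) | lo;
    }

    // Hypothetical guard: true only if 'size' bytes can be read at 'offset'.
    // Written as a subtraction so a huge offset cannot overflow the check.
    static boolean canRead(byte[] data, int offset, int size) {
        return offset >= 0 && size >= 0 && offset <= data.length - size;
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12, (byte) 0xFF, (byte) 0xFF};

        // A valid read: bytes 0x34 0x12 decode to 0x1234.
        System.out.println(getUShort(data, 0));

        // The offset reported in the stack trace above.
        int badOffset = -353698944;
        System.out.println(canRead(data, badOffset, 2));

        try {
            getUShort(data, badOffset);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

A caller that validates offsets with a check like `canRead` before reading can skip or report a malformed record instead of logging a stack trace for every text box.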
I think this problem has now been fixed, thanks to Yegor's new understanding of the ordering of TextProps in StyleTextPropAtom. I can open your test PowerPoint document without any exceptions, so I'm hoping this is now closed. If you still get problems, can you re-open with a new problem file?