Summary: | Exception parsing MS Word 8.0 file | ||
---|---|---|---|
Product: | POI | Reporter: | Erik Hetzner <ehetzner> |
Component: | HWPF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED DUPLICATE | ||
Severity: | minor | CC: | pablo.queixalos |
Priority: | P2 | ||
Version: | 3.8-dev | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: |
word file which causes error, as downloaded from http://www.arb.ca.gov/msprog/smogcheck/july00/iiif.doc
Throwing ArrayIndexOutOfBoundsException
Another one Throwing ArrayIndexOutOfBoundsException |
Description
Erik Hetzner
2011-03-16 10:59:27 UTC
Created attachment 26777 [details]
word file which causes error, as downloaded from http://www.arb.ca.gov/msprog/smogcheck/july00/iiif.doc

The problem is not reproducible with latest build from trunk. I added a unit test and included the attached document in our collection of test documents.

Yegor

Issue reopened, tested with r1175705 from trunk (through tika):

java.lang.ArrayIndexOutOfBoundsException: 70185
    at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:45)
    at org.apache.poi.ddf.DefaultEscherRecordFactory.createRecord(DefaultEscherRecordFactory.java:60)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:182)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:193)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:193)
    at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:220)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:498)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:488)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:200)

Created attachment 27594 [details]
Throwing ArrayIndexOutOfBoundsException
Created attachment 27595 [details]
Another one Throwing ArrayIndexOutOfBoundsException
(In reply to comment #5)
> Created attachment 27595 [details]
> Another one Throwing ArrayIndexOutOfBoundsException

This one failing validation:

<BFFValidation path="Bug50936_3.doc" datetime="10/30/11 03:16:10" result="FAILED">
  <ParseStack>
  <!-- skipped -->
  <Type docName="MS-DOC" sectionTitle="Section Properties" msdnLink="http://msdn.microsoft.com/en-us/library/46c3ec54-53ff-4c0a-b0d6-07ad15d2546e" streamName="WordDocument" streamOffset="166413" hexStreamOffset="0x28a0d"/>
  <LastData><![CDATA[ 3A -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- : ]]></LastData>
</BFFValidation>
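
For readers following the stack trace above: the exception surfaces in a two-byte little-endian read at an offset past the end of the data array. The sketch below is not POI's code. It is a self-contained illustration, under the assumption that the read is a plain unchecked array access, of why such a read throws ArrayIndexOutOfBoundsException and how a caller could check the offset first. The class name LittleEndianSketch and the helper canReadShort are hypothetical names introduced only for this example.

```java
// Hypothetical illustration only; this is NOT the source of
// org.apache.poi.util.LittleEndian.
class LittleEndianSketch {

    // Unchecked two-byte little-endian read. If offset + 1 is past the end
    // of the array, the JVM throws ArrayIndexOutOfBoundsException.
    static short getShort(byte[] data, int offset) {
        int low = data[offset] & 0xFF;
        int high = data[offset + 1] & 0xFF;
        return (short) ((high << 8) | low);
    }

    // Hypothetical guard: true only if two bytes are available at offset.
    static boolean canReadShort(byte[] data, int offset) {
        return offset >= 0 && offset <= data.length - 2;
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12};

        // In range: the bytes 0x34, 0x12 decode to 0x1234.
        System.out.println(Integer.toHexString(getShort(data, 0)));

        // Out of range: a scanner that trusts a corrupt offset fails here
        // unless it checks the bounds before reading.
        int badOffset = 1;
        if (canReadShort(data, badOffset)) {
            System.out.println(getShort(data, badOffset));
        } else {
            System.out.println("offset " + badOffset + " is out of range, skipping");
        }
    }
}
```

Whether a picture scanner should skip, stop, or report such an offset is a design decision for the library; the sketch only shows the bounds condition itself.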