Created attachment 30890
Patch with fix for SprmBuffer problem

We had an issue when we tried to get the body content of an MS Word 97 file. We got this exception:

...
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.poi.hwpf.sprm.SprmBuffer.append(SprmBuffer.java:128)
    at org.apache.poi.hwpf.model.PAPBinTable.rebuild(PAPBinTable.java:293)
    at org.apache.poi.hwpf.model.PAPBinTable.rebuild(PAPBinTable.java:116)
    at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:136)
    at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:437)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 15 more

So we made a fix to avoid that exception. I am attaching a patch file with the fix.

Regards
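For context on the trace: `System.arraycopy` throws an index-out-of-bounds exception (here `ArrayIndexOutOfBoundsException`) whenever the source or destination range runs past the end of an array. One plausible cause inside an append method is a backing array that was not grown before copying, but that is an assumption; this report does not establish the actual root cause in `SprmBuffer`. The sketch below is hypothetical and is not POI's code: `GrowableByteBuffer`, `appendUnchecked`, and `append` are invented names that only illustrate the failure mode and one way an append can avoid it.

```java
import java.util.Arrays;

// Hypothetical illustration only: NOT POI's SprmBuffer. A minimal byte buffer
// showing how an unchecked System.arraycopy fails and how growing the
// backing array before copying avoids that particular failure.
public class GrowableByteBuffer {
    private byte[] buf;
    private int offset; // number of bytes currently in use

    public GrowableByteBuffer(int initialCapacity) {
        buf = new byte[initialCapacity];
    }

    // Naive append: copies without checking capacity. If offset + data.length
    // exceeds buf.length, System.arraycopy throws an IndexOutOfBoundsException
    // (ArrayIndexOutOfBoundsException on common JVMs).
    public void appendUnchecked(byte[] data) {
        System.arraycopy(data, 0, buf, offset, data.length);
        offset += data.length;
    }

    // Safer append: grows the backing array first so the copy always fits.
    public void append(byte[] data) {
        int required = offset + data.length;
        if (required > buf.length) {
            // Double the capacity, or grow to exactly what is needed if that is larger.
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, required));
        }
        System.arraycopy(data, 0, buf, offset, data.length);
        offset = required;
    }

    // Returns only the bytes actually written.
    public byte[] toByteArray() {
        return Arrays.copyOf(buf, offset);
    }

    public static void main(String[] args) {
        GrowableByteBuffer safe = new GrowableByteBuffer(2);
        safe.append(new byte[] {1, 2, 3});
        safe.append(new byte[] {4});
        System.out.println(Arrays.toString(safe.toByteArray())); // [1, 2, 3, 4]

        GrowableByteBuffer naive = new GrowableByteBuffer(2);
        try {
            naive.appendUnchecked(new byte[] {1, 2, 3});
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive append failed");
        }
    }
}
```

Growing the array only addresses a too-small destination; if the real problem is corrupt length data read from the document, the fix would have to validate that data instead.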
We also tried again with POI 3.9, but the issue persists.
Closing this, as there has been no update for a very long time and the attached patch simply catches and ignores any Exception thrown at this code location. This is not a fix, not even a workaround, but a rough hack that could hide any kind of problem at that point. Please reopen this bug if it is still a problem for you and you can provide a fix that can be applied to the library.
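To make the distinction between swallowing an exception and actually handling bad input concrete, here is a generic sketch. It is not the attached patch and not POI code; `RangeCheckDemo`, `readRunSwallowing`, and `readRunChecked` are hypothetical names. The first method turns any failure into an empty result, so callers cannot tell corrupt input from empty input; the second validates the range up front and fails with a message that says what went wrong.

```java
// Hypothetical illustration, not POI code: contrasts swallowing an
// exception with validating input and failing with a descriptive message.
public class RangeCheckDemo {

    // The "rough hack" pattern: any failure silently becomes an empty result.
    static byte[] readRunSwallowing(byte[] data, int start, int length) {
        try {
            byte[] out = new byte[length];
            System.arraycopy(data, start, out, 0, length);
            return out;
        } catch (RuntimeException e) {
            return new byte[0]; // the error and its cause are lost here
        }
    }

    // Validating version: checks the range first (written to avoid int overflow)
    // and reports exactly which range was invalid.
    static byte[] readRunChecked(byte[] data, int start, int length) {
        if (start < 0 || length < 0 || start > data.length - length) {
            throw new IllegalArgumentException("run [" + start + ", " + ((long) start + length)
                    + ") lies outside buffer of length " + data.length);
        }
        byte[] out = new byte[length];
        System.arraycopy(data, start, out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40};
        System.out.println(readRunSwallowing(data, 2, 5).length); // 0
        try {
            readRunChecked(data, 2, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Validation alone still does not make the document parse; it only ensures the failure is reported instead of hidden, which is the minimum a library-level fix would need.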
Same problem here, surfaced via Tika. The following file reproduces the behavior: https://dl.dropboxusercontent.com/u/92341073/Message%20to%20Eric%20Spooner.doc Tika issue here: https://issues.apache.org/jira/browse/TIKA-2119