Bug 55604

Summary: AIOOBE from SprmBuffer class when reading body of some .doc files
Product: POI
Reporter: Willy Solaligue <wsolaligue>
Component: HWPF
Assignee: POI Developers List <dev>
Status: REOPENED ---    
Severity: major    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Patch with fix about SprmBuffer problem

Description Willy Solaligue 2013-09-27 17:35:23 UTC
Created attachment 30890
Patch with fix about SprmBuffer problem

We had an issue when we tried to get the body content of an MS Word 97 file. We got this exception:

.
.

Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.poi.hwpf.sprm.SprmBuffer.append(SprmBuffer.java:128)
at org.apache.poi.hwpf.model.PAPBinTable.rebuild(PAPBinTable.java:293)
at org.apache.poi.hwpf.model.PAPBinTable.rebuild(PAPBinTable.java:116)
at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:136)
at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:437)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:79)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 15 more 

So we made a fix to avoid that exception; I am attaching a patch file with the fix.

Regards
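The trace above ends inside `System.arraycopy`, which per its Javadoc throws an `IndexOutOfBoundsException` (the trace shows the `ArrayIndexOutOfBoundsException` subclass) when a position or the length is negative, or when the copied range runs past the end of the source or destination array. The Java sketch below checks those documented conditions before copying. The arrays and the `outOfBounds` helper are hypothetical illustrations; they do not reproduce POI's `SprmBuffer.append`, and this report does not say which of the conditions is hit there.

```java
import java.util.Arrays;

class ArraycopyBounds {

    // Returns true when System.arraycopy(src, srcPos, dest, destPos, length)
    // would be rejected with an IndexOutOfBoundsException, following the
    // conditions listed in the System.arraycopy Javadoc.
    // Sums are computed as long so srcPos + length cannot overflow int.
    static boolean outOfBounds(byte[] src, int srcPos, byte[] dest, int destPos, int length) {
        return srcPos < 0 || destPos < 0 || length < 0
                || (long) srcPos + length > src.length
                || (long) destPos + length > dest.length;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4}; // hypothetical input bytes
        byte[] dest = new byte[2]; // hypothetical destination that was never grown

        System.out.println(outOfBounds(src, 0, dest, 0, src.length)); // true

        try {
            System.arraycopy(src, 0, dest, 0, src.length);
        } catch (IndexOutOfBoundsException e) {
            // Same failure class as in the reported trace.
            System.out.println("copy rejected: " + e.getClass().getSimpleName());
        }

        // Growing the destination first makes the same copy legal.
        dest = Arrays.copyOf(dest, src.length);
        System.out.println(outOfBounds(src, 0, dest, 0, src.length)); // false
        System.arraycopy(src, 0, dest, 0, src.length);
        System.out.println(Arrays.toString(dest)); // [1, 2, 3, 4]
    }
}
```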
Comment 1 Willy Solaligue 2013-09-27 17:41:05 UTC
Additionally, we tried again with POI 3.9, but the issue persists.
Comment 2 Dominik Stadler 2016-07-17 09:44:00 UTC
Closing this as there was no update for a very long time and the attached patch just catches and completely ignores any Exception thrown at this code location. This is surely not a fix, not even a workaround, but a rough hack that might hide any type of problem at that point. 

Please reopen this bug if this is still a problem for you and you can provide a fix that can be applied to the library.
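The objection above is that swallowing every exception hides the underlying problem. As a contrast, the Java sketch below shows an append method that validates its input and grows its array before calling `System.arraycopy`, so a bad offset fails loudly with context instead of being silently ignored. `GrowableBuffer` and its methods are hypothetical and are not POI's `SprmBuffer`; this only illustrates the "fail loudly" style, not a fix for this bug.

```java
import java.util.Arrays;

// Hypothetical growable byte buffer used only for illustration (NOT POI's SprmBuffer).
class GrowableBuffer {
    private byte[] buf = new byte[4];
    private int size;

    // Appends src[offset .. src.length) after validating the range,
    // growing the backing array first so arraycopy cannot overrun it.
    void append(byte[] src, int offset) {
        if (offset < 0 || offset > src.length) {
            // Report the bad input with context instead of dropping data silently.
            throw new IllegalArgumentException(
                    "offset " + offset + " outside [0, " + src.length + "]");
        }
        int length = src.length - offset;
        ensureCapacity(size + length);
        System.arraycopy(src, offset, buf, size, length);
        size += length;
    }

    // Grows to at least `needed` bytes, doubling to amortize repeated appends.
    private void ensureCapacity(int needed) {
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
        }
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, size);
    }

    public static void main(String[] args) {
        GrowableBuffer b = new GrowableBuffer();
        b.append(new byte[] {1, 2, 3, 4, 5, 6}, 2);
        System.out.println(Arrays.toString(b.toByteArray())); // [3, 4, 5, 6]
        try {
            b.append(new byte[] {7}, 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // offset 3 outside [0, 1]
        }
    }
}
```

The design choice is that invalid input surfaces at the call site with a descriptive message, whereas a catch-all around the copy would leave the caller with silently truncated data.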
Comment 3 Seva Alekseyev 2016-10-15 02:19:05 UTC
Same problem here, surfaced via Tika. The following file reproduces the behavior:

https://dl.dropboxusercontent.com/u/92341073/Message%20to%20Eric%20Spooner.doc

Tika issue here: https://issues.apache.org/jira/browse/TIKA-2119