Bug 60158 - AIOOBE in VBAMacroReader
Summary: AIOOBE in VBAMacroReader
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.15-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2016-09-20 19:56 UTC by Tim Allison
Modified: 2016-10-18 12:58 UTC

triggering file (36.50 KB, application/msword)
2016-09-20 19:56 UTC, Tim Allison

Description Tim Allison 2016-09-20 19:56:54 UTC
Created attachment 34282
triggering file

While working on TIKA-2069, I got an AIOOBE on a test file that I generated by taking the docm that Jeff Swindle submitted and saving it as .doc.

I confirmed this AIOOBE in pure POI:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
	at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:144)
	at org.apache.poi.util.RLEDecompressingInputStream.<init>(RLEDecompressingInputStream.java:77)
	at org.apache.poi.poifs.macros.VBAMacroReader.readModule(VBAMacroReader.java:204)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:308)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:155)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:160)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:160)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:116)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.extract(VBAMacroExtractor.java:83)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.extract(VBAMacroExtractor.java:123)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.main(VBAMacroExtractor.java:54)
Comment 1 Tim Allison 2016-09-20 19:58:27 UTC
Same exception with the original .docm file that Jeff submitted on TIKA-2069.
Comment 2 Javen O'Neal 2016-09-21 01:19:20 UTC
I added a failing unit test to POI in r1761652 using test-macro-doc.docm from TIKA-2069 [1], submitted by Jeff Swindle.

[1] https://issues.apache.org/jira/browse/TIKA-2069
Comment 3 Tim Allison 2016-10-17 19:49:19 UTC
Slightly less than 50% of the macro exceptions are caused by this. See xlsx reports on https://issues.apache.org/jira/browse/TIKA-2104.
Comment 4 Tim Allison 2016-10-18 12:26:17 UTC
I think this is a problem in RLEDecompressingInputStream.

In readChunk(), under 

if ((tokenFlags & POWER2[n]) == 0) {

if the int that is read is 0xFF, casting it to a byte yields -1.

When we then call readInt() to get the module offset, read() returns that -1 for the first byte, so we think we've hit the end of the stream and return -1.
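
For anyone following along, the sign extension described above can be reproduced outside POI. This is only a minimal sketch: SignExtensionDemo and storeAndReadBack are hypothetical names made up for illustration, not POI code. It stores a value in a byte array and reads it back with a bare array access, the way an unmasked read() would.

```java
// Hypothetical demo class for illustration only; not part of POI.
class SignExtensionDemo {

    /**
     * Stores the low 8 bits of value in a byte buffer, then returns the
     * element as an int without masking, as a bare "return buf[pos++]" would.
     */
    static int storeAndReadBack(int value) {
        byte[] buf = new byte[1];
        buf[0] = (byte) value; // 0xFF does not fit a signed byte, so it becomes -1
        return buf[0];         // widening byte -> int sign-extends, giving -1 again
    }

    public static void main(String[] args) {
        int result = storeAndReadBack(0xFF);
        System.out.println(result);       // prints -1
        System.out.println(result == -1); // prints true: same value as the EOF sentinel
    }
}
```

Since InputStream.read() uses -1 to signal end of stream, a caller cannot tell this data byte apart from EOF.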
Comment 5 Tim Allison 2016-10-18 12:58:27 UTC

I modified RLEDecompressingInputStream's read() from

        return buf[pos++];

to

        return buf[pos++] & 0xFF;

Let me know if we need to modify anything else in RLEDecompressingInputStream...or if there's a better place to fix this.
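
To illustrate the effect of the mask, here is a rough sketch of a buffer-backed stream whose read() applies & 0xFF. MaskedBufferStream and readIntLE are hypothetical stand-ins written for this example, not the actual RLEDecompressingInputStream code, and the little-endian byte order in readIntLE is an assumption of the sketch.

```java
import java.io.InputStream;

// Hypothetical stand-in for a buffer-backed stream; not POI's implementation.
class MaskedBufferStream extends InputStream {
    private final byte[] buf;
    private int pos = 0;

    MaskedBufferStream(byte[] buf) {
        this.buf = buf;
    }

    @Override
    public int read() {
        if (pos >= buf.length) {
            return -1;             // genuine end of stream
        }
        return buf[pos++] & 0xFF;  // mask so data bytes fall in 0..255, never -1
    }

    /** Reads four bytes as a little-endian int (byte order assumed for this sketch). */
    static int readIntLE(MaskedBufferStream in) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int b = in.read();
            if (b == -1) {
                throw new IllegalStateException("unexpected end of stream");
            }
            value |= b << (8 * i);
        }
        return value;
    }
}
```

With the mask in place, a buffer starting with 0xFF, 0x01, 0x00, 0x00 reads as 511 rather than being mistaken for end of stream on its first byte.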