Bug 60158

Summary: AIOOBE in VBAMacroReader
Product: POI
Reporter: Tim Allison <tallison>
Component: POI Overall
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.15-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: triggering file

Description Tim Allison 2016-09-20 19:56:54 UTC
Created attachment 34282
triggering file

While working on TIKA-2069, I got an ArrayIndexOutOfBoundsException (AIOOBE) on a test file that I generated by taking the .docm that Jeff Swindle submitted and saving it as .doc.

I confirmed this AIOOBE in pure POI:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
	at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:144)
	at org.apache.poi.util.RLEDecompressingInputStream.<init>(RLEDecompressingInputStream.java:77)
	at org.apache.poi.poifs.macros.VBAMacroReader.readModule(VBAMacroReader.java:204)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:308)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:155)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:160)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:160)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:116)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.extract(VBAMacroExtractor.java:83)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.extract(VBAMacroExtractor.java:123)
	at org.apache.poi.poifs.macros.VBAMacroExtractor.main(VBAMacroExtractor.java:54)
Comment 1 Tim Allison 2016-09-20 19:58:27 UTC
I get the same exception with the original .docm file that Jeff submitted on TIKA-2069.
Comment 2 Javen O'Neal 2016-09-21 01:19:20 UTC
I added a failing unit test to POI in r1761652, using test-macro-doc.docm from TIKA-2069 [1], submitted by Jeff Swindle.

[1] https://issues.apache.org/jira/browse/TIKA-2069
Comment 3 Tim Allison 2016-10-17 19:49:19 UTC
Slightly less than 50% of the macro exceptions are caused by this. See xlsx reports on https://issues.apache.org/jira/browse/TIKA-2104.
Comment 4 Tim Allison 2016-10-18 12:26:17 UTC
I think this is a problem in RLEDecompressingInputStream.

In readChunk(), under 

if ((tokenFlags & POWER2[n]) == 0) {

if the int that is read is 0xFF, casting it to a byte turns its value into -1.

When we then call readInt() to get the module offset, read() returns that -1 for the first byte, so we think we've hit the end of the stream and readInt() returns -1.
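
To illustrate the cast described above, here is a minimal standalone sketch. It does not use any POI code; the class and method names (ByteCastDemo, unmasked, masked) are made up for this example. It only shows how Java widens a stored byte back to an int.

```java
public class ByteCastDemo {
    /** Widens the byte with sign extension, as an unmasked read() would. */
    public static int unmasked(byte b) {
        return b;
    }

    /** Masks after widening, so the result stays in the range 0..255. */
    public static int masked(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // The int 0xFF (255) does not fit in a signed byte; the cast yields -1.
        byte stored = (byte) 0xFF;

        System.out.println(unmasked(stored)); // prints -1
        System.out.println(masked(stored));   // prints 255
    }
}
```

Because -1 is also the value that InputStream.read() uses to signal end of stream, a caller cannot tell an unmasked 0xFF byte apart from EOF.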
Comment 5 Tim Allison 2016-10-18 12:58:27 UTC

I modified RLEDecompressingInputStream's read() from

        return buf[pos++];

to

        return buf[pos++] & 0xFF;

Let me know if we need to modify anything else in RLEDecompressingInputStream...or if there's a better place to fix this.
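
For anyone who wants to see the effect of the mask in isolation, here is a rough sketch. It is not the POI implementation: BufferStream and readIntLE are hypothetical stand-ins, and the little-endian byte order of the offset is an assumption made for the example.

```java
import java.io.InputStream;

public class MaskedReadSketch {
    /** Hypothetical stream that serves bytes from an internal buffer. */
    public static class BufferStream extends InputStream {
        private final byte[] buf;
        private final boolean mask;
        private int pos;

        public BufferStream(byte[] buf, boolean mask) {
            this.buf = buf;
            this.mask = mask;
        }

        @Override
        public int read() {
            if (pos >= buf.length) {
                return -1; // genuine end of stream
            }
            // With mask == false a stored 0xFF is returned as -1.
            return mask ? (buf[pos++] & 0xFF) : buf[pos++];
        }
    }

    /** Hypothetical helper: reads a little-endian int, or -1 if read() reports EOF. */
    public static int readIntLE(BufferStream in) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int b = in.read();
            if (b == -1) {
                return -1;
            }
            value |= b << (8 * i);
        }
        return value;
    }

    public static void main(String[] args) {
        // Offset 0x000003FF (1023), stored low byte first.
        byte[] offset = {(byte) 0xFF, 0x03, 0x00, 0x00};

        System.out.println(readIntLE(new BufferStream(offset, false))); // prints -1
        System.out.println(readIntLE(new BufferStream(offset, true)));  // prints 1023
    }
}
```

Without the mask, the first byte of the offset is mistaken for end of stream; with it, the full value is decoded.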