Bug 60279

Summary: VBAMacroReader throws IllegalArgumentException on some files
Product: POI
Reporter: Tim Allison <tallison>
Component: POIFS
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows NT   
Attachments: One triggering file from common crawl

Description Tim Allison 2016-10-19 19:16:43 UTC
Created attachment 34391
One triggering file from common crawl

On a few files in our regression corpus, I got:

java.lang.IllegalArgumentException: Header byte 0x01 expected, received 0x00
	at org.apache.poi.util.RLEDecompressingInputStream.<init>(RLEDecompressingInputStream.java:79)

I'm not sure if these files have valid macros in them or another embedded object.  Let's investigate.
Comment 1 veena subbu 2017-09-06 12:35:12 UTC
I am also facing the same problem. Could anyone resolve it for us?
Comment 2 Javen O'Neal 2017-09-06 15:26:49 UTC
Sounds like we have a volunteer! Veena, are you interested in researching this problem and putting together a patch?
Comment 3 Tim Allison 2017-09-06 18:59:26 UTC
On the attached file, I don't have a solution, but I wanted to document what I've found so far.

1) This file's macros cause Microsoft to complain on document load (when you enable macros).  So, something is wonky at least for this document.

2) decalage's oledump.py is able to read this macro as:
Attribute VB_Name = "ThisDocument"
Attribute VB_Base = "1Normal.ThisDocument"
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = True
Attribute VB_Customizable = True

oledump.py commandline: oledump.py -s 7 -v file.doc

3) The module offset is read as 5541 (0x15A5), but the actual record starts at 0x02F9.
Comment 4 Tim Allison 2017-09-07 14:13:53 UTC
Veena, are you able to share your file?

I think the offset is corrupt in my file, and I'm guessing that decalage's tool may be brute-force reading the macros out of the ModuleStream.
Comment 5 Tim Allison 2017-09-08 12:19:18 UTC
Total <face_palm/>

oledump.py is by Didier Stevens


I filled out quite a bit in the vba stream parser hoping that an incorrect parse was leading to an incorrect offset.  The good news: we can now get quite a bit more metadata about the macros out, and there were some records that do require special handling.  The bad news: the offset really was incorrect and no improvements to the parser fixed this.

So, I have a patch for this that backs off to brute force to find the macro contents if there's an RLE decompression failure.
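
To illustrate the idea of that fallback, here is a minimal sketch. It is not the actual patch. It assumes, following my reading of the MS-OVBA compressed container layout, that a container starts with the signature byte 0x01 followed by a 2-byte little-endian chunk header whose bits 12-14 equal 0b011. The class and method names (BruteForceOffsetSketch, looksLikeContainer, findContainerStart) are hypothetical and do not exist in POI.

```java
final class BruteForceOffsetSketch {

    /**
     * Returns true if a plausible compressed container starts at pos:
     * signature byte 0x01, then a chunk header whose bits 12-14 are 0b011.
     * This layout is an assumption based on MS-OVBA, not POI's own check.
     */
    static boolean looksLikeContainer(byte[] stream, int pos) {
        if (pos < 0 || pos + 3 > stream.length) {
            return false; // not enough bytes for signature + chunk header
        }
        if (stream[pos] != 0x01) {
            return false; // the check that fails with "Header byte 0x01 expected"
        }
        // Chunk header is stored little-endian.
        int header = (stream[pos + 1] & 0xFF) | ((stream[pos + 2] & 0xFF) << 8);
        return ((header >> 12) & 0x07) == 0x03;
    }

    /**
     * Tries the offset recorded in the dir stream first. If that does not
     * look like a container, scans the whole module stream for the first
     * plausible start. Returns -1 if nothing is found.
     */
    static int findContainerStart(byte[] stream, int recordedOffset) {
        if (looksLikeContainer(stream, recordedOffset)) {
            return recordedOffset;
        }
        for (int i = 0; i + 3 <= stream.length; i++) {
            if (looksLikeContainer(stream, i)) {
                return i;
            }
        }
        return -1;
    }
}
```

A scan like this can produce false positives, since the byte 0x01 followed by a matching header can occur by chance in the p-code that precedes the source. A real fallback would probably also want to check that the decompressed text looks like module source, for example that it starts with "Attribute".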

I'll wait to apply it until we release 3.17.
Comment 6 Tim Allison 2017-09-14 02:23:38 UTC