|Summary:||VBAMacroReader throws IllegalArgumentException on some files|
|Product:||POI||Reporter:||Tim Allison <tallison>|
|Component:||POIFS||Assignee:||POI Developers List <dev>|
|Attachments:||One triggering file from common crawl|
Description Tim Allison 2016-10-19 19:16:43 UTC
Created attachment 34391
One triggering file from common crawl

On a few files in our regression corpus, I got:

java.lang.IllegalArgumentException: Header byte 0x01 expected, received 0x00
    at org.apache.poi.util.RLEDecompressingInputStream.<init>(RLEDecompressingInputStream.java:79)
    ...

I'm not sure if these files have valid macros in them or another embedded object. Let's investigate.
Comment 1 veena subbu 2017-09-06 12:35:12 UTC
I am also facing the same problem. Could anyone resolve it for us?
Comment 2 Javen O'Neal 2017-09-06 15:26:49 UTC
Sounds like we have a volunteer! Veena, are you interested in researching this problem and putting together a patch?
Comment 3 Tim Allison 2017-09-06 18:59:26 UTC
On the attached file, I don't have a solution, but I wanted to document what I've found so far.

1) This file's macros cause Microsoft to complain on document load (when you enable macros). So, something is wonky, at least for this document.

2) decalage's oledump.py is able to read this macro as:

Attribute VB_Name = "ThisDocument"
Attribute VB_Base = "1Normal.ThisDocument"
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = True
Attribute VB_Customizable = True

oledump.py commandline: oledump.py -s 7 -v file.doc

3) The module offset is read as 5541 (0x15A5), but the actual record starts at 0x02F9.
Comment 4 Tim Allison 2017-09-07 14:13:53 UTC
Veena, are you able to share your file? I think the offset is corrupt in my file, and I'm guessing that decalage's tool may be brute-force reading the macros out of the ModuleStream.
Comment 5 Tim Allison 2017-09-08 12:19:18 UTC
Total <face_palm/>: oledump.py is by Didier Stevens. Ugh... Sorry.

I filled out quite a bit in the VBA stream parser, hoping that an incorrect parse was leading to an incorrect offset.

The good news: we can now get quite a bit more metadata about the macros out, and there were some records that do require special handling.

The bad news: the offset really was incorrect, and no improvements to the parser fixed this.

So, I have a patch for this that backs off to brute force to find the macro contents if there's an RLE decompression failure. I'll wait to apply it until we release 3.17.
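
For readers following along, here is a rough sketch of what a brute-force fallback of this kind could look like. It is not the patch itself. The class and method names (BruteForceModuleOffset, looksLikeContainer, findContainerOffset) are made up for illustration, and the search pattern is an assumption based on the MS-OVBA compressed container layout: a 0x01 signature byte, a two-byte chunk header whose signature bits are 0b011, then a 0x00 flag byte followed by the literal text "Attribut" that module source normally starts with.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: locates the compressed module source
// inside a module stream when the recorded offset cannot be trusted.
final class BruteForceModuleOffset {
    // Every compressed container starts with this signature byte.
    private static final int SIGNATURE_BYTE = 0x01;
    // Assumed marker: a flag byte of 0x00 (eight literal tokens) followed by
    // the first eight characters of "Attribute VB_Name".
    private static final byte[] MARKER = "\0Attribut".getBytes(StandardCharsets.US_ASCII);

    private BruteForceModuleOffset() {}

    // True if the bytes at offset look like the start of a compressed container.
    static boolean looksLikeContainer(byte[] stream, int offset) {
        if (offset < 0 || offset + 3 > stream.length) {
            return false;
        }
        if ((stream[offset] & 0xFF) != SIGNATURE_BYTE) {
            return false;
        }
        // The chunk header is a little-endian 16-bit value; bits 12-14 hold
        // the chunk signature, which must be 0b011.
        int header = (stream[offset + 1] & 0xFF) | ((stream[offset + 2] & 0xFF) << 8);
        return ((header >> 12) & 0x07) == 0x03;
    }

    // True if MARKER occurs in stream starting at index pos.
    private static boolean markerAt(byte[] stream, int pos) {
        for (int j = 0; j < MARKER.length; j++) {
            if (stream[pos + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    // Returns the recorded offset if it looks valid; otherwise scans for the
    // marker and returns the container start 3 bytes before it, or -1.
    static int findContainerOffset(byte[] stream, int recordedOffset) {
        if (looksLikeContainer(stream, recordedOffset)) {
            return recordedOffset;
        }
        for (int i = 3; i + MARKER.length <= stream.length; i++) {
            if (markerAt(stream, i) && looksLikeContainer(stream, i - 3)) {
                return i - 3;
            }
        }
        return -1;
    }
}
```

A caller would pass the raw module stream bytes and the offset recorded in the dir stream, then start decompression at the returned position if it is not -1. The scan only helps for modules whose source begins with an Attribute line, so it is a heuristic rather than a general recovery method.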