Bug 59830

Summary: "Skipped only -1 while trying to skip 67116544 bytes. This should never happen." IOException in VBAMacroReader
Product: POI
Reporter: brooke
Component: POIFS
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: normal
CC: brooke
Priority: P2
Version: 3.15-dev
Target Milestone: ---
Hardware: All
OS: All
Attachments:
34027: MACRO VIRUS INFECTED WORD DOC, DO NOT OPEN IN WORD! It causes readMacros() to throw an IOException
34030: The vbaProject.bin file found after saving as a .docm using Word
34031: The results of running org.apache.poi.poifs.dev.POIFSDump.main on the problem document.

Description brooke 2016-07-08 16:34:15 UTC
Created attachment 34027
MACRO VIRUS INFECTED WORD DOC, DO NOT OPEN IN WORD!
It causes readMacros() to throw an IOException

WARNING: ATTACHED DOCUMENT IS INFECTED WITH A MALICIOUS MACRO. DO NOT OPEN IT USING MICROSOFT WORD.

I have been using POI to help take apart Office documents and look for suspicious/malicious content. I had developed my own way to extract the VBA script in the embedded macros, but have found the new VBAMacroReader to be much more useful.

However, I have noticed that VBAMacroReader fails on certain Office documents. I have attached one such example.


Code used to reproduce the error:

import java.io.File;
import java.util.Map;
import org.apache.poi.poifs.macros.VBAMacroReader;
File file = new File("macro_virus.doc");
VBAMacroReader reader = new VBAMacroReader(file);
Map<String, String> macros = reader.readMacros();


Stack trace:
java.io.IOException: Skipped only -1 while trying to skip 67116544 bytes.  This should never happen.
	at org.apache.poi.poifs.macros.VBAMacroReader.trySkip(VBAMacroReader.java:182)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:240)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:148)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
	at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
	at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:115)
	at poitester.POITester.main(POITester.java:27)


I've successfully opened the file in a sandbox and it appears to be a valid Word document.

Also, just FYI, I am using poi-3.15-beta2 and I get the same results on OSX 10.10.5 and openSUSE 11.4. 

Thanks!
Comment 1 Javen O'Neal 2016-07-09 06:21:39 UTC
In r1751982 I added context explaining why -1 is returned: the bytes could not be read from the input stream. The exception now reads:

Error occurred while reading section id 2
java.io.IOException: Error occurred while reading section id 2
        at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:244)
        at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:148)
        at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
        at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
        at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:115)
        at org.apache.poi.poifs.macros.TestVBAMacroReader.bug59830(TestVBAMacroReader.java:249)
Caused by: java.io.IOException: Skipped only -1 while trying to skip 67116544 bytes.  This should never happen.
        at org.apache.poi.poifs.macros.VBAMacroReader.trySkip(VBAMacroReader.java:182)
        at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:242)
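
The shape of the chained exception above can be reproduced with a small standalone sketch. This is not POI's code: `skipFully` and `skipSection` are hypothetical helpers, and a 16-byte `ByteArrayInputStream` stands in for the decompressed stream. It only illustrates how a short skip can be rethrown with the section id attached as context.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipContextSketch {

    // Hypothetical helper: skip exactly n bytes, or fail with a descriptive IOException.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress; probe with read() to detect end of stream
                if (in.read() == -1) {
                    throw new IOException("Skipped only " + (n - remaining)
                            + " while trying to skip " + n + " bytes.");
                }
                skipped = 1; // read() consumed one byte
            }
            remaining -= skipped;
        }
    }

    // Hypothetical helper: wrap the low-level failure with the id of the section being read.
    static void skipSection(InputStream in, int sectionId, long length) throws IOException {
        try {
            skipFully(in, length);
        } catch (IOException e) {
            throw new IOException("Error occurred while reading section id " + sectionId, e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[16]);
        try {
            skipSection(in, 2, 67116544L); // far more bytes than the stream holds
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("Caused by: " + e.getCause().getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the low-level detail while the outer message says which section was being processed.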

From the MS-OVBA spec [1], a value of 0x0002 corresponds to a PROJECTLCID Record (section 2.3.4.2.1.2). The size of this record must be 0x00000004 according to the spec. See an example [2].
LCID is an abbreviation for language code identifier, "a 32-bit number that identifies the user interface human language dialect or variation that is supported by an application or a client computer" [3].
Does the 67116544-byte figure refer to the length of the PROJECTLCID record?

[1] https://msdn.microsoft.com/en-us/library/office/cc313094(v=office.12).aspx
[2] https://msdn.microsoft.com/en-us/library/dd952163(v=office.12).aspx
[3] https://msdn.microsoft.com/en-us/library/dd908523(v=office.12).aspx#gt_c7f99c66-592f-4053-b62a-878c189653b6
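
To make the mismatch with the expected PROJECTLCID size concrete, here is a minimal arithmetic sketch. The class and method names are hypothetical; the only facts used are the reported number, the expected size of 0x00000004, and that MS-OVBA fields are little-endian. It shows that 67116544 is 0x04001E00, which corresponds to the bytes 00 1E 00 04 rather than 04 00 00 00. One possible explanation, and only a guess here, is that the length was read from a misaligned offset.

```java
public class LengthFieldSketch {

    // Interpret four bytes starting at offset as an unsigned 32-bit little-endian integer.
    static long readUInt32LE(byte[] b, int offset) {
        return (b[offset] & 0xFFL)
                | (b[offset + 1] & 0xFFL) << 8
                | (b[offset + 2] & 0xFFL) << 16
                | (b[offset + 3] & 0xFFL) << 24;
    }

    public static void main(String[] args) {
        long reported = 67116544L;
        // Hex form of the length from the exception message
        System.out.println(String.format("reported = 0x%08X", reported));

        byte[] expected = {0x04, 0x00, 0x00, 0x00}; // size required by the spec
        byte[] observed = {0x00, 0x1E, 0x00, 0x04}; // bytes that decode to the reported value

        System.out.println("expected size = " + readUInt32LE(expected, 0));
        System.out.println("observed size = " + readUInt32LE(observed, 0));
    }
}
```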

I did not commit the doc file though. Would you be able to extract the vbaProject.bin out of this malicious document? I'd feel more comfortable committing a file that can't execute itself. Probably the easiest way to get this file is to use Word to save-as to docm, then rename the docm with a .zip extension, and then pull out the file named vbaProject.bin.
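
If renaming and unzipping by hand is inconvenient, the last step can be sketched with the JDK's zip classes. This is an illustration only: the class name is hypothetical, and it assumes the saved .docm stores the VBA project under the entry name word/vbaProject.bin. It never opens the document in Word; it only copies one zip entry to disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class VbaProjectExtractor {

    // Assumption: Word stores the VBA project at this path inside a .docm package.
    static final String VBA_ENTRY = "word/vbaProject.bin";

    // Copy the vbaProject.bin entry out of the .docm (a zip archive) into target.
    static void extract(Path docm, Path target) throws IOException {
        try (ZipFile zip = new ZipFile(docm.toFile())) {
            ZipEntry entry = zip.getEntry(VBA_ENTRY);
            if (entry == null) {
                throw new IOException("No " + VBA_ENTRY + " entry in " + docm);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VbaProjectExtractor <input.docm> <output.bin>
        extract(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```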

FYI, I think the infected file can only harm Windows computers, as the document contains 3 macros that call powershell.exe on document open. Nonetheless, please exercise caution.

It may also be helpful to see what POI can read from the document using org.apache.poi.poifs.dev.POIFSDump.main. Keep in mind that the extracted files will contain both ASCII and non-ASCII characters, as they are likely run-length encoded.
Comment 2 brooke 2016-07-11 18:49:33 UTC
I believe I was able to successfully extract the vbaProject.bin, using the suggested procedure.

I also ran POIFSDump.main on the problem doc file. I zipped the directory that it generated and attached it to this bug report.

Let me know if there is anything else I can do to help track this down.

Thanks!
Comment 3 brooke 2016-07-11 18:51:00 UTC
Created attachment 34030
The vbaProject.bin file found after saving as a .docm using Word
Comment 4 brooke 2016-07-11 18:51:52 UTC
Created attachment 34031
The results of running org.apache.poi.poifs.dev.POIFSDump.main on the problem document.
Comment 5 Tim Allison 2016-10-18 15:49:38 UTC
Fixed in r1765468.

Thank you for extracting the malicious payload!

I found a non-malicious file in govdocs1 that triggered the same exception.

Please re-open if this doesn't fix your problem.