Bug 52949 - [PATCH] How to extract VBA Macros code from Excel file by using POI?
Summary: [PATCH] How to extract VBA Macros code from Excel file by using POI?
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.8-dev
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Reported: 2012-03-20 06:37 UTC by shh
Modified: 2016-04-12 06:24 UTC (History)
0 users

Class which can extrace macro source code (4.04 KB, application/x-zip-compressed)
2013-03-14 20:12 UTC, Barry Lagerweij

Note You need to log in before you can comment on or make changes to this bug.
Description shh 2012-03-20 06:37:14 UTC
Hi,Nick Burch
Recently, I want to extract VBA Macros code from Excel file.
Is it possible to extract Macros code from the inputstream of the Workbook by using POI's method? When the value of the parameter [preserveNodes] in the construct of HSSFWorkbook(InputStream s, boolean preserveNodes) is true, if will preserve macros nodes. Could I have some method to extract the macros from these nodes?

Thanks a lot!
Comment 1 Yegor Kozlov 2012-03-20 08:29:22 UTC
Unfortunately POI cannot read macro code. The main difficulty is that VBA isn't stored as plain text but instead MS Office uses pretty complex format as described in [MS-OVBA].pdf. 

At minimum, you can grab the node holding VBA code and try to parse it yourself. The main source of information how to do that is [MS-OVBA].pdf, you can download it from the Microsoft site.

Both HSSF and XSSF preserve macro nodes, this means that you can create templates with macros in MS Office and then populates them with data using POI. 

Comment 2 shh 2012-03-23 12:17:56 UTC
Thanks for your answer!
I also noticed your email on [http://apache-poi.1045710.n5.nabble.com/Google-Summer-of-Code-Apache-POI-td5582557.html].
Depending on your tips, I get some ideas from this website[http://www.cpearson.com/Excel/vbe.aspx], and in this way, it may export the VBComponent code module to a text file(Of course, my approach is not a pure java way).
First, using the JACOB(Java COM Bridge) to call the macro which in the VBA.
Second, the macro in the VBA which exports the existing VBComponent Code Module to a text file.
Finally, we could extrat the source code from the output text file(".frm",".bas" etc.).
Unfortunately, now I can't make the VBA source code visible, when the VBA Project has been encrypted and I can't supply the password.
Is there anyway I could make the make the VBA source code fully visible without providing password?

Any idea is welcome.
Thanks a lot!
Comment 3 Barry Lagerweij 2013-03-14 20:12:35 UTC
Created attachment 30052 [details]
Class which can extrace macro source code

Since POI does not provide access to this, I've written a class which allows you to extract the sourcecode as text.

The two attached classes can be used together with POI (I've tested with 3.8 and 3.9) to process the xl/vbaProject.bin (for ooxml) or XLS file and retrieve the sources.

The RLEDecompressingInputStream is an InputStream which can be used to decompress the chunks as described in the MS-OVBA specification. It wraps around a compressed inputstream (ussually a DocumentInputStream from the POIFS) and decompresses on the fly to preserve memory.

The VBAMacroExtractor processes the OLE binary stream records, records the CodePage (in order to convert byte-arrays to Strings) and will store the ModuleOffset. This offset specifies the location in the MemoryStream where the sourcecode starts. The VBAMacroExtractor has been written to automatically detect XLSM or XLS, and uses POIFSReader to process the file only once and preserve memory.

It might be worthwhile to enhance the POI workbook with classes which provide access to the VBA modules, see Andrey Yesyev's contributions to the poi-dev mailinglist.

I hope it's useful, feel free to use the sources under Apache2 license.
Comment 4 Parshant Sehrawat 2013-09-12 11:18:28 UTC
How would I extract the vba macro code from a doc and OOXML docx file format. Does it necessary to store them in macro enabled format. May be this is a wrong place for posting this question. But I don't know whether I should create another bug or continue in the same because the two are related.
Comment 5 Duncan Jauncey 2016-04-09 10:39:29 UTC
(In reply to Barry Lagerweij from comment #3)
> Created attachment 30052 [details]
> Class which can extrace macro source code
> Since POI does not provide access to this, I've written a class which allows
> you to extract the sourcecode as text.

I know this is reviving an old thread, but this is the only Java code that I've found for extracting VBA macros from Excel as plain text, which works really well.

One minor improvement: on line 83 of VBAMacroExtractor.java, there is a path with a  backslash hence the code will only work on Windows.

"\\VBA" should become: File.separatorChar + "VBA"

Thanks again for sharing this code.
Comment 6 Javen O'Neal 2016-04-09 19:33:24 UTC
The fix mentioned in comment #5 has been spun-off to bug 59296
Comment 7 Javen O'Neal 2016-04-09 20:02:57 UTC
Disregard comment 6. Comment 5 refers to attachment 30052 [details] from comment 3, not existing code, so comment 5's fix will be handled in this bug.
Comment 8 Nick Burch 2016-04-10 13:06:11 UTC
Thanks for this. Committed as of r1738431 with a number of tweaks (using NPOIFS, opening logic similar to WorkbookFactory etc)

There are now two classes you can call, one which returns the macros as in this, the other which is more a tool to extract to disk or STDOUT

No documentation beyond javadocs yet though, help appreciated on that!

Comment 9 Javen O'Neal 2016-04-10 13:39:25 UTC
I wrote a few unit tests for attachment 30052 [details]. I was planning to wait until after 3.15-beta1 was released before committing, but it looks like you beat me to it :)
Comment 10 Javen O'Neal 2016-04-10 13:50:00 UTC
See changes: r1738418, r1738423, r1738427, r1738429, r1738431
Comment 11 Javen O'Neal 2016-04-11 21:45:24 UTC
Added unit tests for document, slideshow, and diagram files in r1738438, r1738439, r1738513, r1738651

Currently supported for VBA macro reading:
* Spreadsheet: HSSF, XSSF
* Document: HWPF, XWPF
* Slideshow: XSLF (HSLF not supported yet, see bug 59302)
* Diagram: XDGF (HDGF not supported yet)
Comment 12 Nick Burch 2016-04-12 06:24:00 UTC
It's probably worth adding a quite note on this to the site, maybe under the POIFS Embedded resources page?