I am getting a NullPointerException when trying to extract the VBA macros from a particular Excel file. I am using org.apache.poi.poifs.macros.VBAMacroReader. The following code consistently reproduces the NullPointerException:

    File file = new File("npe_example.xls");
    VBAMacroReader reader = new VBAMacroReader(file);
    Map<String, String> macros = reader.readMacros();

I have attached the file which causes the error.
Created attachment 34038 [details] Example xls that causes readMacros() to throw a NullPointerException
Created attachment 34039 [details] The results of running org.apache.poi.poifs.dev.POIFSDump.main on the problem document.
Could you provide a stack trace?
Here is the stack trace:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:258)
    at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:148)
    at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
    at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
    at org.apache.poi.poifs.macros.VBAMacroReader.findMacros(VBAMacroReader.java:153)
    at org.apache.poi.poifs.macros.VBAMacroReader.readMacros(VBAMacroReader.java:115)
    at poitester.POITester.main(POITester.java:39)
A module offset was not set before trying to read the stream. https://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/poifs/macros/VBAMacroReader.java?revision=1738674&view=markup#l258
Added unit test that reproduces the problem in r1752776.
In r1752778, replaced the NullPointerException with an IOException whose message names the module that VBAMacroReader failed to read.
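For readers following along, here is a minimal sketch of that kind of check. It is not the code committed in r1752778: the ModuleOffsetCheck class, its Module holder, and the requireOffset helper are hypothetical names used only for illustration, and the message wording is an assumption.

```java
import java.io.IOException;

public class ModuleOffsetCheck {
    // Hypothetical stand-in for the reader's per-module bookkeeping.
    public static class Module {
        public Integer offset; // stays null until an offset has been recorded
        public byte[] buf;     // stays null until the stream has been read
    }

    // Fail with a descriptive IOException instead of letting a null offset
    // turn into a NullPointerException when it is unboxed.
    public static int requireOffset(String name, Module module) throws IOException {
        if (module.offset == null) {
            throw new IOException("Module offset for '" + name + "' was never read.");
        }
        return module.offset;
    }

    public static void main(String[] args) {
        Module module = new Module(); // offset deliberately left unset
        try {
            requireOffset("Sheet2", module);
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

A caller that already declares IOException, as readMacros() does, then sees a message that identifies the module rather than a bare NullPointerException.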
This file has two _VBA_PROJECT_CUR directories; both contain Sheet1, Sheet2, Sheet3 and thisWorkbook. The _VBA_PROJECT_CUR under MDB00082648 has only empty (zero-byte) Sheet1, etc., whereas the _VBA_PROJECT_CUR under root has meaningful content. We key our module map only on the name of the stream (e.g. "Sheet2"), which means that we are overwriting (or skipping) the other "Sheet2". For now, I propose checking whether module.buf is null. If it is, then we expect an offset and we can go about reading the stream. Longer term, we might consider a way to prevent overwriting/skipping of streams with the same module name. Perhaps this is what is meant by "TODO Refactor this to fetch dir then do the rest"?
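To make the collision concrete, here is a small self-contained sketch that does not use POI. The paths and sizes are illustrative values, not taken from the attached file, and keying by full path is only one possible longer-term approach, not something this report has settled on. ModuleKeyCollision, byName and byPath are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModuleKeyCollision {
    // Index stream sizes by bare stream name: a later stream with the same
    // name silently replaces the earlier entry.
    public static Map<String, Integer> byName(String[] paths, int[] sizes) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < paths.length; i++) {
            String name = paths[i].substring(paths[i].lastIndexOf('/') + 1);
            index.put(name, sizes[i]);
        }
        return index;
    }

    // Index by full path instead, so same-named streams in different
    // directories stay distinct (an assumed alternative, for illustration).
    public static Map<String, Integer> byPath(String[] paths, int[] sizes) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < paths.length; i++) {
            index.put(paths[i], sizes[i]);
        }
        return index;
    }

    public static void main(String[] args) {
        // Illustrative layout: one populated Sheet2 and one zero-byte Sheet2.
        String[] paths = {
            "/_VBA_PROJECT_CUR/Sheet2",
            "/MDB00082648/_VBA_PROJECT_CUR/Sheet2"
        };
        int[] sizes = {512, 0};
        System.out.println("by name: " + byName(paths, sizes));
        System.out.println("by path: " + byPath(paths, sizes));
    }
}
```

With name-only keys the map ends up with a single "Sheet2" entry, and which of the two streams it describes depends purely on traversal order.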
r1765479: For now, I've added a check to see if we've already read the module with that name.
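A rough sketch of what such a guard can look like, assuming a name-keyed map of modules whose buf field is null until the module has been read. This is not the code from r1765479; ModuleReadGuard, Module and readOnce are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class ModuleReadGuard {
    // Hypothetical stand-in for the reader's per-module bookkeeping.
    public static class Module {
        public byte[] buf; // null until the module's stream has been read
    }

    // Store the stream bytes for a module name only if that name has not
    // been read yet. Returns true when the bytes were stored, false when a
    // module with the same name was already read and this stream is skipped.
    public static boolean readOnce(Map<String, Module> modules, String name, byte[] streamBytes) {
        Module module = modules.get(name);
        if (module == null) {
            module = new Module();
            modules.put(name, module);
        }
        if (module.buf != null) {
            return false; // already read: do not overwrite or re-read
        }
        module.buf = streamBytes;
        return true;
    }

    public static void main(String[] args) {
        Map<String, Module> modules = new HashMap<>();
        System.out.println(readOnce(modules, "Sheet2", new byte[] {1, 2, 3}));
        System.out.println(readOnce(modules, "Sheet2", new byte[0]));
    }
}
```

Note that a guard like this keeps whichever same-named stream is encountered first, so it avoids the failure but does not by itself choose the populated copy.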