Bug 59786 - NPE at HMEFContentsExtractor.java:78
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HMEF
Version: 3.15-dev
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-07-03 12:09 UTC by Sebb
Modified: 2016-07-15 18:38 UTC



Description Sebb 2016-07-03 12:09:34 UTC
Mini:poi-3.15-beta2 sebb$ java -classpath poi-3.15-beta2.jar:poi-scratchpad-3.15-beta2.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor ../../TEMP/winmail.dat x
Extracting...
Exception in thread "main" java.lang.NullPointerException
	at org.apache.poi.hmef.extractor.HMEFContentsExtractor.extractMessageBody(HMEFContentsExtractor.java:78)
	at org.apache.poi.hmef.extractor.HMEFContentsExtractor.main(HMEFContentsExtractor.java:57)
Comment 1 Sebb 2016-07-03 12:15:39 UTC
The following patch works for me (against the REL_3_15_BETA2 tag)

### Eclipse Workspace Patch 1.0
#P ApachePOI
Index: src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java
===================================================================
--- src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java	(revision 1751146)
+++ src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java	(working copy)
@@ -75,7 +75,9 @@
       try {
           MAPIRtfAttribute body = (MAPIRtfAttribute)
              message.getMessageMAPIAttribute(MAPIProperty.RTF_COMPRESSED);
-          fout.write(body.getData());
+          if (body != null) {
+              fout.write(body.getData());
+          }
       } finally {
     	  fout.close();
       }
Comment 2 Nick Burch 2016-07-03 13:21:19 UTC
Any chance you could share a small file that triggers the bug? We can then use that for a unit test, to ensure it's both fixed, and stays fixed!
Comment 3 Sebb 2016-07-03 14:04:03 UTC
Sorry, no, I cannot; the attachment was from a private mail.

However it should not be necessary.

The Javadoc for the method getMessageMAPIAttribute says it may return null, so clearly any callers need to allow for this.
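
To illustrate the point about nullable lookups, here is a minimal, self-contained sketch of the guard. It does not use POI: FakeMessage and writeBody are hypothetical stand-ins invented for this example, and the map lookup merely plays the role of a getter that is documented as possibly returning null.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NullableLookupSketch {

    /** Hypothetical stand-in for a message whose attribute lookup may return null. */
    public static final class FakeMessage {
        private final Map<Integer, byte[]> attributes = new HashMap<>();

        public void put(int id, byte[] data) {
            attributes.put(id, data);
        }

        /** Returns the attribute data, or null if the message has no such attribute. */
        public byte[] getAttribute(int id) {
            return attributes.get(id);
        }
    }

    /**
     * Writes the body to out only if the attribute exists.
     * Returns true if something was written, false if the lookup returned null.
     */
    public static boolean writeBody(FakeMessage message, int bodyId, OutputStream out)
            throws IOException {
        byte[] body = message.getAttribute(bodyId);
        if (body == null) {
            return false; // nothing to write; the caller can report "no body found"
        }
        out.write(body);
        return true;
    }

    public static void main(String[] args) throws IOException {
        FakeMessage message = new FakeMessage();
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // No attribute stored yet: the guard avoids a NullPointerException.
        System.out.println("written=" + writeBody(message, 0x3fd9, out) + ", bytes=" + out.size());

        // Attribute present: the data is written as before.
        message.put(0x3fd9, "body".getBytes(StandardCharsets.US_ASCII));
        System.out.println("written=" + writeBody(message, 0x3fd9, out) + ", bytes=" + out.size());
    }
}
```

Returning a boolean rather than silently skipping the write is a design choice of this sketch; it lets the caller tell the user that no body was found.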
Comment 4 Sebb 2016-07-03 14:57:34 UTC
I ran the Dumper, and the original body text was shown as below:

(unknown 3fd9) [16345] <original body text>

Maybe that tag is sometimes used instead of MAPIProperty.RTF_COMPRESSED?

The message was quite short (ca. 127 ASCII chars); perhaps that's relevant.

Otherwise the message just contained an image (PNG) attachment.
This was extracted fine.

BTW, the Dumper showed a lot of other "(unknown xxxx)" tags.
Comment 5 Javen O'Neal 2016-07-03 23:15:16 UTC
Applied in r1751180.

Added some unit tests for HMEFContentsExtractor because there were none, but still not testing the case where the message body is null.
Comment 6 Nick Burch 2016-07-04 18:07:21 UTC
As best as I can without the file, I've had a go in r1751361 at adding extraction support for this non-standard uncompressed file. (I tried, but couldn't get Outlook to generate one like yours for testing). Any chance you could try with your test file, and report how HMEFContentsExtractor works on your file with that change in?
Comment 7 Sebb 2016-07-04 21:29:19 UTC
[It's a bit of a pain building POI, because compile-ooxml-lite unexpectedly runs loads of tests which don't always work for me, and even if they do, they take a long time and produce lots of output]

I have now got "ant jar" to complete after commenting out the target compile-ooxml-lite.

However it does not extract any body text; I get:

Extracting...
No message body found, POI/message.rtf not created
Extraction completed

A bit of experimentation shows that the new attribute has to be defined as a new entry in MAPIProperty.java; if this is done it is then picked up by the dumper as well. 

Using MAPIProperty.createCustom locally does not work; the method getMessageMAPIAttribute returns null.

There's another change that needs to be done: if the body is an instance of MAPIStringAttribute (as here) then writing the byte data directly to the file produces an unreadable file which appears to be in UTF-16LE (no BOM).

Here's a very crude patch that works for me:

### Eclipse Workspace Patch 1.0
#P ApachePOI
Index: src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java
===================================================================
--- src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java	(revision 1751374)
+++ src/scratchpad/src/org/apache/poi/hmef/extractor/HMEFContentsExtractor.java	(working copy)
@@ -95,7 +95,12 @@
         
         OutputStream fout = new FileOutputStream(dest);
         try {
-            fout.write(body.getData());
+            if (body instanceof MAPIStringAttribute) {
+                fout.write(((MAPIStringAttribute) body).getDataString().getBytes()); // TODO fix the output charset
+            } else {
+                fout.write(body.getData());
+            }
         } finally {
             fout.close();
         }
@@ -104,13 +109,7 @@
     protected MAPIAttribute getBodyAttribute() {
         MAPIAttribute body = message.getMessageMAPIAttribute(MAPIProperty.RTF_COMPRESSED);
         if (body != null) return body;
-        
-        // See bug #59786 - we'd really like a test file to confirm if this
-        //  is the right properties + if this is truely general or not!
-        MAPIProperty uncompressedBody = 
-                MAPIProperty.createCustom(0x3fd9, Types.ASCII_STRING, "Uncompressed Body");
-        // Return this uncompressed one, or null if that isn't their either
-        return message.getMessageMAPIAttribute(uncompressedBody);
+        return message.getMessageMAPIAttribute(MAPIProperty.WINMAILNEW);
     }
     
     /**


Where the following was added to MAPIProperty.java:

   public static final MAPIProperty WINMAILNEW = // TODO fix the names!
      new MAPIProperty(0x3fd9, Types.UNICODE_STRING, "Uncompressed Body","WINMAILNEW");
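
The encoding problem described above can be shown with plain JDK classes. This is only a sketch: it assumes the raw attribute bytes really are UTF-16LE without a BOM (which is what the output file appeared to be), and BodyTranscodeSketch and utf16leToUtf8 are names made up for the example, not POI API. It also names the output charset explicitly, which is what the TODO in the patch is about, since String.getBytes() without an argument uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;

public class BodyTranscodeSketch {

    /**
     * Decodes raw bytes as UTF-16LE (assumption: no BOM present) and
     * re-encodes the text as UTF-8 so the output does not depend on the platform.
     */
    public static byte[] utf16leToUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_16LE);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate a short ASCII body stored as UTF-16LE: two bytes per character.
        byte[] raw = "Hello".getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = utf16leToUtf8(raw);

        // prints "10 raw bytes -> 5 UTF-8 bytes"
        System.out.println(raw.length + " raw bytes -> " + utf8.length + " UTF-8 bytes");
    }
}
```

Written directly to a file, the 10 raw bytes would show up in most editors as the text interleaved with NUL characters, which matches the "unreadable file" symptom.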
Comment 8 Nick Burch 2016-07-04 22:52:26 UTC
I'm reluctant to add a well-known MAPIProperty for it until we know more about what it is. The Microsoft published specs all seem to suggest it's in an unused range, and none of the ID listings give a name or similar for it.

I've fixed things up so that custom properties can now correctly be retrieved by HMEFMessage, and set HMEFContentsExtractor to use a predictable (UTF8) encoding for the raw strings rather than whatever encoding they happened to have in the source file. Does that now behave right for you?

The ooxml-tests and lite-building ought to run relatively quickly and without too many errors - it produces ~50 lines of output on my laptop, and runs in just over a minute. If that isn't the case, that's an issue for another bug!
Comment 9 Sebb 2016-07-05 10:37:37 UTC
(In reply to Nick Burch from comment #8)
> I'm reluctant to add a well-known MAPIProperty for it until we know more
> about what it is. The Microsoft published specs all seem to suggest it's in
> an unused range, and none of the ID listings give a name or similar for it.

OK

> I've fixed things up so that custom properties can now correctly be
> retrieved by HMEFMessage, and set HMEFContentsExtractor to use a predictable
> (UTF8) encoding for the raw strings rather than whatever encoding they
> happened to have in the source file. Does that now behave right for you?

Yes, works OK now.
 
> The ooxml-tests and lite-building ought to run relatively quickly and
> without too many errors - it produces ~50 lines of output on my laptop, and
> runs in just over a minute. If that isn't the case, that's an issue for
> another bug!

Takes nearly two minutes on my system and runs almost 2000 tests.
And it runs on every minor change.
I've raised Bug 59799.
Comment 10 Dominik Stadler 2016-07-15 18:38:05 UTC
The main issue here is fixed as far as I can see, right? There is another bug (Bug 59799) to discuss the build.xml stuff. If this one is not fixed, then please reopen it.