Bug 54021

Summary: Cannot read message body from 2010 MSG file
Product: POI
Reporter: wernernaehle
Component: HSMF
Assignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME    
Severity: major    
Priority: P2    
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: This is a MSG file example (attachment 29494)
             unit test (attachment 34595)

Description wernernaehle 2012-10-17 17:55:59 UTC
Hi, 

I have problems with several MSG files. When I open one of these MSG files in Outlook 2010 (32-bit), I cannot read the body because Outlook does not detect the body content.

However, I know that the MSG file has a body, because I can read it when I open the message in Internet Explorer.
This can be done via: Actions --> View in Browser

The POI library cannot process the body of the MSG file either.

Can anybody help me? 

Thanks, 

Warmest regards. 

Werner
Comment 1 Nick Burch 2012-10-17 18:08:43 UTC
Can you please upload a problematic file, and ideally a testcase that shows what you're trying to do with POI and what isn't working?
Comment 2 wernernaehle 2012-10-18 07:26:16 UTC
Created attachment 29494 [details]
This is a MSG file example
Comment 3 wernernaehle 2012-10-18 07:28:19 UTC
Firstly, thanks for your answer.

OK, I have uploaded an example file.

Regards
Comment 4 wernernaehle 2012-10-18 07:45:32 UTC
Hi, 

I use the following code to get an embedded MSG file:

MAPIMessage msg = new MAPIMessage(INPUTFILE);
AttachmentChunks[] attachments = msg.getAttachmentFiles();

for (AttachmentChunks attachment : attachments) {
   DirectoryChunk chunkDirectory = attachment.attachmentDirectory;
   MAPIMessage attachmentMSG = chunkDirectory.getAsEmbededMessage();
}
Comment 5 wernernaehle 2012-10-18 08:06:31 UTC
Hi, 

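I use the following code to get an embedded MSG file and read its body:

import org.apache.poi.hsmf.MAPIMessage;
import org.apache.poi.hsmf.datatypes.AttachmentChunks;
import org.apache.poi.hsmf.datatypes.DirectoryChunk;

MAPIMessage msg = new MAPIMessage(INPUTFILE);
AttachmentChunks[] attachments = msg.getAttachmentFiles();

for (AttachmentChunks attachment : attachments) {
   // attachmentDirectory is null unless the attachment is an embedded message
   DirectoryChunk chunkDirectory = attachment.attachmentDirectory;
   if (chunkDirectory == null) {
      continue;
   }
   MAPIMessage attachmentMSG = chunkDirectory.getAsEmbededMessage();
   String body = attachmentMSG.getTextBody();
}

The body is null because neither the library nor Outlook 2010 can detect the body content. Internet Explorer, however, can display the body.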
Comment 6 Javen O'Neal 2017-01-05 10:38:42 UTC
Created attachment 34595 [details]
unit test

You got a ChunkNotFoundException when calling getTextBody() because the message had an HTML body instead of a plain text body. Only one body type (plain text, html, rtf) can be saved in a message.

This is the error that you would get if you called getTextBody() on a message that contains only an HTML body.

Chunk not found
org.apache.poi.hsmf.exceptions.ChunkNotFoundException: Chunk not found
        at org.apache.poi.hsmf.MAPIMessage.getStringFromChunk(MAPIMessage.java:181)
        at org.apache.poi.hsmf.MAPIMessage.getTextBody(MAPIMessage.java:194)
        at org.apache.poi.hsmf.TestFileWithAttachmentsRead.test54021_read_text_body(TestFileWithAttachmentsRead.java:190)
Comment 7 Javen O'Neal 2017-01-05 12:53:04 UTC
(In reply to Javen O'Neal from comment #6)
>  Only one body type (plain text, html, rtf) can be saved in a message.
It appears that this is a false statement.

A message can contain more than one body type, as shown by test-data/hsmf/outlook_30_msg.msg, which is used by scratchpad/testcases/o.a.p.hsmf.TestBasics#testBody [1] in r1777463. However, a message is not required to store a plain text chunk. If no text version is present, it is up to the mail client to decide how to display the message by rendering the HTML or RTF body (plain text seems to be a fallback anyway).

[1] http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/testcases/org/apache/poi/hsmf/TestBasics.java?r1=1777463&r2=1777462&pathrev=1777463
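
For callers who just want some body text, the fallback described above can be sketched roughly as follows. This is only an illustration, not code from POI: BodyFallback, BodyGetter and MissingChunkException are hypothetical stand-ins so that the snippet compiles without the POI jars. In real code, the getters would presumably be method references such as msg::getTextBody, msg::getHtmlBody and msg::getRtfBody, and the exception would be ChunkNotFoundException.

```java
import java.util.List;
import java.util.Optional;

class BodyFallback {
    /** Hypothetical stand-in for POI's checked ChunkNotFoundException. */
    static class MissingChunkException extends Exception {
        MissingChunkException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for one body getter, e.g. msg::getTextBody. */
    @FunctionalInterface
    interface BodyGetter {
        String get() throws MissingChunkException;
    }

    /**
     * Tries each getter in order and returns the first body that is present.
     * A getter that throws MissingChunkException is treated as "this body
     * type is not stored in the message" and the next getter is tried.
     */
    static Optional<String> firstAvailable(List<BodyGetter> getters) {
        for (BodyGetter getter : getters) {
            try {
                String body = getter.get();
                if (body != null) {
                    return Optional.of(body);
                }
            } catch (MissingChunkException e) {
                // Chunk not found for this body type; fall through to the next one.
            }
        }
        // None of the body types is stored in the message.
        return Optional.empty();
    }
}
```

Passing the getters in the order plain text, HTML, RTF would prefer the text body when it exists and fall back to the other formats otherwise. Note that an HTML or RTF result would still need to be converted if plain text is required.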