Created attachment 22563 (Changes to the StringChunk class) This applies to the HSMF component in the scratchpad area. This might be the same bug as 45048. Instantiating a MAPIMessage object on an Outlook 3.0 .msg file does not give access to any of the items, such as the subject or the content. For example, calling getSubject() throws a ChunkNotFoundException. The simple solution was to add a new constructor to StringChunk that lets you specify both the chunkId and the type. Previously, the type defaulted to Types.STRING, which doesn't match any of the string items in my .msg files. With the new constructor, I can retrieve both the message header and the content as follows:
    MAPIMessage msg = new MAPIMessage("test.msg");
    String header = msg.getStringFromChunk(new StringChunk(0x007D, 0x001F));
    String content = msg.getStringFromChunk(new StringChunk(0x1000, 0x001F));
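To see why the type has to match, it helps to look at how a string item is named inside the .msg file. A minimal sketch, assuming string items are stored under entry names made of the prefix "__substg1.0_" followed by the chunkId and the type as four hex digits each (the form observed in the Outlook 3.0 files discussed here). ChunkNames and entryName are hypothetical helpers for illustration, not part of POI:

```java
// Hypothetical helper, not part of POI: builds the storage entry name
// under which a string chunk with the given id and type would be stored.
public final class ChunkNames {
    private static final String PREFIX = "__substg1.0_";

    private ChunkNames() {}

    /** Formats the chunk id and the type as 4 upper-case hex digits each. */
    public static String entryName(int chunkId, int type) {
        return String.format("%s%04X%04X", PREFIX, chunkId, type);
    }

    public static void main(String[] args) {
        // Content chunk looked up with the old default type 0x001E ...
        System.out.println(entryName(0x1000, 0x001E)); // __substg1.0_1000001E
        // ... is a different name from the one stored with type 0x001F
        System.out.println(entryName(0x1000, 0x001F)); // __substg1.0_1000001F
    }
}
```

Because the type is part of the name, a chunk requested with the wrong type simply isn't found, even though the chunkId is correct.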
Thanks for this patch. Any chance you could upload a sample file that triggers this problem? Then we can add a test, and also investigate whether we should tweak the main class too.
Created attachment 22571 (An example Outlook 3.0 .msg file) This is an email file I extracted from Outlook 3.0 by copying it and pasting it into a directory in Windows XP Explorer.
For the attached .msg file, all of the string items in the message have names of the form "__substg1.0_0078001F". The last 8 hex digits consist of the chunkId followed by the type. For test.msg, the type is always 0x001F; the chunkId varies depending on whether the item is the subject, text body, from, to, etc. But the StringChunk class, without my changes, always uses 0x001E as the type, so the methods of MAPIMessage always throw a ChunkNotFoundException. The message items are stored in a hash map in a POIFSChunkParser object, and are supposed to be retrieved using the appropriate StringChunks as keys. The new constructor for StringChunk lets me create the right StringChunks for my .msg files. I could've fixed the problem by redefining Types.STRING as 0x001F in org.apache.poi.hsmf.datatypes.Types, but I figured that 0x001E probably works with whatever .msg files this was originally developed for. Maybe the library could detect the version of the .msg file and set the StringChunk type appropriately to either 0x001E or 0x001F.
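A sketch of the detection idea. In MAPI, 0x001E is PT_STRING8 (8-bit, code page strings) and 0x001F is PT_UNICODE (UTF-16LE strings), so rather than detecting the Outlook version, a reader could probe for both types per chunk. The class below is a hypothetical illustration that models the .msg entries as a plain set of names; it does not use POI's POIFSChunkParser or StringChunk API:

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not POI's API: the .msg entries are modelled as a set of names.
public final class StringChunkLookup {
    /** MAPI PT_STRING8: 8-bit (code page) string. */
    public static final int TYPE_STRING8 = 0x001E;
    /** MAPI PT_UNICODE: UTF-16LE string. */
    public static final int TYPE_UNICODE = 0x001F;
    private static final String PREFIX = "__substg1.0_";

    private StringChunkLookup() {}

    /** Returns {chunkId, type} parsed from an entry name such as "__substg1.0_0078001F". */
    public static int[] parse(String entryName) {
        if (!entryName.startsWith(PREFIX) || entryName.length() != PREFIX.length() + 8) {
            throw new IllegalArgumentException("Not a string chunk entry: " + entryName);
        }
        String hex = entryName.substring(PREFIX.length());
        int chunkId = Integer.parseInt(hex.substring(0, 4), 16); // first 4 hex digits
        int type = Integer.parseInt(hex.substring(4, 8), 16);    // last 4 hex digits
        return new int[] {chunkId, type};
    }

    /** Finds the type chunkId is stored under, trying 0x001E first, then 0x001F. */
    public static Optional<Integer> findType(Set<String> entryNames, int chunkId) {
        for (int type : new int[] {TYPE_STRING8, TYPE_UNICODE}) {
            String name = String.format("%s%04X%04X", PREFIX, chunkId, type);
            if (entryNames.contains(name)) {
                return Optional.of(type);
            }
        }
        return Optional.empty(); // neither string type is present for this chunk
    }

    public static void main(String[] args) {
        Set<String> entries = Set.of("__substg1.0_0078001F", "__substg1.0_1000001F");
        int[] parsed = parse("__substg1.0_0078001F");
        System.out.printf("id=0x%04X type=0x%04X%n", parsed[0], parsed[1]); // id=0x0078 type=0x001F
        System.out.println(findType(entries, 0x1000)); // Optional[31]
    }
}
```

Trying PT_STRING8 first keeps the existing behaviour for files that already work. The name lookup is case-sensitive, so this assumes upper-case hex digits, as in the example names above.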
Thanks for the patch, test file and investigations. In the end, I've got HSMF working with both the old and the new style Outlook files. Hopefully this'll work even better for you now! :) In general though, HSMF isn't being actively developed right now (Travis isn't around ATM), so any further patches for HSMF are gratefully received!