Created attachment 30916: Outlook MSG that gives the above error. Hello, I have a message (attached to this bug), saved from Outlook 2010, that throws ChunkNotFoundException when I call getRtfBody(). Could you please check whether this is a bug? I'm using the latest stable release, 3.9, on Android 4.0.1. Thanks a lot! Paolo
Are you sure your Outlook file actually has an RTF part? (Not all of them do.)
(In reply to Nick Burch from comment #1)
> Are you sure your outlook file actually has a RTF part? (Not all of them do)
I don't know for sure (I don't know how to inspect an MSG by hand the way I can with an EML), but Outlook displays it as richly formatted text.
Not all richly formatted text in Outlook is done using RTF! Does your message have an HTML chunk instead? Take a look at the Tika Outlook parser if you want a detailed example of using HSMF to process MSG files: https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
(In reply to Nick Burch from comment #3)
> Not all richly formatted text in Outlook is done using RTF! Does your
> message have a HTML chunk instead?
>
> Take a look at the Tika Outlook parser if you want a detailed example of
> using HSMF to process msg files:
> https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
You're right, but I already try to get the text, HTML and RTF bodies, and it looks like none of them is present. So I thought there might be a bug, since Outlook shows formatted text. Here is an extract of my code:

    // HTML body: HSMF throws ChunkNotFoundException if the chunk is absent
    try {
        this.messaggioHTML = msg.getHtmlBody();
        if (MainActivity.DEBUG) {
            android.util.Log.d(MainActivity.TAG, "HTML Body: " + this.messaggioHTML);
        }
    } catch (ChunkNotFoundException e) {
        android.util.Log.e(MainActivity.TAG, "HTML Body: not found");
        this.messaggioHTML = "";
    }

    // Plain text body
    try {
        this.messaggioTesto = msg.getTextBody();
        if (MainActivity.DEBUG) {
            android.util.Log.d(MainActivity.TAG, "TXT Body: " + this.messaggioTesto);
        }
    } catch (ChunkNotFoundException e) {
        android.util.Log.e(MainActivity.TAG, "TXT Body: not found");
        this.messaggioTesto = "";
    }

    // RTF body (stored compressed; getRtfBody() decompresses it)
    try {
        String messaggioRtf = msg.getRtfBody();
        if (MainActivity.DEBUG) {
            android.util.Log.d(MainActivity.TAG, "RTF Body: " + messaggioRtf);
        }
    } catch (ChunkNotFoundException e) {
        android.util.Log.e(MainActivity.TAG, "RTF Body: not found");
    } catch (Exception e) {
        // Any other failure (e.g. while decompressing the RTF) is just printed
        e.printStackTrace();
    }
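The three near-identical try/catch blocks in the extract above could be collapsed into one helper that reports a missing body as an empty Optional. This is a minimal sketch, not part of HSMF: the class name BodyChunks and method tryBody are hypothetical, and it assumes Java 8+ (java.util.Optional and method references), which the Android 4.0 toolchain in this report does not provide. It catches Exception broadly only so it compiles without POI on the classpath; in real code you would narrow it to ChunkNotFoundException.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper: wraps a body getter such as msg::getHtmlBody and
// turns "chunk missing" (an exception, or null) into Optional.empty().
final class BodyChunks {
    private BodyChunks() {}

    static Optional<String> tryBody(Callable<String> getter) {
        try {
            // ofNullable also covers the case where HSMF returns null
            // instead of throwing for a missing chunk.
            return Optional.ofNullable(getter.call());
        } catch (Exception e) {
            // Assumption: every failure means "not present". Narrow this to
            // ChunkNotFoundException so real decoding errors still surface.
            return Optional.empty();
        }
    }
}
```

Usage would then look like `String html = BodyChunks.tryBody(msg::getHtmlBody).orElse("");`, with one line per body instead of one try/catch block each.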
Outlook files tend to contain one or two of the plain text, RTF and HTML bodies; it's very rare to have all three. If your file only has RTF and you really want something like HTML, you'd be best off using Apache Tika, which can do the conversion for you.
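Since a given file usually carries only one or two of the three bodies, one way to handle this is to try them in a preferred order and keep the first one present. The sketch below is hypothetical (BodyPicker and pickFirst are not HSMF APIs) and the preference order is entirely the caller's choice, expressed as the insertion order of a LinkedHashMap of getters.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper: tries each body getter in insertion order and returns
// the first non-empty result together with its label, or null if none exist.
final class BodyPicker {
    private BodyPicker() {}

    static Map.Entry<String, String> pickFirst(LinkedHashMap<String, Callable<String>> getters) {
        for (Map.Entry<String, Callable<String>> e : getters.entrySet()) {
            try {
                String body = e.getValue().call();
                if (body != null && !body.isEmpty()) {
                    return new AbstractMap.SimpleImmutableEntry<>(e.getKey(), body);
                }
            } catch (Exception missing) {
                // Treat a failing getter (e.g. a missing chunk) as "not present"
                // and move on to the next format.
            }
        }
        return null; // none of the requested formats is present
    }
}
```

With HSMF the map might be filled as `order.put("html", msg::getHtmlBody); order.put("rtf", msg::getRtfBody); order.put("text", msg::getTextBody);`. If the winner is RTF but HTML is what you need, that is where a converter such as Apache Tika comes in. A null result means no body chunk was found at all, which is the symptom reported here.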
(In reply to Nick Burch from comment #5)
> Outlook files tend to have one or two out of plain, rtf and html. It's very
> rare to have all 3. If your file only has rtf, and you really wanted
> something like html, you'd be best off using Apache Tika as that can convert
> for you
Maybe I didn't explain myself clearly. According to Apache POI, the attached example apparently has no plain text, no HTML and no RTF body. But since I see text when I open it in Outlook, I think there may be a problem. Did you test the attached MSG to confirm my report?
What additional information do you need? The attached MSG shows this anomaly in my tests (no plain text, no HTML and no RTF, yet Outlook displays formatted text when it is opened). I don't know what the problem is, but I think I have given enough information to investigate... Please let me know.
Are you able to use one of the tools like POIFSViewer or POIFSDump to identify which chunk (POIFS Entry) actually contains your text? That will help us narrow down what's wrong
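POIFSDump and POIFSViewer list a .msg file's streams by name. Per the [MS-OXMSG] format, property streams are named `__substg1.0_` followed by four hex digits of MAPI property ID and four of property type, so the body chunks can be spotted from the listing alone. The sketch below is hypothetical (MsgEntryNames is not a POI class) and only recognises a few well-known tags: 0x1000 (PR_BODY), 0x1009 (PR_RTF_COMPRESSED), 0x1013 (PR_HTML) and 0x001A (PR_MESSAGE_CLASS). It takes bare stream names, so you would copy them from the tool's output yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: interprets "__substg1.0_PPPPTTTT" stream names, where
// PPPP is the MAPI property ID and TTTT the property type (both hex).
final class MsgEntryNames {
    private MsgEntryNames() {}

    private static final Pattern SUBSTG =
        Pattern.compile("__substg1\\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})");

    /** Human-readable label for a few body-related property IDs, else null. */
    static String describe(int propertyId) {
        switch (propertyId) {
            case 0x1000: return "plain text body (PR_BODY)";
            case 0x1009: return "compressed RTF body (PR_RTF_COMPRESSED)";
            case 0x1013: return "HTML body (PR_HTML)";
            case 0x001A: return "message class (PR_MESSAGE_CLASS)";
            default:     return null;
        }
    }

    /** Returns one description per recognised stream, in input order. */
    static List<String> bodyStreams(List<String> entryNames) {
        List<String> found = new ArrayList<>();
        for (String name : entryNames) {
            Matcher m = SUBSTG.matcher(name);
            if (!m.matches()) {
                continue; // e.g. "__properties_version1.0" or attachment storages
            }
            String label = describe(Integer.parseInt(m.group(1), 16));
            if (label != null) {
                found.add(label + " in " + name);
            }
        }
        return found;
    }
}
```

If the top-level listing shows a message class stream but none of the three body streams, the text must live somewhere else, for example inside an attachment storage.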
(In reply to Nick Burch from comment #8)
> Are you able to use one of the tools like POIFSViewer or POIFSDump to
> identify which chunk (POIFS Entry) actually contains your text? That will
> help us narrow down what's wrong
Thanks for the tip. I'll try that and get back to you with the relevant information. Cheers
There has been no update for a long time, so I am closing this for now. Please reopen it with the promised additional information if this is still an issue for you.
An old problem, I realise, but here's some extra information that explains the symptoms (and why this *isn't* a POI issue): the message class is IPM.Note.SMIME.MultipartSigned. This means it's a digitally signed email, with modifications effectively disabled. Because the message is signed, its content is held in an attachment (one of "smime.p7m", "smime.txt" or "smime.p7s"). In this instance the signed content is held in "smime.p7m", which is about the most common place. The content of this attachment is MIME-encoded: look for the "multipart/alternative" part, then pick whichever of its sub-parts best suits your needs (I usually just pick the "text/plain" part for body text extraction). Apache James Mime4j should do the trick...
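To make the last step concrete, here is a minimal, stdlib-only sketch of walking the MIME content down to the first text/plain part. It assumes the "smime.p7m" bytes have already been read (with HSMF, for example, via MAPIMessage.getAttachmentFiles()) and turned into a String; the charset used for that conversion is your assumption. SignedMailText and its methods are hypothetical names. The sketch deliberately ignores Content-Transfer-Encoding, so base64 or quoted-printable parts come back undecoded, and it does not verify the signature; Mime4j, as suggested, is the robust choice.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: recursive MIME walk that returns the first text/plain body.
// No transfer decoding and no signature verification.
final class SignedMailText {
    private SignedMailText() {}

    private static final Pattern BOUNDARY =
        Pattern.compile("boundary=\"?([^\";\\r\\n]+)\"?", Pattern.CASE_INSENSITIVE);

    /** True for the message class found on the attached file. */
    static boolean isMultipartSigned(String messageClass) {
        return "IPM.Note.SMIME.MultipartSigned".equalsIgnoreCase(messageClass);
    }

    /** Returns the first text/plain body (undecoded), or null if there is none. */
    static String findPlainText(String entity) {
        String text = entity.replace("\r\n", "\n");
        String headers;
        String body;
        if (text.startsWith("\n")) {            // part with no headers at all
            headers = "";
            body = text.substring(1);
        } else {
            int split = text.indexOf("\n\n");   // blank line ends the header block
            headers = split < 0 ? text : text.substring(0, split);
            body = split < 0 ? "" : text.substring(split + 2);
        }
        String contentType = header(headers, "Content-Type");
        // RFC 2045: a missing Content-Type defaults to text/plain.
        String type = contentType == null ? "text/plain" : contentType.toLowerCase(Locale.ROOT);

        if (type.startsWith("text/plain")) {
            return body;
        }
        if (type.startsWith("multipart/")) {    // multipart/signed, multipart/alternative, ...
            Matcher m = BOUNDARY.matcher(contentType);
            if (!m.find()) {
                return null;
            }
            String[] parts = body.split(Pattern.quote("--" + m.group(1)));
            // parts[0] is the preamble; a chunk starting with "--" follows the closing delimiter.
            for (int i = 1; i < parts.length && !parts[i].startsWith("--"); i++) {
                String part = parts[i].startsWith("\n") ? parts[i].substring(1) : parts[i];
                String found = findPlainText(part);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    /** Case-insensitive header lookup after unfolding continuation lines. */
    private static String header(String headers, String name) {
        for (String line : headers.replaceAll("\n[ \t]+", " ").split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase(name)) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }
}
```

For a multipart/signed entity, the recursion descends into the signed content (often multipart/alternative), skips the text/html and application/pkcs7-signature parts, and stops at the first text/plain part.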