Bug 67083 - Mail attachment is omitted on signed Outlook files
Summary: Mail attachment is omitted on signed Outlook files
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSMF
Version: 5.2.3-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-30 10:16 UTC by Rainer Schnitker
Modified: 2023-08-30 15:13 UTC

Attachments
Outlook-Mail with and without signature (109.07 KB, application/zip)
2023-08-30 15:07 UTC, Rainer Schnitker

Description Rainer Schnitker 2023-08-30 10:16:27 UTC
Use case: Apache Tika text extraction from a signed Outlook mail file (Tika uses POI)

Bug: The text of the PDF attachment is missing from the extracted output.
Reason: msg.getAttachmentFiles() is empty

Log message:
Warn poi.hsmf.MAPIMessage 127.0.0.1/38 I don't recognize message class 'IPM.Note.SMIME.MultipartSigned'. Please open an issue on POI's bugzilla

Example: see attachment here
Comment 1 Rainer Schnitker 2023-08-30 10:32:18 UTC
The attachment is larger than 1 MB, so here is a link:
https://drive.google.com/file/d/1Do4JB-umviF5-xTRTjsx0D6r60V1rTOe/view?usp=sharing
Comment 2 Nick Burch 2023-08-30 11:06:15 UTC
Are you able to produce a much smaller file that shows the same bug, which we could use for unit tests etc.? We try to avoid large files in the test suite where possible.

From a quick look at the file supplied, it seems to be much the same as a normal Outlook file, with an additional smime.p7m attachment (plus a few unknown and unsupported chunks).
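
Until POI recognizes this message class, a caller could detect the signed case from the message class string quoted in the warning in the description. The following is a minimal sketch in plain Java, not POI's or Tika's code: the class and method names are hypothetical, and the case-insensitive prefix match on "IPM.Note.SMIME" is an assumption made for illustration.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of POI or Tika.
final class SmimeMessageClass {

    // Message class value taken from the warning in the bug description.
    static final String CLEAR_SIGNED = "IPM.Note.SMIME.MultipartSigned";

    // Assumed common prefix for S/MIME notes.
    static final String SMIME_PREFIX = "IPM.Note.SMIME";

    private SmimeMessageClass() {}

    /** True if the class equals the S/MIME prefix or is a dotted sub-class of it. */
    static boolean isSmime(String messageClass) {
        if (messageClass == null) {
            return false;
        }
        String mc = messageClass.trim().toLowerCase(Locale.ROOT);
        String prefix = SMIME_PREFIX.toLowerCase(Locale.ROOT);
        return mc.equals(prefix) || mc.startsWith(prefix + ".");
    }

    /** True only for the exact class named in the warning (ignoring case). */
    static boolean isClearSigned(String messageClass) {
        return messageClass != null
                && messageClass.trim().equalsIgnoreCase(CLEAR_SIGNED);
    }

    public static void main(String[] args) {
        String fromLog = "IPM.Note.SMIME.MultipartSigned";
        System.out.println(isSmime(fromLog));       // true
        System.out.println(isClearSigned(fromLog)); // true
        System.out.println(isSmime("IPM.Note"));    // false
    }
}
```

A caller that gets true here could log that the visible attachment list may be incomplete, and then look at the smime.p7m attachment rather than silently returning no PDF text.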
Comment 3 Rainer Schnitker 2023-08-30 15:07:14 UTC
Created attachment 38937
Outlook-Mail with and without signature
Comment 4 Rainer Schnitker 2023-08-30 15:09:58 UTC
ZIP attachment containing the Outlook e-mail with and without a signature

case signed: msg.getAttachmentFiles() returns only one chunk

case unsigned: two chunks, a PDF and a Word file

Perhaps it's a bug in the Apache Tika class org.apache.tika.parser.microsoft.OutlookExtractor
Comment 5 Rainer Schnitker 2023-08-30 15:13:47 UTC
Sorry, a correction to my last comment:

case with signature: msg.getAttachmentFiles() returns only one chunk

case without signature: two chunks, a PDF and a Word file