Use case: Apache Tika text extraction on a signed Outlook mail file (uses POI).
Bug: The text of the PDF attachment is missing from the extracted output.
Reason: msg.getAttachmentFiles() is empty.
Log message: Warn poi.hsmf.MAPIMessage 127.0.0.1/38 I don't recognize message class 'IPM.Note.SMIME.MultipartSigned'. Please open an issue on POI's bugzilla
Example: see attachment here
The attachment is larger than 1 MB, so here is a link instead: https://drive.google.com/file/d/1Do4JB-umviF5-xTRTjsx0D6r60V1rTOe/view?usp=sharing
Are you able to produce a much smaller file that shows the same bug, which we could use for unit tests etc.? We try to avoid large files in the test suite where possible.
From a quick look at the file supplied, it seems to be much the same as a normal Outlook file, with an additional smime.p7m attachment (plus a few unknown and unsupported chunks).
Created attachment 38937: Outlook-Mail with and without signature
ZIP attachment containing the Outlook e-mail with and without signature.
Case signed: msg.getAttachmentFiles() returns only one chunk.
Case unsigned: two chunks (PDF and Word file).
Perhaps it's a bug in the Apache Tika class org.apache.tika.parser.microsoft.OutlookExtractor.
Sorry, a correction to my last comment:
Case with signature: msg.getAttachmentFiles() returns only one chunk.
Case without signature: two chunks (PDF and Word file).
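For reference, the message class in the warning, 'IPM.Note.SMIME.MultipartSigned', is a dot-separated subclass of IPM.Note. Below is a minimal Java sketch of how calling code could detect this case, for example before deciding whether to look inside the smime.p7m attachment. MessageClassCheck and its methods are hypothetical names used only for illustration; they are not part of POI or Tika and do not reflect how either library classifies messages.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not a POI or Tika class.
final class MessageClassCheck {
    private MessageClassCheck() {}

    // Message class string taken from the log warning in this report.
    static final String SIGNED_CLASS = "IPM.Note.SMIME.MultipartSigned";

    // True for "IPM.Note" itself and for any dot-separated subclass of it.
    static boolean isNoteLike(String messageClass) {
        if (messageClass == null) {
            return false;
        }
        String mc = messageClass.trim().toLowerCase(Locale.ROOT);
        return mc.equals("ipm.note") || mc.startsWith("ipm.note.");
    }

    // True only for the signed message class from the warning (case-insensitive).
    static boolean isSmimeSigned(String messageClass) {
        return messageClass != null
                && messageClass.trim().equalsIgnoreCase(SIGNED_CLASS);
    }

    public static void main(String[] args) {
        // Use the first argument if given, otherwise the class from the log.
        String mc = args.length > 0 ? args[0] : SIGNED_CLASS;
        System.out.println(mc + " noteLike=" + isNoteLike(mc)
                + " smimeSigned=" + isSmimeSigned(mc));
    }
}
```

The prefix check includes the trailing dot so that an unrelated class such as "IPM.NoteX" is not treated as a mail note.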