Bug 63955 - HMEFContentsExtractor fails to extract content from winmail.dat
Alias: None
Product: POI
Classification: Unclassified
Component: HMEF
Version: 4.1.1-FINAL
Hardware: PC Linux
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Blocks: 66335
Reported: 2019-11-22 15:46 UTC by Andreas Joseph Krogh
Modified: 2022-11-02 19:16 UTC

Unparseable winmail.dat (503.79 KB, application/octet-stream)
2019-11-22 16:00 UTC, Andreas Joseph Krogh
screenshot of Run-configuration in IDEA (67.21 KB, image/png)
2019-11-22 17:53 UTC, Andreas Joseph Krogh

Description Andreas Joseph Krogh 2019-11-22 15:46:00 UTC
After upgrading poi-scratchpad to 4.1.1, I get this exception when running HMEFContentsExtractor.main:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown type 72 / 0x0048 - CLS ID GUID @ 16
	at org.apache.poi.hmef.attribute.MAPIAttribute.getLength(MAPIAttribute.java:204)
	at org.apache.poi.hmef.attribute.MAPIAttribute.create(MAPIAttribute.java:170)
	at org.apache.poi.hmef.attribute.TNEFMAPIAttribute.<init>(TNEFMAPIAttribute.java:41)
	at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:71)
	at org.apache.poi.hmef.HMEFMessage.processMessage(HMEFMessage.java:99)
	at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:81)
	at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:66)
	at org.apache.poi.hmef.extractor.HMEFContentsExtractor.<init>(HMEFContentsExtractor.java:74)
	at org.apache.poi.hmef.extractor.HMEFContentsExtractor.main(HMEFContentsExtractor.java:58)

This worked fine with v4.1.0.
OS: Ubuntu Linux 19.10
JAVA: 13.0.1
Comment 1 PJ Fanning 2019-11-22 15:50:50 UTC
Thanks for the issue. Could you provide a sample file? It helps us debug the problem and will help us build a regression corpus.
Comment 2 Andreas Joseph Krogh 2019-11-22 16:00:53 UTC
Created attachment 36894
Unparseable winmail.dat

This doesn't parse using 4.1.1, but works fine using 4.1.0.
Comment 3 Dominik Stadler 2019-11-22 17:32:18 UTC
Are you sure it worked with 4.1.0? I tried it quickly, but it also failed with 4.1.0 and there were no changes in the HMEF area which would cause such a regression.
Comment 4 Andreas Joseph Krogh 2019-11-22 17:45:33 UTC
It is working in my project when downgrading, and it stops working when upgrading again.

I'll try to make a separate maven-project for reproducing.
Comment 6 Andreas Joseph Krogh 2019-11-22 17:53:25 UTC
Created attachment 36895
screenshot of Run-configuration in IDEA
Comment 7 Andreas Joseph Krogh 2019-11-22 17:55:27 UTC
This project: https://github.com/andreak/tnefextractorfail
fails using the run-configuration in the attached screenshot.

Changing the poi-scratchpad version to 4.1.0 with this property


makes it work again.
Comment 8 Andreas Joseph Krogh 2019-11-23 08:17:38 UTC
FWIW; jtnef parses it fine: https://www.freeutils.net/source/jtnef/
Comment 9 Andreas Joseph Krogh 2019-11-23 13:12:21 UTC
Are you able to reproduce this, i.e. make it work with 4.1.0 and not with 4.1.1?
Let me know if there's anything more I can do.
Comment 10 Dominik Stadler 2019-11-23 15:56:28 UTC
Thanks for the details. "git bisect" identifies the following commit as the cause of this regression:

162f69655fc9146c94dfc4b4e101cbaf46255356 is the first bad commit
commit 162f69655fc9146c94dfc4b4e101cbaf46255356
Date:   Wed Apr 17 20:18:29 2019 +0000

    #github-143 - MAPIType.isFixedLength: not true in case of length > 8

See also r1857708
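
For readers following along, here is a minimal, self-contained sketch of how a change like the one named in that commit message could lead to the exception in the description. It is not POI's code: the class FixedLengthSketch, the enum SketchType, and both length methods are hypothetical names, and the assumption is that a fixed-length check limited to 8 bytes lets the 16-byte "CLS ID GUID" type fall through to the "Unknown type" branch.

```java
import java.util.Locale;

// Hypothetical model, NOT the POI implementation.
final class FixedLengthSketch {

    /** Stand-in for a MAPI property type; a length of -1 means variable length. */
    enum SketchType {
        LONG(0x0003, "Long", 4),
        CLS_ID(0x0048, "CLS ID GUID", 16),
        BINARY(0x0102, "Binary", -1);

        final int id;
        final String label;
        final int length;

        SketchType(int id, String label, int length) {
            this.id = id;
            this.label = label;
            this.length = length;
        }

        /** Assumed regressed check: fixed only when the length is at most 8 bytes. */
        boolean isFixedLengthNarrow() {
            return length != -1 && length <= 8;
        }

        /** Assumed tolerant check: any declared length counts as fixed. */
        boolean isFixedLengthAny() {
            return length != -1;
        }
    }

    /** With the narrow check, the 16-byte type is neither fixed nor variable. */
    static int lengthWithNarrowCheck(SketchType type, int declaredLength) {
        if (type.isFixedLengthNarrow()) {
            return type.length;
        }
        if (type == SketchType.BINARY) {
            return declaredLength; // variable length: taken from the stream
        }
        throw new IllegalArgumentException(describe(type));
    }

    /** With the tolerant check, every type with a declared length is handled. */
    static int lengthWithAnyCheck(SketchType type, int declaredLength) {
        if (type.isFixedLengthAny()) {
            return type.length;
        }
        return declaredLength;
    }

    /** Message format modelled on the exception text in the bug description. */
    static String describe(SketchType type) {
        return String.format(Locale.ROOT, "Unknown type %d / 0x%04X - %s @ %d",
                type.id, type.id, type.label, type.length);
    }

    public static void main(String[] args) {
        System.out.println(lengthWithAnyCheck(SketchType.CLS_ID, 0));
        try {
            lengthWithNarrowCheck(SketchType.CLS_ID, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model only the narrow check rejects the GUID type; whether the actual fix took the same shape is not stated in this report.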
Comment 11 Andreas Joseph Krogh 2019-11-23 16:28:45 UTC
Glad you found it. Looking forward to the next release :-)
Comment 12 Andreas Beeker 2019-12-01 21:09:09 UTC
Fixed via r1870692

May I use your example winmail.dat in our corpus or can you provide an anonymized one? (this would help to keep the extraction valid)
Comment 13 Andreas Joseph Krogh 2019-12-01 21:51:52 UTC
Use it, it's fine.
Comment 14 Andreas Beeker 2020-01-08 00:50:32 UTC
Added the example file via r1872480 and optimized a few unit tests.