For the PowerPoint 97 file attached to the JIRA issue below, the embedded WMF file cannot be read by the WMF class because it appears not to be compressed. See https://issues.apache.org/jira/browse/TIKA-1046 for more details.
A similar issue was rediscovered in https://issues.apache.org/jira/browse/TIKA-1612 during analysis of the most commonly caught exceptions in govdocs1. However, for the file I posted in TIKA-1612, I'm not sure that a valid WMF file is being extracted. See TIKA-1612 for the example file and the result of getRawBytes().
The error was that some pictures have two checksum/UID fields, which was ignored until now; see [MS-ODRAW] 2.2.25 OfficeArtBlipWMF. Fixed with r1687398 ... hopefully I don't forget to merge it back when I merge the common_sl branch ... :|
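For anyone following along, here is a minimal sketch of how the one-UID and two-UID cases can be told apart. It is not the code from r1687398. It assumes the blip starts with the usual 8-byte record header and that, as I read [MS-ODRAW], a recInstance of 0x216 means one 16-byte UID and 0x217 means two. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: locates the metafile header
// inside an OfficeArtBlipWMF record.
final class WmfBlipHeaderSketch {
    static final int REC_HEADER_SIZE = 8;   // recVer/recInstance (2) + recType (2) + recLen (4)
    static final int UID_SIZE = 16;         // each checksum/UID field is 16 bytes
    // Assumed values, taken from my reading of [MS-ODRAW] OfficeArtBlipWMF:
    static final int INSTANCE_ONE_UID = 0x216;
    static final int INSTANCE_TWO_UIDS = 0x217;

    // Returns how many UID fields precede the metafile header.
    static int uidCount(byte[] record) {
        // First little-endian 16-bit word: low 4 bits recVer, high 12 bits recInstance.
        int verAndInstance = ByteBuffer.wrap(record)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort(0) & 0xFFFF;
        int recInstance = verAndInstance >>> 4;
        switch (recInstance) {
            case INSTANCE_ONE_UID:
                return 1;
            case INSTANCE_TWO_UIDS:
                return 2;
            default:
                throw new IllegalArgumentException(
                        "Not a WMF blip instance: 0x" + Integer.toHexString(recInstance));
        }
    }

    // Offset of the metafile header, counted from the start of the record.
    static int metafileHeaderOffset(byte[] record) {
        return REC_HEADER_SIZE + uidCount(record) * UID_SIZE;
    }
}
```

If the second UID is skipped over incorrectly, every field after it (including the compression flag) is read 16 bytes too early, which would explain data that "appears not to be compressed".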
Thank you for fixing this! With the ppt on TIKA-1612, I'm no longer getting an exception. Great! However, the bytes that I'm extracting (with .getData()) aren't a valid PNG (or any other image, as far as I can tell). Is there something else going on, too? Should I open a separate issue?
Created attachment 32856: file from govdocs1 that triggers the issue
The new test case uses the two Tika files [TIKA-1612]/[TIKA-1046]. I could extract the WMF and open it with IrfanView. Please drop me an email and I'll check it next week - I don't have much Wi-Fi available until Sunday night ... Andi
Tim, WMF is its own format, not PNG or JPEG; see https://en.wikipedia.org/wiki/Windows_Metafile. Windows machines will display it; I'm not sure about Java support, though.
Doh! User error, of course... WMF, not PNG. Sorry! And thank you, again!
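To avoid the same PNG/WMF mix-up in future, a quick check of the leading bytes helps. The sketch below is only an illustration with made-up names. It assumes the extracted bytes begin with the placeable (Aldus) WMF header, whose magic number is 0x9AC6CDD7 stored little-endian. A WMF without that optional header will come back as "unknown" here.

```java
// Hypothetical helper, not part of POI or Tika: guesses an image type
// from its first few bytes.
final class ImageMagicSketch {

    static String sniff(byte[] data) {
        // Placeable WMF key 0x9AC6CDD7, little-endian on disk.
        if (startsWith(data, 0xD7, 0xCD, 0xC6, 0x9A)) {
            return "wmf-placeable";
        }
        // PNG signature.
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        // JPEG start-of-image marker followed by another marker byte.
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "jpeg";
        }
        return "unknown";
    }

    // True if data is at least as long as magic and matches it byte for byte.
    private static boolean startsWith(byte[] data, int... magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `ImageMagicSketch.sniff(picture.getData())` on the extracted bytes (where `picture` stands for whatever picture object you already have) would then say whether a WMF viewer or a PNG decoder is the right tool to try.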