Loading a document that contains images of type 'image/png;base64' fails.

Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: The specified content type 'image/png;base64' is not compliant with RFC 2616: malformed content type.
    at org.apache.poi.openxml4j.opc.internal.ContentType.<init>(ContentType.java:154)
    at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:83)
    at org.apache.poi.openxml4j.opc.ZipPackage$EntryTriple.register(ZipPackage.java:334)
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:291)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:742)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:315)
    at org.apache.poi.ooxml.util.PackageHelper.open(PackageHelper.java:47)

There is a stripped-down example at https://github.com/Portree-Kid/testdoc
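To illustrate why the exception calls this content type malformed: the RFC 2616 media-type grammar is type "/" subtype followed by zero or more ";" attribute "=" value parameters, and ";base64" has no "=" value part. The following is a minimal sketch of that grammar as a regular expression. The class name MediaTypeCheck and the method isValid are made up for this example, the quoted-string handling is simplified, and this is not POI's actual ContentType code, which may apply different or stricter rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: approximates the RFC 2616 media-type grammar.
class MediaTypeCheck {
    // RFC 2616 "token": visible ASCII characters except the separators.
    private static final String TOKEN = "[!#$%&'*+\\-.^_`|~0-9A-Za-z]+";
    // Simplified quoted-string: text between double quotes, allowing backslash escapes.
    private static final String QUOTED = "\"(?:[^\"\\\\]|\\\\.)*\"";
    // media-type = type "/" subtype *( ";" parameter ), parameter = attribute "=" value
    private static final Pattern MEDIA_TYPE = Pattern.compile(
            TOKEN + "/" + TOKEN
            + "(?:\\s*;\\s*" + TOKEN + "=(?:" + TOKEN + "|" + QUOTED + "))*");

    static boolean isValid(String contentType) {
        return MEDIA_TYPE.matcher(contentType).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("image/png"));               // true
        System.out.println(isValid("image/png;base64"));        // false: "base64" has no "=value"
        System.out.println(isValid("text/xml; charset=utf-8")); // true
    }
}
```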
Created attachment 37872: Example document
No one else has reported a similar issue, and the [Content_Types].xml in the docx just seems wrong:

<Override PartName="/word/media/rId21.png" ContentType="image/png;base64" />

rId21.png is not base64 encoded; it is a valid PNG file. If this were a common issue, I would agree with hacking POI to handle it, but so far this looks like a bug in whatever application produced the attached docx.
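Since the image part itself is a plain PNG, one possible workaround on the caller's side (not a POI feature) is to rewrite [Content_Types].xml before handing the file to POI. The sketch below uses only java.util.zip. The class name ContentTypeRepair and the method repair are hypothetical, and the code assumes that [Content_Types].xml is UTF-8 encoded and that the attribute is written exactly as in the Override element quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of POI: copies a docx and patches the bad content type.
class ContentTypeRepair {
    private static final String BAD = "ContentType=\"image/png;base64\"";
    private static final String GOOD = "ContentType=\"image/png\"";

    static byte[] repair(byte[] docx) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Reads up to the end of the current zip entry (Java 9+).
                byte[] data = zin.readAllBytes();
                if ("[Content_Types].xml".equals(entry.getName())) {
                    // Assumption: the part is UTF-8 encoded.
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = xml.replace(BAD, GOOD).getBytes(StandardCharsets.UTF_8);
                }
                // All other entries are copied through unchanged.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        // The zip is finished once the try block has closed zout.
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ContentTypeRepair.java broken.docx repaired.docx
        byte[] fixed = repair(Files.readAllBytes(Path.of(args[0])));
        Files.write(Path.of(args[1]), fixed);
    }
}
```

The repaired bytes could then be wrapped in a ByteArrayInputStream and passed to OPCPackage.open(InputStream). Whether the rest of such a file loads cleanly depends on what else the producing application got wrong.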
Sounds like a problem with the application which produces these files. Unless it happens more often, we do not plan to introduce more graceful parsing/handling of such files in Apache POI.