Created attachment 36245
Example file

While testing Tika-1.19.1, I found that POI throws the following exception on some corrupt docx files that POI-3.17 previously handled without problems (MS Word complains about these files but repairs them). See TIKA-2765 for more info.

Stack trace below:

org.apache.poi.openxml4j.exceptions.InvalidOperationException: Could not open the specified zip entry source stream
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:214)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:196)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:170)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:151)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:123)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:234)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:81)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:110)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 43 more
Caused by: java.io.EOFException
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:803)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.readFully(ZipArchiveInputStream.java:795)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.skipRemainderOfArchive(ZipArchiveInputStream.java:1014)
    at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:257)
    at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.getNextEntry(ZipArchiveThresholdInputStream.java:139)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:47)
    at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:212)
    ... 51 more
fixed via r1849252
Thank you, Andreas!