Created attachment 38182 [details]
unit test that creates an xlsx file with XSSFWorkbook and SXSSFWorkbook and uses ZipInputStream to read the entries

A java.util.zip.ZipException occurs when reading the entries of an xlsx zip file created with SXSSFWorkbook in POI 5.0.0 and later.

The error does not occur on xlsx zip files created with XSSFWorkbook in 5.0.0 and later.
The error does not occur on xlsx zip files created with either class in 4.2.2 and below.

The size of the [ContentTypes].xml entry appears to be recorded incorrectly in the zip file.

java.util.zip.ZipException: invalid entry size (expected 0 but got 1053 bytes)
    at java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:398)
    at java.base/java.util.zip.ZipInputStream.read(ZipInputStream.java:197)
    at com.e2open.issues.poi.XSSFZipTest.readEntry(XSSFZipTest.java:121)
    at com.e2open.issues.poi.XSSFZipTest.testSXSSFWorkbook(XSSFZipTest.java:56)

I've attached a unit test that demonstrates the error. The unit test attempts to read the [ContentTypes].xml file. It succeeds on the workbook created by XSSFWorkbook but fails on the one created by SXSSFWorkbook.

Note that the error also occurs when simply iterating the zip entries via ZipInputStream.getNextEntry() and ZipInputStream.closeEntry().
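For readers without access to the attachment, the read pattern described in the report probably looks something like the sketch below. It uses only the JDK: the archive is a tiny in-memory zip written with java.util.zip.ZipOutputStream, not a POI-generated file, so it shows the streaming read loop rather than reproducing the failure. The class and method names (ZipEntryReadSketch, buildZip, readEntry) are made up for illustration and are not taken from the attached test.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; names are illustrative, not from the attached test.
class ZipEntryReadSketch {

    // Builds a single-entry zip in memory with the JDK's own writer (not POI).
    static byte[] buildZip(String entryName, String content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Streams through the archive and returns the named entry's content,
    // or null if no entry with that name exists.
    static String readEntry(InputStream in, String entryName) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                if (ze.getName().equals(entryName)) {
                    ByteArrayOutputStream bos = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int len;
                    // ZipInputStream validates the entry size once read() hits
                    // the end of the entry data (readEnd in the stack trace).
                    while ((len = zis.read(buf, 0, buf.length)) != -1) {
                        bos.write(buf, 0, len);
                    }
                    return new String(bos.toByteArray(), StandardCharsets.UTF_8);
                }
                zis.closeEntry(); // skip entries we are not interested in
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("[ContentTypes].xml", "<Types/>");
        System.out.println(readEntry(new ByteArrayInputStream(zip), "[ContentTypes].xml"));
    }
}
```

To try this against a real workbook, pass a FileInputStream for the generated xlsx file to readEntry in place of the in-memory stream.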
Additional info:
* fails on both Windows and Linux
* we're using Java 11:
  openjdk version "11.0.6" 2020-01-14
  OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10)
  OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.6+10, mixed mode)
POI has switched away from using Java Zip classes because of issues that were reported to us about large zip files [1]. We use Apache Commons Compress.

Could you try unzipping using commons-compress?
https://commons.apache.org/proper/commons-compress/examples.html

[1] https://rzymek.github.io/post/excel-zip64/
I would also recommend that you switch to a newer Java runtime such as Java 11.0.12. Java 11.0.6 has numerous issues that have been fixed in later patch releases.
I took your code and rewrote some of it (successfully) to read the zip file using commons-compress.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

  protected String readEntry2(File xlsxFile, String entryName) throws IOException {
    assertNotNull(xlsxFile);
    assertNotNull(entryName);

    String entryContent = null;
    // ZipFile reads the central directory rather than streaming local headers
    try (ZipFile zf = new ZipFile(xlsxFile)) {
      Enumeration<ZipArchiveEntry> entryEnum = zf.getEntries();
      while (entryContent == null && entryEnum.hasMoreElements()) {
        ZipArchiveEntry ze = entryEnum.nextElement();
        String fileName = ze.getName();
        if (fileName.equals(entryName)) {
          byte[] buf = new byte[4096];
          ByteArrayOutputStream bos = new ByteArrayOutputStream();
          int len;
          try (InputStream zis = zf.getInputStream(ze)) {
            while ((len = zis.read(buf, 0, buf.length)) != -1) {
              bos.write(buf, 0, len);
            }
          }
          entryContent = new String(bos.toByteArray(), StandardCharsets.UTF_8);
          assertNotNull(entryContent);
          assertTrue(entryContent.length() > 0);
        }
      }
    }
    return entryContent;
  }
Created attachment 38183 [details] Updated unit test with test w/ commons-compress
Man, I'm sure I added a comment but I don't see it...

I was able to read the entries using commons-compress, and I updated the attached test case to include the change. I suspect it's very close to the change you posted.

I don't think, though, that we'll be able to use the commons-compress approach. We have a common/shared component that generates the xlsx files. Those files are delivered to about 100 different client deployments. The client deployments use the Java zip API to check the zip entries, but it's the shared component that we're trying to update to POI 5.x.
The Java Zip code is not as good as commons-compress and has this terrible bug where it can't handle zip64 files. It is not the Apache POI team's problem if you don't want to use commons-compress.

With SXSSFWorkbook, you could use setZip64Mode:
https://poi.apache.org/apidocs/dev/org/apache/poi/xssf/streaming/SXSSFWorkbook.html#setZip64Mode-org.apache.commons.compress.archivers.zip.Zip64Mode-

Zip64Mode.Never would probably ensure the xlsx files are readable by Java's buggy zip code. Zip64Mode.AsNeeded should work too, and if it does, it is a better choice than Never.
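A minimal sketch of that suggestion, assuming POI 5.x and commons-compress are on the classpath. The class name Zip64ModeSketch, the helper names, the sheet name and the cell value are all made up for illustration. Only setZip64Mode and the Zip64Mode constants come from the comment above. Whether AsNeeded is sufficient for a particular set of files still needs to be checked against the actual consumers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipInputStream;

import org.apache.commons.compress.archivers.zip.Zip64Mode;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

// Hypothetical helper class; names are illustrative only.
class Zip64ModeSketch {

    // Writes a one-cell streaming workbook with the zip64 mode set explicitly.
    static void writeWorkbook(Path target) throws IOException {
        try (SXSSFWorkbook wb = new SXSSFWorkbook()) {
            // Must be set before write(); Zip64Mode.Never is the stricter alternative.
            wb.setZip64Mode(Zip64Mode.AsNeeded);
            wb.createSheet("data").createRow(0).createCell(0).setCellValue("hello");
            try (OutputStream out = Files.newOutputStream(target)) {
                wb.write(out);
            }
            wb.dispose(); // remove the temporary files backing the streaming sheets
        }
    }

    // Iterates all entries with the JDK's ZipInputStream, as the client
    // deployments in this report do; throws ZipException if an entry is unreadable.
    static int countEntries(Path xlsx) throws IOException {
        int count = 0;
        try (InputStream in = Files.newInputStream(xlsx);
             ZipInputStream zis = new ZipInputStream(in)) {
            while (zis.getNextEntry() != null) {
                zis.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("zip64-sketch", ".xlsx");
        writeWorkbook(tmp);
        System.out.println("entries readable via java.util.zip: " + countEntries(tmp));
        Files.deleteIfExists(tmp);
    }
}
```

Running the same check with the setZip64Mode call removed is a quick way to confirm whether the default mode is what triggers the ZipException in a given environment.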
It's not the case that I don't want to use commons-compress. It's simply that we have a large set of deployments, not easily changed, that are unable to consume the spreadsheets generated by SXSSFWorkbook in POI 5.0 and later.

That being said, I understand your perspective. I appreciate the pointer to the Zip64 flags and I'll try them out.
I've done some testing with the various Zip64Mode flags and I think Zip64Mode.AsNeeded will work for us. Thanks for the information.