Bug 65865 - java.util.zip.ZipException reading xlsx file created with SXSSFWorkbook
Status: RESOLVED INFORMATIONPROVIDED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF
Version: 5.1.0-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2022-02-02 15:51 UTC by Jerry Williamson
Modified: 2022-02-03 18:08 UTC

Attachments
unit test that creates an xlsx file with XSSFWorkbook and SXSSFWorkbook and uses ZipInputStream to read the entries (4.42 KB, text/x-csrc)
2022-02-02 15:51 UTC, Jerry Williamson
Updated unit test with test w/ commons-compress (6.45 KB, text/x-csrc)
2022-02-02 19:21 UTC, Jerry Williamson

Description Jerry Williamson 2022-02-02 15:51:23 UTC
Created attachment 38182
unit test that creates an xlsx file with XSSFWorkbook and SXSSFWorkbook and uses ZipInputStream to read the entries

A java.util.zip.ZipException occurs when reading the entries of an xlsx zip file created with SXSSFWorkbook after 5.0.0.
The error does not occur on xlsx zip files created with XSSFWorkbook after 5.0.0.
The error does not occur on xlsx zip files created with either class in 4.2.2 and below.

The size recorded for [Content_Types].xml in its zip entry appears to be incorrect.

java.util.zip.ZipException: invalid entry size (expected 0 but got 1053 bytes)
	at java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:398)
	at java.base/java.util.zip.ZipInputStream.read(ZipInputStream.java:197)
	at com.e2open.issues.poi.XSSFZipTest.readEntry(XSSFZipTest.java:121)
	at com.e2open.issues.poi.XSSFZipTest.testSXSSFWorkbook(XSSFZipTest.java:56)

I've attached a unit test that demonstrates the error.
The unit test attempts to read the [Content_Types].xml entry.
It succeeds on the workbook created by XSSFWorkbook but fails on the one created by SXSSFWorkbook.

Note that the error also occurs when simply iterating over the zip entries via ZipInputStream.getNextEntry() and ZipInputStream.closeEntry().
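
For reference, the streaming read pattern described above can be sketched with only the JDK. This is a minimal illustration, not the attached unit test: the class name ZipStreamWalk and the helper sampleZip are hypothetical, and the sample archive is built with java.util.zip.ZipOutputStream rather than POI, so it does not reproduce the exception by itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
final class ZipStreamWalk {

    /** Streams through every entry and returns entry name -> number of bytes read. */
    static Map<String, Long> entrySizes(InputStream in) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] buf = new byte[4096];
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                long total = 0;
                int len;
                // A ZipException about the entry size would surface while
                // reading to the end of the entry.
                while ((len = zis.read(buf)) != -1) {
                    total += len;
                }
                zis.closeEntry();
                sizes.put(entry.getName(), total);
            }
        }
        return sizes;
    }

    /** Builds a tiny in-memory zip with java.util.zip so the sketch runs without POI. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zos.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entrySizes(new ByteArrayInputStream(sampleZip())));
    }
}
```

Passing a FileInputStream for an SXSSFWorkbook-generated file to entrySizes instead of the sample archive would be the way to exercise the failing case.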
Comment 1 Jerry Williamson 2022-02-02 16:08:06 UTC
Additional info:
* fails on both Windows and Linux
* we're using Java 11

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.6+10, mixed mode)
Comment 2 PJ Fanning 2022-02-02 17:45:42 UTC
POI has switched away from using Java Zip classes because of issues that were reported to us about large zip files[1]. We use Apache Commons Compress. Could you try unzipping using commons-compress? https://commons.apache.org/proper/commons-compress/examples.html

[1] https://rzymek.github.io/post/excel-zip64/
Comment 3 PJ Fanning 2022-02-02 18:12:09 UTC
I would also recommend that you switch to a newer Java runtime such as Java 11.0.12. Java 11.0.6 has numerous issues that have been fixed in later patch releases.
Comment 4 PJ Fanning 2022-02-02 18:40:39 UTC
I took your code and rewrote some of it (successfully) to read the zip file using commons-compress.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

    // Reads the named entry with the commons-compress ZipFile and returns it as a
    // UTF-8 string, or null if no entry has that name. assertNotNull/assertTrue are
    // assumed to come from the test class's existing static imports.
    protected String readEntry2(File xlsxFile, String entryName) throws IOException {
        assertNotNull(xlsxFile);
        assertNotNull(entryName);

        String entryContent = null;

        try (ZipFile zf = new ZipFile(xlsxFile)) {
            Enumeration<ZipArchiveEntry> entryEnum = zf.getEntries();
            // Stop at the first entry whose name matches.
            while (entryContent == null && entryEnum.hasMoreElements()) {
                ZipArchiveEntry ze = entryEnum.nextElement();
                String fileName = ze.getName();

                if (fileName.equals(entryName)) {
                    byte[] buf = new byte[4096];
                    ByteArrayOutputStream bos = new ByteArrayOutputStream();

                    // Copy the whole entry into memory.
                    int len;
                    try (InputStream zis = zf.getInputStream(ze)) {
                        while ((len = zis.read(buf, 0, buf.length)) != -1) {
                            bos.write(buf, 0, len);
                        }
                    }

                    entryContent = new String(bos.toByteArray(), StandardCharsets.UTF_8);
                    assertNotNull(entryContent);
                    assertTrue(entryContent.length() > 0);
                }
            }
        }

        return entryContent;
    }
Comment 5 Jerry Williamson 2022-02-02 19:21:02 UTC
Created attachment 38183
Updated unit test with test w/ commons-compress
Comment 6 Jerry Williamson 2022-02-02 19:31:34 UTC
Man, I'm sure I added a comment but I don't see it...

I was able to read the entries using commons-compress and I updated the attached test case to include the change. I suspect it's very close to the change you posted.

I don't think, though, that we'll be able to use the commons-compress approach.
We have a common/shared component that generates the xlsx files. Those files are delivered to about 100 different client deployments. The client deployments use the Java zip API to check the zip entries, but it's the shared component that we're trying to update to POI 5.x.
Comment 7 PJ Fanning 2022-02-02 19:46:41 UTC
The Java Zip code is not as good as commons-compress and has this terrible bug where it can't handle zip64 files. It is not the Apache POI team's problem if you don't want to use commons-compress.

With SXSSFWorkbook, you could use https://poi.apache.org/apidocs/dev/org/apache/poi/xssf/streaming/SXSSFWorkbook.html#setZip64Mode-org.apache.commons.compress.archivers.zip.Zip64Mode-

Zip64Mode.Never would probably ensure the xlsx files are readable by Java's buggy zip code - Zip64Mode.AsNeeded should work too and if it does then this is a better choice than Never.
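
A minimal sketch of that suggestion, assuming POI 5.x and commons-compress are on the classpath. The class and method names (Zip64ModeSketch, writeWorkbook, writeAsNeeded) are hypothetical; SXSSFWorkbook.setZip64Mode and Zip64Mode.AsNeeded are the calls referenced above. Whether the output is then readable by java.util.zip.ZipInputStream should be verified against your own files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.archivers.zip.Zip64Mode;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

// Hypothetical helper class, for illustration only.
final class Zip64ModeSketch {

    /** Writes a one-cell streaming workbook to memory using the given Zip64 mode. */
    static byte[] writeWorkbook(Zip64Mode mode) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook();
        try {
            // Must be set before write() for it to affect the generated zip.
            wb.setZip64Mode(mode);

            Sheet sheet = wb.createSheet("data");
            sheet.createRow(0).createCell(0).setCellValue("hello");

            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            wb.write(bos);
            return bos.toByteArray();
        } finally {
            wb.dispose(); // remove the temporary files backing the streamed sheets
            wb.close();
        }
    }

    /** Convenience wrapper for the mode suggested in this comment. */
    static byte[] writeAsNeeded() throws IOException {
        return writeWorkbook(Zip64Mode.AsNeeded);
    }
}
```

The returned bytes can be passed to a ZipInputStream-based reader to check compatibility with consumers that rely on the Java zip API.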
Comment 8 Jerry Williamson 2022-02-02 19:58:29 UTC
It's not that I don't want to use commons-compress. It's simply that we have a large set of deployments that aren't easily changed and that are unable to consume/process the spreadsheets generated by SXSSFWorkbook post 5.0.

That being said, I understand your perspective.

I appreciate the pointer for the Zip64 flags and I'll try them out.
Comment 9 Jerry Williamson 2022-02-03 15:32:34 UTC
I've done some testing with various Zip64Mode values and I think Zip64Mode.AsNeeded will work for us.

Thanks for the information.