Created attachment 36225 [details] Sampel for java.io.IOException: This archive contains unclosed entries The new 4.0 seems to have a problem when writing large XLSX files (e.g. >100k rows but less than 1 Mill. rows) using SXSSFWorkbook: java.io.IOException: This archive contains unclosed entries. at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.finish(ZipArchiveOutputStream.java:467) at org.apache.poi.xssf.streaming.SXSSFWorkbook.injectData(SXSSFWorkbook.java:406) at org.apache.poi.xssf.streaming.SXSSFWorkbook.write(SXSSFWorkbook.java:936) Please note, small files (e.g. less than 100k rows seem to work fine) and also everything works fine with 3.18 (same code, same data). The attached sample reproduces the error.
Created attachment 36226 [details] Test Case
Based on discussion on user-list, this worked in 3.17.
Reproducible with 600_000 rows, not reproducible with 400_000 rows. Temporary file is aprox. 200MB, so not a 2GB issue When the exception is thrown in ZipArchiveOutputStream, the variables have the following contents: this.finished = false this.entry = {ZipArchiveOutputStream$CurrentEntry@2501} entry = {ZipArchiveEntry@2523} "xl/worksheets/sheet1.xml" method = 8 size = 4497373312 internalAttributes = 0 versionRequired = 0 versionMadeBy = 0 platform = 0 rawFlag = 0 externalAttributes = 0 alignment = 0 extraFields = null unparseableExtra = null name = "xl/worksheets/sheet1.xml" rawName = null gpb = {GeneralPurposeBit@2525} localHeaderOffset = -1 dataOffset = -1 isStreamContiguous = false nameSource = {ZipArchiveEntry$NameSource@2526} "NAME" commentSource = {ZipArchiveEntry$CommentSource@2527} "COMMENT" ZipEntry.name = "xl/worksheets/sheet1.xml" xdostime = 276176132385 mtime = null atime = null ctime = null crc = 2326399640 ZipEntry.size = -1 csize = 228388641 ZipEntry.method = -1 flag = 0 extra = null comment = null localDataStart = 2340 dataStart = 2380 bytesRead = 4497373312 causedUseOfZip64 = false hasWritten = false
My guess is, this is a shaded exception in SXSSFWorkbook.injectData. If you add some checkpoint variables in, it will enter the "finally" for "zos.closeArchiveEntry()", but actually doesn't finish it.
It is a shaded exception complaining about the 4gb limit. When using "zos.setUseZip64(Zip64Mode.Always)" the test runs through successfully, but the result can't be opened in Libre Office I'm now checking if the xml differs on 3.17 vs 4.0.0 or if it's caused by the 64bit zip stream.
Patched via r1845629 The change now implicitly sets the stream entry to 64bit based on the given filesize, the others stream entries stay in normal mode (... I guess ...) I've compared the sheet content in 3.17 vs trunk, but there wasn't any differences. As this still produces files which can't be opened in Libre/MS Office, I leave this issue open.
Created attachment 36258 [details] modified test I forgot to mention, that also with POI 3.17 I couldn't produce a file which could be opened by Libre Office ... I haven't tried with MS Excel. This also due the double "i++" in the test, which creates more than the 1048576 logical rows with a ROW_COUNT of 600000. Please run your fixed test again, with POI 3.17 vs. the current trunk. For comparison of the unzipped xmls, you might want to use the same test data. I've attached my modified version which inserts reproducible timestamps.
LibreOffice does not seem to support ZIP64. I have created 2 small files with the same content, but only one written as ZIP64. Gnumeric was able to open both files, LibreOffice fails with the ZIP64. I have refreshed LibreOffice bug 82984 accordingly (https://bugs.documentfoundation.org/show_bug.cgi?id=82984#c10). Setting Zip64 to Always for archives exceeding 4GB and/or 65536 entries would be the correct solution. When doing so, a warning related to the use of LibreOffice would be great.
Created attachment 36286 [details] AS NEEDED
Created attachment 36287 [details] ALWAYS
I added a method to SXSSFWorkbook so that you can set the Zip64Mode - https://svn.apache.org/viewvc?view=revision&revision=1848179
I think we provide as much as we can here by allowing the user to select Zip64Mode if necessary. Unfortunately not all tools seem to be able to handle such files, so we cannot use this as default right now. We may be able to switch it in the future when all the major tools have enabled support. Please reopen if there is still something that can be done in Apache POI here.
Dominik - Zip64Mode.Always is the default in SXSSFWorkbook as of POI 5.0.0
Ah, thanks for the note. But files created via SXSSFWorkbook seem to still work in LibreOffice even with this, so what was making them corrupt here?
This is an old issue - not really sure what was wrong over 2 years ago.