I read an XLSX file this way:

    workBookPackage = OPCPackage.open(path.toFile(), PackageAccess.READ);
    workBook = new XSSFWorkbook(workBookPackage);

I'm using this method because the documentation says that "Creating a XSSFWorkbook from a file-backed OPC Package has a lower memory footprint than an InputStream backed one." Now my application got interrupted, and this resulted in a zero-byte XLSX file. Unfortunately, I don't have the complete stack trace anymore, but I got a `java.nio.channels.ClosedByInterruptException` from FileHelper.copyFile(). Looking at ZipPackage.closeImpl(), it looks like it *always* and unconditionally clobbers the original file, even though I had opened the package with PackageAccess.READ. The other issue is that closeImpl() does not even try to use an atomic move to make replacing the original file safer.
I accidentally copied the wrong sources: when the problem occurred, I did not specify PackageAccess.READ in OPCPackage.open().
Can you provide a reproducible test case?
Sorry, if I had the time for that, I would have delivered a patch already ;-) But I've read the sources a bit more; it turns out that only OPCPackage.close() calls into ZipPackage.closeImpl(), and that is *not* called in the case of PackageAccess.READ. So that part was a wrong assumption of mine. I should have called OPCPackage.revert() instead of close() -- not the most intuitive API choice, but my fault in the end. The only remaining issue is that the file is lost if ZipPackage.closeImpl() is aborted at the wrong time. If possible, it really should use an atomic move to replace the original file instead.
What makes this bug even worse is that the temp file is deleted in any case: ZipPackage.closeImpl() deletes the temp file in a finally block. Even if an atomic move is not possible (if the temp dir is on a different file system, for example), you can still make the replacement safer by renaming the original file aside first, and only deleting that backup once everything has worked out.
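A sketch of that safer replacement strategy, using only java.nio (this is not POI code; `replace` and its callers are placeholders for illustration):

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeReplace {
    // Replace 'target' with freshly saved content without ever losing the original:
    // write to a temp file first, then swap it in with an atomic move where possible.
    static void replace(Path target, byte[] newContent) throws IOException {
        // Create the temp file in the same directory, so an atomic move is likely to work
        Path temp = Files.createTempFile(target.getParent(), "save", ".tmp");
        try {
            Files.write(temp, newContent);
            try {
                // Atomic: at any point either the old or the new file exists, never a torso
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE,
                           StandardCopyOption.REPLACE_EXISTING);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback: rename the original aside first, move the new file in,
                // and only then delete the backup
                Path backup = target.resolveSibling(target.getFileName() + ".bak");
                Files.move(target, backup, StandardCopyOption.REPLACE_EXISTING);
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
                Files.delete(backup);
            }
        } finally {
            // If the save or the move failed, remove the leftover temp file;
            // the original (or its backup) is still on disk
            Files.deleteIfExists(temp);
        }
    }
}
```

If the process is interrupted anywhere in here, the worst case is a stale temp or .bak file next to an intact original.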
Only my misuse of the API depends on #59287; the actual issue does not. Even when the API is used correctly, the file may end up corrupted in the end.
A unit test to reproduce the corruption:

    try (OPCPackage pkg = OPCPackage.open(path.toFile(), PackageAccess.READ_WRITE)) {
        // add a marshaller that will throw an exception on save
        pkg.addMarshaller("poi/junit", (part, out) -> {
            throw new RuntimeException("Bugzilla 63029");
        });
        pkg.createPart(PackagingURIHelper.createPartName("/poi/test.xml"), "poi/junit");
    } catch (RuntimeException e) {
        assertEquals("Bugzilla 63029", e.getMessage());
    }

    // try to read the source file once again
    try (ZipFile zip = new ZipFile(path.toFile())) {
        // throws java.util.zip.ZipException: archive is not a ZIP archive
    }

An exception while saving *may* result in a clobbered file. The size of the corrupted data depends on how much was saved and flushed to disk: it can be anywhere from zero bytes to the length of the original file.

A simple change to avoid the corruption would be to replace the original file only if save() succeeded. Something like this:

    boolean success = false;
    try {
        save(tempFile);
        success = true;
    } finally {
        // Close the current zip file, so we can overwrite it on all platforms
        IOUtils.closeQuietly(this.zipArchive);
        try {
            // Copy the new file over the old one only if saving succeeded
            if (success) {
                FileHelper.copyFile(tempFile, targetFile);
            }
        } finally {
            // Whether the save operation succeeded or not, delete the temporary file
            if (!tempFile.delete()) {
                LOG.log(POILogger.WARN, "The temporary file: '"
                    + tempFile.getAbsolutePath()
                    + "' cannot be deleted! Make sure that no other application uses it.");
            }
        }
    }

This would leave the original file intact in case of an exception.
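The success-flag pattern can be demonstrated outside POI. In this stdlib-only sketch, `Saver` and `close` are hypothetical stand-ins for ZipPackage's save/close path, not POI API; a failing save leaves the original file untouched, a successful one replaces it:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ReplaceOnSuccess {
    // Hypothetical stand-in for the saving side of ZipPackage.closeImpl()
    interface Saver { void save(Path out) throws IOException; }

    // Save to a temp file, and copy it over the original only if the save completed
    static void close(Path target, Saver saver) throws IOException {
        Path temp = Files.createTempFile("poi-save", ".tmp");
        boolean success = false;
        try {
            saver.save(temp);
            success = true;
        } finally {
            try {
                if (success) {
                    Files.copy(temp, target, StandardCopyOption.REPLACE_EXISTING);
                }
            } finally {
                // The temp file is cleaned up in any case, but the original
                // is only ever overwritten by a fully written replacement
                Files.deleteIfExists(temp);
            }
        }
    }
}
```

The exception from save() still propagates to the caller, so the failure is not hidden; only the destructive copy is skipped.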
Fixed in r1853454, a unit test added. Yegor