Thanks to PJ and Andi and the POI team, we've integrated 5.2.0 into Apache Tika, and we're about to make a release of 2.3.0. I noticed a bunch of logged warnings: "The close() method is intended to SAVE a package. This package is open in READ ONLY mode, use the revert() method instead!" We've wrapped our use of OPCPackage so that we're calling revert() everywhere. However, in our zip detector, when trying to figure out if a zip is an OPCPackage, we call OPCPackage.open(ZipEntrySource zipEntry), and if there's an exception, we swallow it and figure that zip file is not an OPCPackage. The problem is that at the POI level, IOUtils.closeQuietly() is called on the pack after it is created in the catch block. Is there a cleaner way with POI to determine if a zip file is an OPCPackage? Should we be calling revert() instead at the POI level to avoid this warning? Code: public static OPCPackage open(ZipEntrySource zipEntry) throws InvalidFormatException { ZipPackage pack = new ZipPackage(zipEntry, PackageAccess.READ); try { if (pack.partList == null) { pack.getParts(); } return pack; } catch (RuntimeException | InvalidFormatException var3) { IOUtils.closeQuietly(pack); throw var3; } }
I changed the logging in r1897562. I don't think this warrants a warning log.
Not sure if using "org.apache.poi.poifs.filesystem.FileMagic" would help you to determine the file-type before trying to do actual parsing? It looks at the first few bytes to quickly see if it looks like a zip-file, binary-OLE2 or some other older formats. In general I would use "revert()" at these points when opening the file fails and we are opening it read-only. I have the relevant changes locally, can add them to this bug later.
Applied some changes via r1899071 which uses revert() instead of close() whenever the package was opened read-only.