Bug 65854

Summary: OPCPackage should revert() instead of close() on failed open in some cases
Product: POI Reporter: Tim Allison <tallison>
Component: OPCAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: trivial    
Priority: P2    
Version: 5.2.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Tim Allison 2022-01-27 18:03:08 UTC
Thanks to PJ and Andi and the POI team, we've integrated 5.2.0 into Apache Tika, and we're about to make a release of 2.3.0.

I noticed a bunch of logged warnings: "The close() method is intended to SAVE a package. This package is open in READ ONLY mode, use the revert() method instead!"

We've wrapped our use of OPCPackage so that we're calling revert() everywhere.  

However, in our zip detector, when trying to figure out if a zip is an OPCPackage, we call OPCPackage.open(ZipEntrySource zipEntry), and if there's an exception, we swallow it and figure that zip file is not an OPCPackage.

The problem is that at the POI level, IOUtils.closeQuietly() is called on the pack after it is created in the catch block. 


Is there a cleaner way with POI to determine if a zip file is an OPCPackage?


Should we be calling revert() instead at the POI level to avoid this warning?

Code:

    public static OPCPackage open(ZipEntrySource zipEntry) throws InvalidFormatException {
        ZipPackage pack = new ZipPackage(zipEntry, PackageAccess.READ);

        try {
            if (pack.partList == null) {
                pack.getParts();
            }

            return pack;
        } catch (RuntimeException | InvalidFormatException var3) {
            IOUtils.closeQuietly(pack);
            throw var3;
        }
    }
Comment 1 PJ Fanning 2022-01-28 10:43:18 UTC
I changed the logging in r1897562. I don't think this warrants a warning log.
Comment 2 Dominik Stadler 2022-01-30 17:22:06 UTC
Not sure if using "org.apache.poi.poifs.filesystem.FileMagic" would help you to determine the file-type before trying to do actual parsing? It looks at the first few bytes to quickly see if it looks like a zip-file, binary-OLE2 or some other older formats.

In general I would use "revert()" at these points when opening the file fails and we are opening it read-only. I have the relevant changes locally, can add them to this bug later.
Comment 3 Dominik Stadler 2022-03-20 07:47:03 UTC
Applied some changes via r1899071 which uses revert() instead of close() whenever the package was opened read-only.