Created attachment 37268 [details] corrupted file. Contents of the unpacked xlsx file and Apache POI. The initial problem is that the xlsx file cannot be opened through POI (OPCPackage.open(fileName, PackageAccess.READ)), although it opens in Excel. A detailed study of POI showed that the problem lies in the contents of the xlsx file. If you unzip the xlsx file, the xl folder contains, in addition to all the other files, the two entries that cause the problem: xl/metadata and xl/metadata.xml. Calling OPCPackage.open(fileName, PackageAccess.READ) then fails with: org.apache.poi.openxml4j.exceptions.InvalidFormatException: You can't add a part with a part name derived from another part ! [M1.11]. The error is raised in the PackagePartCollection.put method because the two file names are considered the same. If I copy the contents of the entire xlsx file into a newly created xlsx file and save it, the xl/metadata entry is no longer there and the file opens through POI without problems. But my task is not just to fix the file; I need to figure out why this problem arises. The file looks like a slightly incorrect xlsx, but I can still open it in Excel. Is there any way to open it through POI? Does anyone have an idea how xl/metadata ended up in the contents of the xlsx?
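To check whether a given file has the pattern described above, you can list the ZIP entries and look for a name that also appears with an extension appended. The following is a minimal sketch using only java.util.zip. The class XlsxEntryScan and its methods are hypothetical names for illustration, not part of POI, and the check covers only the "name" versus "name.ext" case from this report.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of POI.
final class XlsxEntryScan {

    // Reads all entry names from a ZIP container (an .xlsx file is a ZIP archive).
    static List<String> entryNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Reports pairs such as "xl/metadata <-> xl/metadata.xml": the longer name
    // is the shorter name plus a ".something" suffix within the same segment.
    static List<String> stemCollisions(Collection<String> names) {
        TreeSet<String> sorted = new TreeSet<>(names);
        List<String> hits = new ArrayList<>();
        for (String shorter : sorted) {
            for (String longer : sorted) {
                boolean sameSegment = longer.indexOf('/', shorter.length()) < 0;
                if (longer.startsWith(shorter + ".") && sameSegment) {
                    hits.add(shorter + " <-> " + longer);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxEntryScan <file.xlsx>");
            return;
        }
        for (String hit : stemCollisions(entryNames(Paths.get(args[0])))) {
            System.out.println(hit);
        }
    }
}
```

For the file attached to this report, one would expect the pair xl/metadata and xl/metadata.xml to be listed, assuming the entry names are exactly as described.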
Also pay attention to the documentation. I found only the draft version, but I do not think the differences are significant. https://www.ecma-international.org/activities/Office%20Open%20XML%20Formats/Draft%20ECMA-376%203rd%20edition,%20March%202011/Office%20Open%20XML%20Part%202%20-%20Open%20Packaging%20Conventions.pdf Item 9.1.1.4, Part Naming: "A package implementer shall neither create nor recognize a part with a part name derived from another part name by appending segments to it. [M1.11] [Example: If a package contains a part named “/segment1/segment2/.../segmentn”, then other parts in that package shall not have names such as: “/segment1”, “segment1/segment2”, or “/segment1/segment2/.../segmentn-1”. end example]" But also look at item 9.1.1, Part Names: "Each part has a name. Part names refer to parts within a package. [Example: The part name “/hello/world/doc.xml” contains three segments: “hello”, “world”, and “doc.xml”. The first two segments in the sample represent levels in the logical hierarchy and serve to organize the parts of the package, whereas the third contains actual content. Note that segments are not explicitly represented as folders in the package model, and no directory of folders exists in the package model. end example]" In this example, the segment “doc.xml” is the file name including its extension, whereas in POI the method PackagePart put(final PackagePartName partName, final PackagePart part) of the class PackagePartCollection compares file names without considering their extension, which is possibly a mistake.
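The reading of [M1.11] argued for above can be written down as a check on whole segments. This is only a sketch of that interpretation. PartNameRule and its methods are hypothetical names, not POI's implementation, and the part names are assumed to be already valid (leading slash, no trailing slash).

```java
// Hypothetical illustration of rule [M1.11] as quoted above; not POI's code.
final class PartNameRule {

    // True when 'candidate' is 'existing' with one or more whole segments
    // appended, e.g. existing "/a/b" and candidate "/a/b/c.xml".
    static boolean isDerivedByAppendingSegments(String existing, String candidate) {
        return candidate.startsWith(existing + "/");
    }

    // True when the two names may not coexist under this reading of the rule.
    static boolean conflicts(String a, String b) {
        return isDerivedByAppendingSegments(a, b) || isDerivedByAppendingSegments(b, a);
    }

    public static void main(String[] args) {
        // Same parent, different last segment: not derived from each other.
        System.out.println(conflicts("/xl/metadata", "/xl/metadata.xml"));       // prints false
        // The second name appends a segment to the first: derived.
        System.out.println(conflicts("/xl/metadata", "/xl/metadata/item1.xml")); // prints true
    }
}
```

Under this reading, "/xl/metadata" and "/xl/metadata.xml" differ only in their last segment, so neither is derived from the other by appending segments.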
It's possible we'll change POI code but the next release could be weeks away. It's worth investigating where your xlsx file came from to find out why its contents are not standard.
Created attachment 37929 [details] Zip file with files to reproduce the bug. We have the same issue. I tried to find the steps that produce files Apache POI cannot read. Prerequisites: Excel from MS Office 365, and the files 1.xlsx and 2.xlsx (you can find them in the attached zip file). 1.xlsx contains "xl/metadata" and 2.xlsx contains "xl/metadata.xml". Steps: 1. Open 1.xlsx in Excel. 2. Open 2.xlsx in Excel. 3. Right-click the worksheet tab and select Move or Copy. 4. Select 1.xlsx in the To Book drop-down list. 5. Press OK. 6. Save 1.xlsx. After saving, 1.xlsx contains both xl/metadata and xl/metadata.xml. You can find the result of the steps above in the folder "result-of-merge" in the same attached zip file. This file cannot be read by POI but can be opened in Excel.
The issue appeared after the https://bz.apache.org/bugzilla/show_bug.cgi?id=61942 ticket, in revision 1819708. I think the dot inside the character class of the regexp (the "." just before the closing "]") is unnecessary in this line: "(?=["+PackagingURIHelper.FORWARD_SLASH_STRING+".])"; See https://svn.apache.org/viewvc/poi/trunk/poi-ooxml/src/main/java/org/apache/poi/openxml4j/opc/PackagePartCollection.java?revision=1819708&view=markup#l64
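The effect of that dot can be seen in isolation. The sketch below splits a part name with the lookahead delimiter quoted above, once with the dot and once with the slash only, and checks each accumulated prefix against a set of already registered names. The delimiter strings follow the quoted line (with "/" substituted for FORWARD_SLASH_STRING, which is an assumption). The class DelimiterDemo and the prefix loop are a hypothetical reconstruction for illustration, not a copy of PackagePartCollection.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction for illustration; not the POI source.
final class DelimiterDemo {

    // Lookahead that splits before every "/" and every "." (as in the quoted line).
    static final String WITH_DOT = "(?=[/.])";
    // Lookahead that splits before "/" only (the dot removed).
    static final String SLASH_ONLY = "(?=/)";

    // Builds up the part name piece by piece and reports whether any
    // accumulated prefix is already present in 'registered'.
    static boolean hitsRegisteredPrefix(Set<String> registered, String partName, String delimiter) {
        StringBuilder prefix = new StringBuilder();
        for (String piece : partName.split(delimiter)) {
            prefix.append(piece);
            if (registered.contains(prefix.toString())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>();
        registered.add("/xl/metadata");

        // With the dot, "/xl/metadata.xml" yields the prefix "/xl/metadata".
        System.out.println(hitsRegisteredPrefix(registered, "/xl/metadata.xml", WITH_DOT));   // prints true
        // With the slash only, the prefixes are "/xl" and "/xl/metadata.xml".
        System.out.println(hitsRegisteredPrefix(registered, "/xl/metadata.xml", SLASH_ONLY)); // prints false
    }
}
```

With the dot in the character class, the extension is split off as its own piece, so "/xl/metadata" shows up as a prefix of "/xl/metadata.xml" and the two names collide.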
Created attachment 37964 [details] [PATCH] for fixing the issue. The patch was created by the following command: ant -f patch.xml
Thanks Yury - merged with r1891692
I see the same error occur with poi 5.1.0 and poi-ooxml 5.1.0. The xlsx file I am trying to open indeed contains both metadata and metadata.xml. Is there any way I can help troubleshoot this?
Hi Simone - we need a reproducible test case to debug this or you can try debugging yourself. Can you open a new issue? We fixed Yury's problem with this issue - so it is best to track any similar issues with a new bugzilla issue.
Giving us a file that reproduces the issue would be the main step towards debugging the problem.