Bug 49609

Summary: Part name comparison in ZipPackage should be case-insensitive
Product: POI
Reporter: Ed Beaty <ebeaty>
Component: XSSF
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 3.7-dev   
Target Milestone: ---   
Hardware: Macintosh   
OS: All   
Attachments: XLSX file that can't be opened with POI 3.7-dev

Description Ed Beaty 2010-07-16 18:53:24 UTC
Created attachment 25777
XLSX file that can't be opened with POI 3.7-dev

Attempting to open the attached file fails with the following error:

Exception in thread "main" org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
	at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:147)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:588)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:222)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:63)

The file opens normally in Microsoft Excel 2008 for Mac. If the file is re-saved from Excel in the Excel 97 format, POI opens it normally.

The file is from a non-standard source (a scientific instrument that saves its output as an .xlsx file).  The file contains a part name "[content_types].xml" (lower case), but the ZipPackage.getPartsImpl method expects the name "[Content_Types].xml" (mixed case).

The exception is caused by a call to entry.getName().equals(ContentTypeManager.CONTENT_TYPES_PART_NAME), which is case-sensitive and therefore never matches the lower-case entry name.  According to the Open Packaging Conventions, "Part name equivalence is determined by comparing part names as case-insensitive ASCII strings."
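
To illustrate the difference between the failing check and a case-insensitive one, here is a minimal sketch. It is not the actual POI code: the class and method names are hypothetical, and the constant is only assumed to hold the same value as ContentTypeManager.CONTENT_TYPES_PART_NAME.

```java
import java.util.Arrays;
import java.util.List;

class ContentTypesCheck {
    // Assumption: same value as ContentTypeManager.CONTENT_TYPES_PART_NAME.
    static final String CONTENT_TYPES_PART_NAME = "[Content_Types].xml";

    // The kind of check described in the report: exact, case-sensitive match.
    static boolean matchesExactly(String entryName) {
        return entryName.equals(CONTENT_TYPES_PART_NAME);
    }

    // Hypothetical alternative: ignore case when comparing the entry name.
    static boolean matchesIgnoringCase(String entryName) {
        return entryName.equalsIgnoreCase(CONTENT_TYPES_PART_NAME);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("[Content_Types].xml", "[content_types].xml");
        for (String name : names) {
            System.out.println(name
                    + " exact=" + matchesExactly(name)
                    + " ignoreCase=" + matchesIgnoringCase(name));
        }
    }
}
```

With the lower-case name from the attached file, only the second method reports a match, which is the behaviour the OPC wording quoted above asks for.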

The bug also occurs on Windows XP.

Comment 1 Yegor Kozlov 2010-07-18 12:14:22 UTC
Fixed in r965258

There were two problems with the attached file:

1. [content_types].xml vs [Content_Types].xml 

You are correct, the comparison of part names should be case-insensitive. 

2. The file appears to use backslashes as path separators. 

The OPC spec tolerates backslashes in part names, see Annex A.3. I fixed POI to do the same.
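
For readers who want to see the two points together, the following is a rough sketch of how a zip entry name could be normalized before comparison. It is an illustration only and not the change committed in r965258: the class ZipEntryNames and its methods are hypothetical helpers, and the sample entry names are made up.

```java
import java.util.Locale;

class ZipEntryNames {
    // Replace backslash separators with forward slashes.
    static String toForwardSlashes(String entryName) {
        return entryName.replace('\\', '/');
    }

    // Build a key for equivalence checks: forward slashes, lower case.
    // Locale.ROOT keeps the lower-casing independent of the default locale.
    static String equivalenceKey(String entryName) {
        return toForwardSlashes(entryName).toLowerCase(Locale.ROOT);
    }

    // Two entry names refer to the same part if their keys are equal.
    static boolean samePart(String a, String b) {
        return equivalenceKey(a).equals(equivalenceKey(b));
    }

    public static void main(String[] args) {
        // Sample names are invented for illustration.
        System.out.println(toForwardSlashes("xl\\worksheets\\sheet1.xml"));
        System.out.println(samePart("[content_types].xml", "[Content_Types].xml"));
        System.out.println(samePart("xl\\workbook.xml", "xl/Workbook.xml"));
    }
}
```

Normalizing once into a key, rather than special-casing each comparison, keeps the separator handling and the case handling in a single place.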