Bug 49609 - Part name comparison in ZipPackage should be case-insensitive
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.7-dev
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2010-07-16 18:53 UTC by Ed Beaty
Modified: 2010-07-18 12:14 UTC

XLSX file that can't be opened with POI 3.7-dev (102.95 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2010-07-16 18:53 UTC, Ed Beaty

Description Ed Beaty 2010-07-16 18:53:24 UTC
Created attachment 25777
XLSX file that can't be opened with POI 3.7-dev

Attempting to open the attached file fails with the following error:

Exception in thread "main" org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
	at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:147)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:588)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:222)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:63)

The file opens normally in Microsoft Excel 2008 for Mac. If the file is re-saved from Excel in the older Excel 97 format, POI opens it normally.

The file comes from a non-standard source (a scientific instrument that saves its output as an .xlsx file). It contains a part named "[content_types].xml" (all lower case), but the ZipPackage.getPartsImpl method expects the name "[Content_Types].xml" (mixed case).

The exception is caused by the call entry.getName().equals(ContentTypeManager.CONTENT_TYPES_PART_NAME), which is a case-sensitive comparison and therefore never matches the lower-case entry. According to the Open Packaging Conventions, "Part name equivalence is determined by comparing part names as case-insensitive ASCII strings."
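
To illustrate the comparison the spec asks for, here is a minimal sketch of an ASCII case-insensitive match. The class and method names (PartNameMatch, asciiEqualsIgnoreCase, isContentTypesPart) are made up for this example and are not POI API; the constant only mirrors the value that ContentTypeManager.CONTENT_TYPES_PART_NAME is described as holding above.

```java
// Hypothetical helper, not part of POI: compares part names as
// case-insensitive ASCII strings, as the OPC quote above describes.
final class PartNameMatch {
    // Assumed to mirror ContentTypeManager.CONTENT_TYPES_PART_NAME.
    static final String CONTENT_TYPES_PART_NAME = "[Content_Types].xml";

    // Lower-cases only A-Z; every other character is left untouched.
    static char toLowerAscii(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
    }

    // True when both names are equal after ASCII-only case folding.
    static boolean asciiEqualsIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (toLowerAscii(a.charAt(i)) != toLowerAscii(b.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Replacement for entry.getName().equals(CONTENT_TYPES_PART_NAME).
    static boolean isContentTypesPart(String entryName) {
        return asciiEqualsIgnoreCase(entryName, CONTENT_TYPES_PART_NAME);
    }
}
```

With this check, both "[content_types].xml" and "[Content_Types].xml" would be recognized as the content types part. String.equalsIgnoreCase would probably be good enough in practice, but it also folds non-ASCII characters, which is slightly broader than what the spec text says.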

The bug also occurs on Windows XP.
Comment 1 Yegor Kozlov 2010-07-18 12:14:22 UTC
Fixed in r965258

There were two problems with the attached file:

1. [content_types].xml vs [Content_Types].xml 

You are correct, the comparison of part names should be case-insensitive. 

2. The file appears to use backslashes as path separators. 

The OPC spec tolerates backslashes in part names (see Annex A.3). I fixed POI to do the same.
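
For readers who hit the same problem, a rough sketch of what tolerating backslashes could look like when mapping a zip entry name to a part name. This is not the change made in r965258; ZipEntryNames and toPartName are hypothetical names, and the example path is invented for illustration.

```java
// Hypothetical helper, not the actual POI fix: normalizes a zip entry
// name into an OPC-style part name.
final class ZipEntryNames {
    static String toPartName(String zipEntryName) {
        // Treat backslashes as path separators, as the file in this report does.
        String normalized = zipEntryName.replace('\\', '/');
        // OPC part names begin with a forward slash; zip entry names do not.
        return normalized.startsWith("/") ? normalized : "/" + normalized;
    }
}
```

Under these assumptions, an entry named xl\worksheets\sheet1.xml would map to the same part name as xl/worksheets/sheet1.xml.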