Bug 47950

Summary: No case insensitivity handling for OLE2 entry names
Product: POI
Reporter: Trejkaz (pen name) <trejkaz>
Component: POIFS
Assignee: POI Developers List <dev>
Status: NEW
Severity: normal    
Priority: P2    
Version: 3.5-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows NT   

Description Trejkaz (pen name) 2009-10-06 15:48:49 UTC
I wrote two test cases to check how POIFS handles case when looking up OLE2 entry names.

    @Test
    public void testPoiCaseInsensitivityInMemory() throws Exception
    {
        POIFSFileSystem fs = new POIFSFileSystem();
        DirectoryEntry dir = fs.getRoot().createDirectory("A");
        dir.createDocument("B", new ByteArrayInputStream(new byte[] { 0, 1, 2, 3, 4, 5 }));

        DirectoryEntry dir2 = (DirectoryEntry) fs.getRoot().getEntry("a");
        DocumentEntry doc2 = (DocumentEntry) dir2.getEntry("b");
        assertArrayEquals("Wrong data read back", new byte[] { 0, 1, 2, 3, 4, 5 },
                          IOUtils.toByteArray(new DocumentInputStream(doc2)));
    }
    
    @Test
    public void testPoiCaseInsensitivityAfterReadingFromStorage() throws Exception
    {
        POIFSFileSystem fs = new POIFSFileSystem();
        DirectoryEntry dir = fs.getRoot().createDirectory("A");
        dir.createDocument("B", new ByteArrayInputStream(new byte[] { 0, 1, 2, 3, 4, 5 }));

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        fs.writeFilesystem(baos);

        POIFSFileSystem fs2 = new POIFSFileSystem(new ByteArrayInputStream(baos.toByteArray()));
        DirectoryEntry dir2 = (DirectoryEntry) fs2.getRoot().getEntry("a");
        DocumentEntry doc2 = (DocumentEntry) dir2.getEntry("b");
        assertArrayEquals("Wrong data read back", new byte[] { 0, 1, 2, 3, 4, 5 },
                          IOUtils.toByteArray(new DocumentInputStream(doc2)));
    }

Both tests fail at the lookup of "a": POIFS reports that the entry doesn't exist. According to the available documentation, though, entry names are supposed to be compared case-insensitively.

Specifically, [MS-CFB] has the following to say about how entries in an OLE2 directory should be compared:

(2.6.1 pg 23)

When locating an object in the compound file except for the root storage, the directory entry name is compared using a special case-insensitive upper-case mapping, described in Red-Black Tree.

(2.6.4 "Red-Black Tree" pg 26)

  * For each UTF-16 code point, convert to upper-case with the Unicode Default Case Conversion
    Algorithm, simple case conversion variant (simple case foldings), with the following notes.<2>

  * Unicode surrogate characters are never upper-cased, since they are represented by two UTF-16
    code points, while the sorting relationship upper-cases a single UTF-16 code point at a time.

  * Lowercase characters defined in a newer, later version of the Unicode standard can be
    upper-cased by an implementation that conforms to that later Unicode standard.
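
To make the rules quoted above concrete, here is a rough sketch of what the name comparison might look like in Java. It is not POI code and not a proposed patch: the class and method names (CfbNameFolding, upperUnit, foldKey, sameName) are made up for illustration. It also assumes that Character.toUpperCase(char) is a close enough stand-in for the simple case mapping. That method follows whatever Unicode version the running JDK implements, so it will not reproduce the Windows-specific exception tables mentioned in note <2>.

```java
// Hypothetical helper, not part of POI: folds entry names one UTF-16
// code unit at a time, roughly as [MS-CFB] 2.6.4 describes.
final class CfbNameFolding {
    private CfbNameFolding() {}

    // Upper-case a single UTF-16 code unit. Surrogates are returned
    // unchanged, since the spec says they are never upper-cased.
    static char upperUnit(char c) {
        if (Character.isSurrogate(c)) {
            return c;
        }
        // Assumption: the JDK's 1:1 upper-case mapping approximates the
        // "simple case conversion" table; the Windows exceptions are ignored.
        return Character.toUpperCase(c);
    }

    // Build a lookup key by folding every code unit independently.
    static String foldKey(String name) {
        char[] units = name.toCharArray();
        for (int i = 0; i < units.length; i++) {
            units[i] = upperUnit(units[i]);
        }
        return new String(units);
    }

    // Two entry names refer to the same entry if their folded keys match.
    static boolean sameName(String a, String b) {
        return foldKey(a).equals(foldKey(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("A", "a"));
        System.out.println(sameName("A", "B"));
    }
}
```

With something like this, a directory could key its entries by foldKey(name) instead of the raw name, so that looking up "a" would find an entry created as "A". Using String.toUpperCase() directly would not quite match the quoted rules, since it works on whole code points (so it does upper-case supplementary characters) and can change the length of the string.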

Note <2> goes into further detail on which version of Unicode is used to perform the folding:

(pg 39)

For Windows XP and Windows Server 2003: The compound file implementation conforms to the Unicode 3.0.1 Default Case Conversion Algorithm, simple case folding (http://www.unicode.org/Public/3.1-Update1/CaseFolding-4.txt) with the following exceptions.
(table omitted for now)
For Windows Vista and Windows Server 2008: The compound files implementation conforms to the Unicode 5.0 Default Case Conversion Algorithm, simple case folding (http://www.unicode.org/Public/5.0.0/ucd/CaseFolding.txt) with the following exceptions.
(table omitted for now)


References:

[MS-CFB]: Compound File Binary File Format, Revision 0.01 (Wednesday, June 18, 2008)