Bug 55748 - Exception with message:Duplicate name " DocumentSummaryInformation" when reading xls File
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
 
Reported: 2013-11-06 09:38 UTC by kenny.chang
Modified: 2013-12-26 09:09 UTC
Description kenny.chang 2013-11-06 09:38:55 UTC
Exception Type: IOException.
Exception Message: Duplicate name "DocumentSummaryInformation"
Detail:
    When reading an Excel file that was transferred from an overseas teammate, I hit this exception and could not get any further, even when debugging with the POI source code (version 3.9, HSSF).
    And there is a very surprising thing: if I open the xls file and do something like pressing Ctrl + S, or save it as another xls file, it can then be read.
    At first I thought it might be a file encoding problem, but when I checked the encoding with cpdetector_1.0.0.jar, both files reported the same charset, "UTF-16BE".
    Then I debugged with the source code and found something interesting:
    1) On reaching the method convertToProperties(ListManagedBlock [] blocks) in the class "PropertyFactory.java", the parameter "ListManagedBlock [] blocks", with 2 blocks in the list, was different between the 2 cases: reading the original file and reading the "save as" copy.
    2) The method addChild(final Property property) threw the exception:
      if (_children_names.contains(name))
        {
            throw new IOException("Duplicate name \"" + name + "\"");
        }
    3) The contents of the 2 blocks in "ListManagedBlock [] blocks":
     When reading OK:
     the first block read: Root Entry, Workbook, SummaryInformation, DocumentSummaryInformation
     the second block read: CompObj, IrmToolInfoStream, IrmToolSLevelInfo

     When reading with the exception:
     the first block read: Root Entry, Workbook, SummaryInformation, DocumentSummaryInformation
     the second block read: DocumentSummaryInformation

     It read "DocumentSummaryInformation" a second time; the check found the name already in the "Set" and threw the exception.
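
For illustration, the behaviour described in 2) and 3) can be reproduced in a few lines of plain Java. This is a minimal sketch, not POI's code: the class name DuplicateNameCheckSketch, the nested Directory class and the load helper are all made up for the example, and "Root Entry" is treated as the parent directory rather than as a child.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the duplicate-name check; not POI's DirectoryProperty.
class DuplicateNameCheckSketch {

    static class Directory {
        private final Set<String> childNames = new HashSet<String>();

        // Rejects a child whose name is already present, like the check quoted in 2).
        void addChild(String name) throws IOException {
            if (!childNames.add(name)) { // add() returns false if the name was already there
                throw new IOException("Duplicate name \"" + name + "\"");
            }
        }
    }

    // Adds every name to a fresh directory; returns "OK" or the exception message.
    static String load(List<String> names) {
        Directory root = new Directory();
        try {
            for (String name : names) {
                root.addChild(name);
            }
            return "OK";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Entry names as listed in 3) for the file that reads OK.
        List<String> resaved = Arrays.asList("Workbook", "SummaryInformation",
                "DocumentSummaryInformation", "CompObj",
                "IrmToolInfoStream", "IrmToolSLevelInfo");
        // Entry names as listed in 3) for the file that fails.
        List<String> original = Arrays.asList("Workbook", "SummaryInformation",
                "DocumentSummaryInformation", "DocumentSummaryInformation");

        System.out.println(load(resaved));  // prints: OK
        System.out.println(load(original)); // prints: Duplicate name "DocumentSummaryInformation"
    }
}
```

The sketch only shows why the second "DocumentSummaryInformation" trips the check; it says nothing about how the duplicate entry got into the file.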

The following is the StackTrace:
java.io.IOException: Duplicate name "DocumentSummaryInformation"
	at org.apache.poi.poifs.property.DirectoryProperty.addChild(DirectoryProperty.java:266)
	at org.apache.poi.poifs.property.PropertyTableBase.populatePropertyTree(PropertyTableBase.java:115)
	at org.apache.poi.poifs.property.PropertyTableBase.<init>(PropertyTableBase.java:63)
	at org.apache.poi.poifs.property.PropertyTable.<init>(PropertyTable.java:63)
	at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:159)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:322)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:303)
	at org.apache.poi.hssf.usermodel.examples.HSSFReadWrite.readFile(HSSFReadWrite.java:51)
	at org.apache.poi.hssf.usermodel.examples.HSSFReadWrite.main(HSSFReadWrite.java:163)


Sorry to tell you I can't upload the file that can't be read. But I think you engineers may know this kind of exception and can give me a quick response.

Thank you very much. 

    Kenny.
Comment 1 Nick Burch 2013-11-06 13:07:01 UTC
Try with NPOIFSFileSystem instead of POIFSFileSystem, just in case it's something odd

Otherwise, without the file, there's not much that we can do. Most likely though, it's just an invalid file - entries within the POIFS (OLE2) structure must have unique names. From the exception, it looks like you've somehow ended up with two entries with the same name, which isn't allowed.
Comment 2 kenny.chang 2013-11-08 02:04:19 UTC
Same exception after substituting NPOIFSFileSystem.
When tracing through the POI source code, NPOIFSFileSystem and POIFSFileSystem both run into the same method when parsing the head_block (entry of excel).

Any other suggestion?

What is more, compare the following two ways of reading the Excel file:
1) read directly: no change to the Excel file, reports the exception: Duplicate name "DocumentSummaryInformation";
2) read the Excel file after saving it as another one: all OK.

Could you share some experience about why this strange thing happens?
Maybe your explanation will give me an idea for a solution.


Thank you.
Comment 3 Nick Burch 2013-11-11 23:11:50 UTC
Without the file, I can't be sure, but my hunch is that your file didn't come from Excel but came from something that doesn't follow the spec properly

As far as I can tell, your file contains two OLE2 streams with exactly the same entry name. I don't know which one Excel is picking (if any) when you do the open and re-save, but it is clearly throwing one away to fix up your file. 

You should probably find out where the file came from, and see if you can talk to whoever wrote that software to not duplicate the properties. In the meantime, I'm very reluctant to remove the duplicate check as it'll catch other problems, especially if I don't know where your broken file came from, and without having that file (or one like it) to use in unit testing.
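
To make the trade-off concrete, here is a rough sketch of what a lenient reader might do instead of throwing: keep the first entry with a given name and set later ones aside. It is purely illustrative. The class LenientDedupSketch and its keepFirst method are hypothetical, nothing here is POI or Excel behaviour, and "keep the first" is an arbitrary choice, since it isn't known which entry Excel keeps on re-save.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical lenient alternative to the strict duplicate check; not part of POI.
class LenientDedupSketch {

    static class Result {
        final List<String> kept = new ArrayList<String>();
        final List<String> dropped = new ArrayList<String>();
    }

    // Keeps the first occurrence of each name and records every later duplicate.
    static Result keepFirst(List<String> names) {
        Set<String> seen = new HashSet<String>();
        Result result = new Result();
        for (String name : names) {
            if (seen.add(name)) {
                result.kept.add(name);
            } else {
                result.dropped.add(name); // silently discarding this could hide real corruption
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = keepFirst(Arrays.asList("Workbook", "SummaryInformation",
                "DocumentSummaryInformation", "DocumentSummaryInformation"));
        System.out.println(r.kept);    // prints: [Workbook, SummaryInformation, DocumentSummaryInformation]
        System.out.println(r.dropped); // prints: [DocumentSummaryInformation]
    }
}
```

Whichever copy is dropped may be the one holding the valid data, which is part of why the strict check stays in place until a sample file is available.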
Comment 4 Dominik Stadler 2013-12-26 09:09:40 UTC
For now I think this will not be fixed, as the check is there on purpose, unless we know where the invalid file comes from. Please reopen if you can provide more information.