Summary: | Non-MS Office Docs with Valid Header Signature | ||
---|---|---|---|
Product: | POI | Reporter: | Jacob Zwiers <apache_bugzilla> |
Component: | HPSF | Assignee: | POI Developers List <dev> |
Status: | CLOSED FIXED | ||
Severity: | normal | ||
Priority: | P3 | ||
Version: | 3.0-dev | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | other | ||
Attachments: | An offending Corel Presentation File; Source Code to demonstrate ClassCastException on SummaryInformation.getWordCount() for .shw files | ||
Description
Jacob Zwiers
2003-07-21 17:02:59 UTC
Created attachment 7425
An offending Corel Presentation File
Created attachment 7426
Source Code to demonstrate ClassCastException on SummaryInformation.getWordCount() for .shw files
Just because a file has no SummaryInformation stream, or has a malformed one, doesn't mean POIFS is wrong to recognize it as an OLE 2 Compound Document. Is this just a case of HPSF not recognizing that the SummaryInformation stream isn't what it expects?

Here's what I've figured out. I'll let you decide which it is. If I iterate through all the properties I get back, the following is returned for the word count (ID 15 according to the PropertyIDMap.PID_WORDCOUNT constant, element 13 in the array of properties):

DocumentPropertyReader - props[13].getID() = 15
DocumentPropertyReader - props[13].getType() = 0
DocumentPropertyReader - props[13].getValue() = [B@9eca9c26
DocumentPropertyReader - props[13].getValue().getClass() = class [B

The type created here (in the default branch of the switch in the org.apache.poi.hpsf.Property constructor, based on the return value of org.apache.poi.hpsf.littleendian.DWord.intValue()) means the value was not recognized as a Variant.VT_I4. Instead, it is zero, i.e. VT_EMPTY. The value is therefore stored in the properties as a byte array, which causes the ClassCastException when calling getWordCount().

Not knowing the nuts and bolts, I'm not sure whether the Corel document actually behaves properly as an OLE2 document. If it doesn't, I guess the problem is that POIFS thinks it does. If it does, then the problem is either that VT_EMPTY isn't treated as null, or that getWordCount() doesn't properly take this case into account.

POIFS is right, HPSF is wrong. HPSF does not yet support VT_EMPTY; a proper implementation would return null. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/automat/htm/chap6_7zdz.asp for an explanation of the variant types. I'll prepare a patch.

HPSF is now able to read properties that are present in the property set stream but have no value. The type of such properties is VT_EMPTY. PropertySet's getXXX methods return either null or 0, whichever is appropriate. Details about return types can be found in the API documentation.
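To make the failure mode and the fix concrete, here is a minimal, self-contained Java sketch. It is not POI's actual code: the class VariantSketch and its decode and getWordCount methods are hypothetical stand-ins for the switch in the org.apache.poi.hpsf.Property constructor and for SummaryInformation.getWordCount(). The type codes VT_EMPTY = 0 and VT_I4 = 3 are the standard OLE VARIANT values. Unknown types fall through as a byte array, which reproduces the ClassCastException. Handling VT_EMPTY explicitly as null, and mapping null to 0 in the getter, mirrors the behaviour described for the fix.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical, simplified model of decoding a variant-typed property value.
 * Not POI's implementation; for illustration only.
 */
class VariantSketch {
    // Standard OLE VARIANT type codes.
    static final int VT_EMPTY = 0;
    static final int VT_I4 = 3;

    /** Decodes a raw property value; VT_EMPTY yields null (the fixed behaviour). */
    static Object decode(int type, byte[] raw) {
        switch (type) {
            case VT_EMPTY:
                // Property is present in the stream but carries no value.
                return null;
            case VT_I4:
                // 4-byte signed integer, little-endian as in OLE2 property set streams.
                return ByteBuffer.wrap(raw, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            default:
                // Unrecognized types are kept as raw bytes; this is the byte[]
                // ("class [B") seen in the bug report.
                return raw.clone();
        }
    }

    /** Hypothetical getter: maps an empty value to 0 instead of casting blindly. */
    static int getWordCount(Object value) {
        if (value == null) {
            return 0;
        }
        // Throws ClassCastException if value is a byte[] rather than an Integer.
        return (Integer) value;
    }

    public static void main(String[] args) {
        // VT_I4 value 1234 (0x000004D2) in little-endian byte order.
        byte[] i4 = {(byte) 0xD2, 0x04, 0x00, 0x00};
        System.out.println(getWordCount(decode(VT_I4, i4)));            // 1234
        System.out.println(getWordCount(decode(VT_EMPTY, new byte[0]))); // 0

        // Pre-fix behaviour: VT_EMPTY was unhandled, so the value was a byte[].
        Object before = new byte[0];
        try {
            getWordCount(before);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

Running main prints 1234, then 0, then ClassCastException for the simulated pre-fix case. Returning 0 rather than throwing matches the "null or 0, whichever is appropriate" rule for a primitive int getter.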