|Summary:||Non-MS Office Docs with Valid Header Signature|
|Product:||POI||Reporter:||Jacob Zwiers <apache_bugzilla>|
|Component:||HPSF||Assignee:||POI Developers List <dev>|
Description Jacob Zwiers 2003-07-21 17:02:59 UTC
The Corel Presentation software (at least versions 8 and 9) fakes the header signature that POIFS recognizes (0xE11AB1A1E011CFD0L), so the properties are then read from the file, which causes a problem. The net result is that SummaryInformation.getWordCount() throws a ClassCastException in this situation. I will attach (if possible on a later screen) a small test class and a .shw file for demonstration purposes.
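For illustration, the signature test mentioned above amounts to comparing the first eight bytes of the file, read as a little-endian 64-bit value, against the constant quoted in the report. The following is a minimal sketch only; the class and method names are hypothetical and this is not the POIFS code itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of POI: checks the 8-byte header signature.
class Ole2SignatureCheck {
    // Signature value as quoted in the report (little-endian 64-bit integer).
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    /** Returns true if the first 8 bytes of the given data match the signature. */
    static boolean hasOle2Signature(byte[] fileStart) {
        if (fileStart == null || fileStart.length < 8) {
            return false; // too short to carry a signature
        }
        long value = ByteBuffer.wrap(fileStart, 0, 8)
                               .order(ByteOrder.LITTLE_ENDIAN)
                               .getLong();
        return value == OLE2_SIGNATURE;
    }

    public static void main(String[] args) {
        // On disk the signature appears as these bytes, least significant first.
        byte[] magic = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        System.out.println(hasOle2Signature(magic));
    }
}
```

A matching signature only says the container looks like an OLE 2 compound document; it says nothing about whether the property streams inside are well-formed.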
Comment 1 Jacob Zwiers 2003-07-21 17:04:14 UTC
Created attachment 7425: An offending Corel Presentation File
Comment 2 Jacob Zwiers 2003-07-21 17:08:24 UTC
Created attachment 7426: Source Code to demonstrate ClassCastException on SummaryInformation.getWordCount() for .shw files
Comment 3 Andy Oliver 2003-07-21 17:25:04 UTC
Just because it doesn't have SummaryInformation, or does SummaryInformation wrong, doesn't mean POIFS is wrong to recognize it as an OLE 2 Compound Document format file. Is this just a problem with HPSF not recognizing that the SummaryInformation stream isn't what it thinks it is?
Comment 4 Jacob Zwiers 2003-07-21 18:14:34 UTC
Here's what I've figured out. I'll let you decide which it is. If I spin through all the properties that I get back, the following is returned for the word count (ID 15 according to the PropertyIDMap.PID_WORDCOUNT constant; element 13 in the array of properties):

DocumentPropertyReader - props.getID() = 15
DocumentPropertyReader - props.getType() = 0
DocumentPropertyReader - props.getValue() = [B@9eca9c26
DocumentPropertyReader - props.getValue().getClass() = class [B

The type that's read here (in the default branch of the switch in the org.apache.poi.hpsf.Property constructor, based on the return value of org.apache.poi.hpsf.littleendian.DWord.intValue()) means that the value was not recognized as a Variant.VT_I4. Instead, it's zero, i.e. VT_EMPTY. The value gets put into the properties as a byte array, which causes the ClassCastException when calling getWordCount().

I'm not sure (not knowing the nuts and bolts) whether the Corel doc is actually behaving properly as an OLE2 doc. If it isn't, I guess the problem is that POIFS thinks it is. If it is, then the problem is either that VT_EMPTY doesn't get treated as a null or that getWordCount() doesn't properly take this into account.
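The mechanism described in this comment can be shown without POI at all: a value stored as a byte array is later cast to Integer by a getter that assumes the property is always a VT_I4. The sketch below is a stand-in under that assumption; the class, the map-based property store, and the method names are hypothetical and do not reproduce the HPSF source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the failure mode; not the HPSF implementation.
class VtEmptyCastDemo {
    // Property ID of the word count, as given in the report.
    static final int PID_WORDCOUNT = 15;

    /** A getter that blindly assumes the stored value is an Integer. */
    static int naiveWordCount(Map<Integer, Object> props) {
        return ((Integer) props.get(PID_WORDCOUNT)).intValue();
    }

    /** Returns true if the naive getter fails with a ClassCastException. */
    static boolean failsWithClassCast(Map<Integer, Object> props) {
        try {
            naiveWordCount(props);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Object> props = new HashMap<>();
        // Unrecognized type: the raw bytes are stored instead of an Integer.
        props.put(PID_WORDCOUNT, new byte[0]);
        System.out.println("ClassCastException thrown: " + failsWithClassCast(props));
    }
}
```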
Comment 5 Andy Oliver 2003-07-24 17:24:02 UTC
POIFS is right, HPSF is wrong.
Comment 6 Rainer Klute 2003-07-25 09:30:24 UTC
HPSF does not yet support VT_EMPTY. A proper implementation would be to return null. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/automat/htm/chap6_7zdz.asp for an explanation of the variant types. I'll prepare a patch.
Comment 7 Rainer Klute 2003-07-26 21:52:57 UTC
HPSF is now able to read properties which are present in the property set stream but which don't have a value. The type of such properties is VT_EMPTY. PropertySet's getXXX methods return either null or 0, whichever is appropriate. Details about return types can be found in the API documentation.
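As a rough illustration of the behaviour described here, a reader can map VT_EMPTY to null when decoding and let numeric getters fall back to 0. This is only a sketch of the idea, not the actual patch: the class and method names are hypothetical, and only the variant type codes VT_EMPTY (0) and VT_I4 (3) are taken from the standard variant type definitions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of VT_EMPTY-aware decoding; not the HPSF patch itself.
class VtEmptyAwareReader {
    static final int VT_EMPTY = 0; // property present, but without a value
    static final int VT_I4 = 3;    // 4-byte signed integer

    /** Decodes a raw property value according to its variant type. */
    static Object decode(int type, byte[] raw) {
        switch (type) {
            case VT_EMPTY:
                return null; // no value: represent it as null
            case VT_I4:
                return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
            default:
                return raw;  // unknown type: keep the raw bytes
        }
    }

    /** Numeric getter that returns 0 when the value is missing or not an Integer. */
    static int intOrZero(Object value) {
        return (value instanceof Integer) ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        System.out.println(intOrZero(decode(VT_EMPTY, new byte[0])));
        System.out.println(intOrZero(decode(VT_I4, new byte[] {42, 0, 0, 0})));
    }
}
```

With this shape, a word count stored as VT_EMPTY yields 0 from the numeric getter instead of an exception, while object-valued getters can simply pass the null through.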