Hi, I dug around in HPSF and found the following bug. The DocumentSummaryInformation of a Word 8.0/97 document has two sections, but getCategory() (Category is located in the section with index 0) implicitly calls getSingleSection(), which throws an exception if sectionCount != 1. Word 6.0/95 documents have a single section, so there it works fine. Here's my workaround until you find a better way; then my class can simply be removed and everything will work OK. Regards, Mickey
<code>
import java.util.List;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hpsf.Property;
import org.apache.poi.hpsf.PropertySet;
import org.apache.poi.hpsf.Section;
import org.apache.poi.hpsf.UnexpectedPropertySetTypeException;
import org.apache.poi.hpsf.wellknown.PropertyIDMap;

/**
 * This class is a manual workaround for the HPSF
 * {@code DocumentSummaryInformation.getCategory()} bug. That method calls
 * {@code getProperty()}, which in turn calls
 * {@code getSingleSection().getProperty()}. {@code getSingleSection()} throws a
 * {@code NoSingleSectionException} for Word 8.0/97-2000 documents, because
 * these have two sections and only one is expected. Here's the stack trace:
 * <pre>
 * org.apache.poi.hpsf.NoSingleSectionException: Property set contains 2 sections.
 *     at org.apache.poi.hpsf.PropertySet.getSingleSection(PropertySet.java)
 *     at org.apache.poi.hpsf.SpecialPropertySet.getSingleSection(SpecialPropertySet.java)
 *     at org.apache.poi.hpsf.PropertySet.getProperty(PropertySet.java)
 *     at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java)
 * </pre>
 *
 * @author Miroslav Obradovic (micky@eunet.yu)
 */
public class MyDocumentSummaryInformation extends DocumentSummaryInformation {

    /**
     * Creates a DocumentSummaryInformation from a given PropertySet.
     */
    public MyDocumentSummaryInformation(final PropertySet ps)
            throws UnexpectedPropertySetTypeException {
        super(ps);
    }

    /**
     * Returns the stream's category (or {@code null}).
     */
    public String getCategory() {
        final long pid = PropertyIDMap.PID_CATEGORY; // equals 2
        final List sections = getSections();
        final int sectionCount = (int) getSectionCount();

        // Iterate through the sections, get their properties and look for
        // Category. Category should be found in the section with index 0.
        for (int i = 0; i < sectionCount; i++) {
            try {
                final Section section = (Section) sections.get(i);
                final Property[] properties = section.getProperties();
                for (int j = 0; j < properties.length; j++) {
                    if (properties[j].getID() == pid) {
                        final Object value = properties[j].getValue();
                        if (value instanceof String) {
                            return (String) value; // Category found.
                        }
                        break; // ID present but no string value: try the next section.
                    }
                }
            } catch (Exception e) {
                // Skip a section that cannot be read and keep searching.
            }
        }
        return null;
    }
}
</code>
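The failure mode can be illustrated without POI at all. The sketch below is a simplified, hypothetical model: the classes `Section` and `PropertySet` and the methods `getSingleSection()` and `findProperty()` are illustrative stand-ins, not POI's API. A lookup that insists on exactly one section rejects a two-section property set, while scanning every section in order still finds Category (property ID 2) in section 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SectionLookupDemo {

    // Hypothetical stand-in for an HPSF section: property ID -> value.
    static final class Section {
        final Map<Long, Object> properties = new LinkedHashMap<>();

        Section put(long id, Object value) {
            properties.put(id, value);
            return this;
        }
    }

    // Hypothetical stand-in for a property set holding one or more sections.
    static final class PropertySet {
        final List<Section> sections = new ArrayList<>();

        // Mirrors the strict behaviour described in the report:
        // anything other than exactly one section is rejected.
        Section getSingleSection() {
            if (sections.size() != 1) {
                throw new IllegalStateException(
                        "Property set contains " + sections.size() + " sections.");
            }
            return sections.get(0);
        }

        // Workaround: scan the sections in order and return the first
        // non-null value stored under the given property ID.
        Object findProperty(long id) {
            for (Section s : sections) {
                Object v = s.properties.get(id);
                if (v != null) {
                    return v;
                }
            }
            return null;
        }
    }

    static final long PID_CATEGORY = 2; // Category's property ID, as in the report

    public static void main(String[] args) {
        PropertySet word97 = new PropertySet();
        word97.sections.add(new Section().put(PID_CATEGORY, "Reports")); // section 0
        word97.sections.add(new Section().put(5L, "other value"));       // section 1

        try {
            word97.getSingleSection().properties.get(PID_CATEGORY);
        } catch (IllegalStateException e) {
            System.out.println("strict lookup: " + e.getMessage());
        }
        // prints "scanning lookup: Reports"
        System.out.println("scanning lookup: " + word97.findProperty(PID_CATEGORY));
    }
}
```

The scanning lookup keeps working even if later releases add further sections, which is the same reasoning behind iterating over all sections in `MyDocumentSummaryInformation`.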
Miroslav, can you please attach your Word file to this bug in Bugzilla? Or better, can you create a minimal Word file which behaves as you described? I need a test case to verify the bug. Thanks!
The author of this bug did not provide a test file nor did he respond to any e-mail.
Hi there, I'm sorry for the delay. I'm not used to using these forums and stuff... I've just found a workaround for a problem I once had and thought it would be useful to post it, in case someone else needs it. It was a long time ago, but I'll try to find the sample Word file. Best regards, Miroslav
Created attachment 6990: here's the Java code (a plain-text content extractor for Word documents) that I developed when I noticed the bug...
Created attachment 6991: this is the POI library I used in my project when I noticed the bug...
Well, here we are. I have added two attachments, and here are a few words about them. The second attachment (the POI library) is the poi-1.5.1.jar file I used when I noticed the bug (or what I thought was a bug). I don't remember the date exactly, but I think it was the latest stable version when I wrote the code. The first attachment is part of the project I was working on when I noticed this bug: a plain-text content extractor for the Word file format. I don't know if you have added something similar to POI, but if you find this code useful (there are a lot of comments in there!), you are free to use it (though it would be nice if you mentioned me as a developer somewhere :-) ). The problem is that newer versions of MS Word add extra Summary Info "pages" (sections), and I think POI assumes there is only a single one. I guess you could use a solution similar to the one I attached (in MyDocumentSummaryInfo.java), since Micro$oft may add more and more of these "pages" with new releases of Office. I hope this was useful :-) Best regards, Miroslav
Sorry, I forgot to mention: the sample Word file you requested (sample.doc) is included in the first of the two attachments. Mickey
Oh, I'm the most boring man today... I tried to download the attachments, but it seems you need to know what type of binary file each one is to download and save it properly. The first attachment should be saved as .zip (created with WinZip 8.1); the second attachment should be saved as .jar. I hope this is the last one :-) Bye, m
The current CVS HEAD can process your sample application without any flaws. I suggest an upgrade.