Bug 14734

Summary: DocumentSummaryInformation.getCategory() BUG
Product: POI
Reporter: Miroslav Obradovic <micky>
Component: HPSF
Assignee: POI Developers List <dev>
Status: CLOSED WORKSFORME    
Severity: normal    
Priority: P3    
Version: 1.5.1   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: here's the java code (word document plain text content extractor) i have developed when i noticed the bug...
this is the POI library i have used for my project when noticed the bug...

Description Miroslav Obradovic 2002-11-21 13:21:41 UTC
Hi, I dug around HPSF and found the following bug. The
DocumentSummaryInformation of Word 8.0/97 docs has 2 sections, but
getCategory() (Category is located in the section with index 0) implicitly
calls getSingleSection(), which throws an exception if sectionCount != 1.
Word 6.0/95 has a single section and this works fine. Here's my solution to
the problem until you find a better way... Then my class can simply be removed
and everything will work OK... After putting the code below through this form
it may need some beautifying (indentation)...

Regards, 
Mickey

<code>

    import java.util.List;

    import org.apache.poi.hpsf.DocumentSummaryInformation;
    import org.apache.poi.hpsf.PropertySet;

    /**
     * This class is a manual workaround for the HPSF
     * <NOBR>DocumentSummaryInformation.getCategory()</NOBR> bug. This method
     * calls <NOBR>getProperty()</NOBR>, which further calls
     * <NOBR>getSingleSection().getProperty()</NOBR>. Now,
     * <NOBR>getSingleSection()</NOBR> throws a
     * <I>NoSingleSectionException</I> for <NOBR>Word 8.0/97-2000</NOBR>
     * documents because these have two sections and only one is expected.
     * Here's the stack trace: <BR>
     * <PRE>
     * org.apache.poi.hpsf.NoSingleSectionException: Property set contains 2 sections.
     *     at org.apache.poi.hpsf.PropertySet.getSingleSection(PropertySet.java)
     *     at org.apache.poi.hpsf.SpecialPropertySet.getSingleSection(SpecialPropertySet.java)
     *     at org.apache.poi.hpsf.PropertySet.getProperty(PropertySet.java)
     *     at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java)
     * </PRE>
     *
     * @author  Miroslav Obradovic (micky@eunet.yu)
     */
    public class MyDocumentSummaryInformation
            extends DocumentSummaryInformation {
        
        /**
         * Creates a DocumentSummaryInformation from a given PropertySet.
         */
        public MyDocumentSummaryInformation(final PropertySet ps) 
                throws org.apache.poi.hpsf.UnexpectedPropertySetTypeException {
            
            super(ps);
        }
        
        /**
         * Returns the stream's category (or <code>null</code>).
         */
        public String getCategory() {
            
            int pid =
                    org.apache.poi.hpsf.wellknown.PropertyIDMap.PID_CATEGORY;  // equals 2
            String category = null;
            
            List sections = getSections();
            int sectionCount = (int) getSectionCount();
            org.apache.poi.hpsf.Section section = null;
            org.apache.poi.hpsf.Property[] properties = null;
            
            // Iterate through sections, get their properties and look for
            // Category.
            // Category should be found in the section with index 0.
            for (int i = 0; i < sectionCount; i++) {
                
                try {
                    
                    // Get the current section.
                    section = (org.apache.poi.hpsf.Section) sections.get(i);
                    
                    // Get section properties and look for Category.
                    properties = section.getProperties();
                    for (int j = 0; j < properties.length; j++) {
                        
                        if (properties[j].getID() == pid) {
                            
                            category = (String) properties[j].getValue();
                            break;
                        }
                    }
                    
                    // If Category found, break the loop.
                    if (category != null) {
                        
                        break;
                    }
                    
                } catch (Exception e) {
                    
                    category = null;
                }
            }
            
            return category;
        }
    }
</code>
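
For readers without the affected POI version at hand, the failure mode described above can be modeled without the library. The sketch below is only an illustration under stated assumptions: SectionLookupSketch, getFromSingleSection(), findInAnySection() and twoSectionSample() are hypothetical names, a section is reduced to a plain map from property ID to value, and the contents of the second section are made up. It is not POI's implementation; it only contrasts a lookup that insists on exactly one section with one that scans every section in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a property set; none of these names are POI classes.
class SectionLookupSketch {

    // The report above notes that PID_CATEGORY equals 2.
    static final int PID_CATEGORY = 2;

    // Strict lookup: rejects any property set that does not have exactly
    // one section, which is the behavior the report describes.
    static Object getFromSingleSection(List<Map<Integer, Object>> sections, int pid) {
        if (sections.size() != 1) {
            throw new IllegalStateException(
                    "Property set contains " + sections.size() + " sections.");
        }
        return sections.get(0).get(pid);
    }

    // Tolerant lookup: scans the sections in order and returns the first
    // value stored under the given ID, or null if no section has one.
    static Object findInAnySection(List<Map<Integer, Object>> sections, int pid) {
        for (Map<Integer, Object> section : sections) {
            Object value = section.get(pid);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Sample data: Category sits in section 0; the second section and its
    // single entry (ID 100) are invented for the illustration.
    static List<Map<Integer, Object>> twoSectionSample() {
        Map<Integer, Object> first = new HashMap<>();
        first.put(PID_CATEGORY, "Reports");
        Map<Integer, Object> second = new HashMap<>();
        second.put(100, "made-up custom value");
        List<Map<Integer, Object>> sections = new ArrayList<>();
        sections.add(first);
        sections.add(second);
        return sections;
    }

    public static void main(String[] args) {
        List<Map<Integer, Object>> sections = twoSectionSample();
        System.out.println("scan: " + findInAnySection(sections, PID_CATEGORY));
        try {
            getFromSingleSection(sections, PID_CATEGORY);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In this model the scanning lookup does not care how many sections exist, so it keeps working if later file versions add more of them; the strict lookup only succeeds when the list holds exactly one section.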
Comment 1 Rainer Klute 2002-11-21 15:43:52 UTC
Miroslav, can you please attach your Word file to this bug in Bugzilla? Or
better, can you create a minimal Word file which behaves as you described? I
need a test case to verify the bug. Thanks!
Comment 2 Rainer Klute 2003-06-26 09:04:41 UTC
The author of this bug did not provide a test file nor did he respond to any e-mail.
Comment 3 Miroslav Obradovic 2003-06-26 11:54:00 UTC
hi there,

i'm sorry for the delay. i'm not used to using these forums and stuff... i've 
just found a work-around for the problem i once had and thought it would be 
useful if i post it, in case someone else needs it.

it was long ago, but i'll try to find the sample word file.

best regards,
miroslav
Comment 4 Miroslav Obradovic 2003-06-26 13:48:58 UTC
Created attachment 6990
here's the java code (word document plain text content extractor) i have developed when i noticed the bug...
Comment 5 Miroslav Obradovic 2003-06-26 13:52:35 UTC
Created attachment 6991
this is the POI library i have used for my project when noticed the bug...
Comment 6 Miroslav Obradovic 2003-06-26 14:07:31 UTC
well, here we are. i have added two attachments and here are a few words about 
these:

the second attachment (POI library) is the poi-1.5.1.jar file i used when i 
noticed the bug (or what i think was a bug). i don't remember the date 
well, but i think it was the latest stable version when i wrote the code.

the first attachment is a part of the project i have worked on when i noticed 
this bug. it's a content (plain text) extractor for word file format. i don't 
know if you have something similar added to POI, but if you find this code 
useful (there are a lot of comments in there!), you can freely use this code 
(though, it would be nice of you if you'd mention me as a developer 
somewhere, :-) )

the problem is that there are some new Summary Info "pages" added with new 
versions of ms word and i think you have assumed there (in poi) that there is 
only a single one. i guess you could use a solution similar to the one i have 
attached (in MyDocumentSummaryInfo.java), since Micro$oft can add more and 
more of these new "pages" with new releases of office.

i hope this was useful :-)

best regards,
miroslav



Comment 7 Miroslav Obradovic 2003-06-26 14:13:31 UTC
sorry, i forgot to mention.

the sample word file you requested (sample.doc) is included in the first of 
the two attachments.

mickey
Comment 8 Miroslav Obradovic 2003-06-26 14:20:13 UTC
oh, i'm the most boring man today...
i tried to download attachments, but i guess you must know which type of 
binary files is in it to properly download and save the file.

the first attachment should be saved as .zip (created with WinZip 8.1)
the second attachment should be saved as .jar

i hope this is the last one :-)
bye,
m
Comment 9 Rainer Klute 2003-09-20 15:35:22 UTC
The current CVS HEAD can process your sample application without any flaws. I
suggest an upgrade.