Bug 46287

Summary: ExcelExtractor always uses header & footer
Product: POI
Reporter: Axel Rose <axel.roeslein>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: 3.5-dev   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Axel Rose 2008-11-24 23:34:29 UTC
When extracting the pure text with org.apache.poi.hssf.extractor.ExcelExtractor
the header and footer of sheets are unconditionally extracted.

Most often those contain field codes such as "&page", which are not resolved
and end up in the output as literal text.

I'd prefer to not extract header & footer.

It would be ideal if there were a method setHeaderFooterExtraction() to
influence the behaviour.

Thanks,
Axel.
Comment 1 Axel Rose 2008-12-02 00:20:39 UTC
Patch suggestion:
--- src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java  (revision 722394)
+++ src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java  (working copy)
@@ -46,6 +46,7 @@
        private boolean formulasNotResults = false;
        private boolean includeCellComments = false;
        private boolean includeBlankCells = false;
+       private boolean includeHeaderFooter = true;
        
        public ExcelExtractor(HSSFWorkbook wb) {
                super(wb);
@@ -79,6 +80,12 @@
         this.includeCellComments = includeCellComments;
     }
        /**
+     * Should header and footer be included? Default is true
+     */
+    public void setIncludeHeaderFooter(boolean includeHeaderFooter) {
+        this.includeHeaderFooter = includeHeaderFooter;
+    }
+       /**
         * Should blank cells be output? Default is to only
         *  output cells that are present in the file and are
         *  non-blank.
@@ -111,7 +118,7 @@
                        }
                        
                        // Header text, if there is any
-                       if(sheet.getHeader() != null) {
+                       if(sheet.getHeader() != null && includeHeaderFooter) {
                                text.append(
                                                _extractHeaderFooter(sheet.getHeader())
                                );
@@ -201,7 +208,7 @@
                        }
                        
                        // Finally Feader text, if there is any
-                       if(sheet.getFooter() != null) {
+                       if(sheet.getFooter() != null && includeHeaderFooter) {
                                text.append(
                                                _extractHeaderFooter(sheet.getFooter())
                                );
Comment 2 Nick Burch 2009-05-17 11:49:42 UTC
Thanks. Applied with tweaks, and the same change was made to XSSFExcelExtractor too.
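
For readers who want to see the effect of the flag without checking out POI, here is a minimal stand-alone sketch of the behaviour the patch in Comment 1 aims for. SketchExtractor, its constructor, and the sample strings are hypothetical stand-ins and not POI classes. Only the flag name, its default of true, and the "!= null && includeHeaderFooter" guards are taken from the patch. Because the patch was applied "with tweaks", the method name that was finally committed may differ from the one shown here.

```java
// Hypothetical stand-in for the extractor; it does NOT use Apache POI.
// It only mirrors the guard logic from the patch in Comment 1.
class SketchExtractor {
    private final String header; // may be null, like sheet.getHeader()
    private final String body;   // stands in for the extracted cell text
    private final String footer; // may be null, like sheet.getFooter()

    // Same default as in the patch: headers and footers are included.
    private boolean includeHeaderFooter = true;

    SketchExtractor(String header, String body, String footer) {
        this.header = header;
        this.body = body;
        this.footer = footer;
    }

    /** Should header and footer be included? Default is true. */
    void setIncludeHeaderFooter(boolean includeHeaderFooter) {
        this.includeHeaderFooter = includeHeaderFooter;
    }

    String getText() {
        StringBuilder text = new StringBuilder();
        // Header text, if there is any and the flag allows it
        if (header != null && includeHeaderFooter) {
            text.append(header).append('\n');
        }
        text.append(body);
        // Footer text, if there is any and the flag allows it
        if (footer != null && includeHeaderFooter) {
            text.append(footer).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        SketchExtractor ex = new SketchExtractor("&page", "a\tb\n", "Confidential");
        System.out.print(ex.getText());   // header, cells, footer
        ex.setIncludeHeaderFooter(false);
        System.out.print(ex.getText());   // cells only
    }
}
```

Keeping the default at true means existing callers see no change in output. Only code that explicitly turns the flag off loses the header and footer text.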