Bug 55733 - NullPointerException when attempting to parse a Word document with no headers
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.9-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2013-11-01 17:11 UTC by david.patrone
Modified: 2013-11-01 19:44 UTC

Attachments
Two Word test files without headers - one throws NullPointerException, one doesn't (2.97 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-11-01 17:11 UTC, david.patrone

Description david.patrone 2013-11-01 17:11:02 UTC
Created attachment 30990
Two Word test files without headers - one throws NullPointerException, one doesn't

I was given a programmatically generated Word document that does not contain any headers. MS Word can open it, but calling XWPFWordExtractor.getText() on it throws a NullPointerException:

java.lang.NullPointerException
	at org.apache.poi.xwpf.extractor.XWPFWordExtractor.extractHeaders(XWPFWordExtractor.java:162)
	at org.apache.poi.xwpf.extractor.XWPFWordExtractor.getText(XWPFWordExtractor.java:87)
	at Test.testPrintDoc(Test.java:16)
	at Test.main(Test.java:26)

Looking at the code, hfPolicy appears to be null when XWPFWordExtractor.getText() passes it to XWPFWordExtractor.extractHeaders():

public String getText() {
    StringBuffer text = new StringBuffer();
    XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();
    
    // Start out with all headers
    extractHeaders(text, hfPolicy);

This suggests that the headerFooterPolicy of the XWPFDocument (returned by XWPFDocument.getHeaderFooterPolicy()) is never set for this file, and that the resulting null is what causes the error in extractHeaders().
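
To illustrate the kind of guard that would avoid this, here is a minimal sketch that uses stand-in classes rather than POI itself. PolicySketch, DocumentSketch and ExtractorSketch are hypothetical names invented for this example, and this is not necessarily how POI handles it; the sketch only shows extractHeaders() returning early when no policy is present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a header/footer policy; NOT the POI class.
final class PolicySketch {
    private final List<String> headers = new ArrayList<>();

    PolicySketch(List<String> headers) {
        this.headers.addAll(headers);
    }

    List<String> getHeaders() {
        return headers;
    }
}

// Hypothetical stand-in for a document whose policy may never have been set.
final class DocumentSketch {
    private final PolicySketch policy; // null models a document with no policy
    private final String body;

    DocumentSketch(PolicySketch policy, String body) {
        this.policy = policy;
        this.body = body;
    }

    PolicySketch getHeaderFooterPolicy() {
        return policy;
    }

    String getBody() {
        return body;
    }
}

// Hypothetical extractor that mirrors the shape of getText()/extractHeaders().
final class ExtractorSketch {
    static String getText(DocumentSketch document) {
        StringBuilder text = new StringBuilder();
        PolicySketch hfPolicy = document.getHeaderFooterPolicy();

        // Start out with all headers; hfPolicy may be null here.
        extractHeaders(text, hfPolicy);
        text.append(document.getBody());
        return text.toString();
    }

    private static void extractHeaders(StringBuilder text, PolicySketch hfPolicy) {
        if (hfPolicy == null) {
            return; // no policy means there are no headers to extract
        }
        for (String header : hfPolicy.getHeaders()) {
            text.append(header).append('\n');
        }
    }

    public static void main(String[] args) {
        // A document without a policy yields only its body text.
        System.out.println(getText(new DocumentSketch(null, "body text")));
    }
}
```

With the guard in place, a document that has no policy is treated the same as a document with zero headers.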

I would chalk this up to an invalid Word document, except that MS Word can open the file. If you open it in Word and re-save it without making any changes, Word still reports that it has no headers, but XWPFWordExtractor.getText() can read the new file without a NullPointerException.
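
Since a .docx file is a ZIP package, one possible first step in finding what Word changed on re-save is to compare the part names in the two files. The sketch below uses only java.util.zip; DocxPartDiff is a hypothetical helper name, and it assumes both files sit in the working directory. It compares part names only, so a difference inside a part such as word/document.xml would need a content comparison instead.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for comparing the parts of two .docx packages.
final class DocxPartDiff {

    // Returns the sorted entry names found in the ZIP package at the given path.
    static Set<String> partNames(String path) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Returns the names that are present in 'left' but absent from 'right'.
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Set<String> original = partNames("noHeaders.docx");
        Set<String> resaved = partNames("noHeaders_resaved.docx");
        System.out.println("Only in original: " + onlyIn(original, resaved));
        System.out.println("Only in re-saved: " + onlyIn(resaved, original));
    }
}
```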

Two example Word documents without headers are attached: one that triggers the error and one that does not. Here is the test code I used to print the contents of each file.

import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Test {

    public static void testPrintDoc(String file) throws Exception {
        FileInputStream fis = new FileInputStream(file);
        System.err.println("Reading " + file);
        try {
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor textExtractor = new XWPFWordExtractor(doc);
            System.err.println(textExtractor.getText());
        } finally {
            fis.close();
        }
    }    
    
    public static void main(String[] args) {
        
        try {
            Test.testPrintDoc("noHeaders.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            Test.testPrintDoc("noHeaders_resaved.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Comment 1 Nick Burch 2013-11-01 19:44:05 UTC
Thanks for the test file. Fixed in r1538044.