Created attachment 30990
Two Word test files without headers - one throws NullPointerException, one doesn't

I was given a programmatically generated Word document that does not contain any headers. MS Word can open it, but XWPFWordExtractor.getText() throws a NullPointerException:

    java.lang.NullPointerException
        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.extractHeaders(XWPFWordExtractor.java:162)
        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.getText(XWPFWordExtractor.java:87)
        at Test.testPrintDoc(Test.java:16)
        at Test.main(Test.java:26)

Looking at the code, hfPolicy appears to be passed in as null to XWPFWordExtractor.extractHeaders() from XWPFWordExtractor.getText():

    public String getText() {
        StringBuffer text = new StringBuffer();
        XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();

        // Start out with all headers
        extractHeaders(text, hfPolicy);

This suggests that the document's headerFooterPolicy (returned by Document.getHeaderFooterPolicy()) is never set, and that this null propagates into extractHeaders() and causes the error.

I'd chalk it up to an invalid Word document, except that MS Word opens the file without complaint. If you open it in Word and re-save it without making any changes, Word still reports that it has no headers, yet the re-saved file can be read by XWPFWordExtractor.getText() without the NullPointerException.

Two example Word documents without headers are attached: one that throws the error and one that doesn't. Here's the test code I used to print out the contents of each file:
    import java.io.FileInputStream;

    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class Test {
        public static void testPrintDoc(String file) throws Exception {
            FileInputStream fis = new FileInputStream(file);
            System.err.println("Reading " + file);
            try {
                XWPFDocument doc = new XWPFDocument(fis);
                XWPFWordExtractor textExtractor = new XWPFWordExtractor(doc);
                System.err.println(textExtractor.getText());
            } finally {
                fis.close();
            }
        }

        public static void main(String[] args) {
            try {
                Test.testPrintDoc("noHeaders.docx");
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                Test.testPrintDoc("noHeaders_resaved.docx");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
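To make the failure mode concrete, here is a minimal sketch of the kind of null guard that avoids this NullPointerException. It uses hypothetical stand-in classes (HeaderFooterPolicy, Document, Extractor) that only mirror the getText() to extractHeaders() call chain quoted in the report; they are not POI's real classes, and this is not necessarily how the fix in r1538044 was implemented. The idea is simply that extractHeaders() returns early when the document has no header/footer policy, so a header-less document yields its body text instead of throwing.

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeExtractorDemo {

    // Hypothetical stand-in for XWPFHeaderFooterPolicy (not the real POI class).
    static class HeaderFooterPolicy {
        private final String defaultHeader;

        HeaderFooterPolicy(String defaultHeader) {
            this.defaultHeader = defaultHeader;
        }

        String getDefaultHeaderText() {
            return defaultHeader;
        }
    }

    // Hypothetical stand-in for XWPFDocument: the policy may be null, which is
    // the situation the attached noHeaders.docx appears to trigger.
    static class Document {
        private final HeaderFooterPolicy headerFooterPolicy;
        private final List<String> paragraphs;

        Document(HeaderFooterPolicy headerFooterPolicy, List<String> paragraphs) {
            this.headerFooterPolicy = headerFooterPolicy;
            this.paragraphs = paragraphs;
        }

        HeaderFooterPolicy getHeaderFooterPolicy() {
            return headerFooterPolicy;
        }

        List<String> getParagraphs() {
            return paragraphs;
        }
    }

    // Hypothetical extractor mirroring the getText() -> extractHeaders() flow.
    static class Extractor {
        private final Document document;

        Extractor(Document document) {
            this.document = document;
        }

        String getText() {
            StringBuilder text = new StringBuilder();
            HeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();
            extractHeaders(text, hfPolicy); // hfPolicy may legitimately be null here
            for (String paragraph : document.getParagraphs()) {
                text.append(paragraph).append('\n');
            }
            return text.toString();
        }

        private void extractHeaders(StringBuilder text, HeaderFooterPolicy hfPolicy) {
            // Guard: without this check, hfPolicy.getDefaultHeaderText() would throw
            // a NullPointerException for documents that have no header/footer policy.
            if (hfPolicy == null) {
                return;
            }
            String header = hfPolicy.getDefaultHeaderText();
            if (header != null) {
                text.append(header).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        Document noPolicy = new Document(null, Arrays.asList("Body text"));
        Document withHeader = new Document(new HeaderFooterPolicy("Header"), Arrays.asList("Body text"));
        System.out.print(new Extractor(noPolicy).getText());   // prints "Body text"
        System.out.print(new Extractor(withHeader).getText()); // prints "Header" then "Body text"
    }
}
```

Guarding at the point of use keeps getText() working for header-less documents; an alternative design would be for the document to always supply an empty policy object, which would also avoid the null.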
Thanks for the test file. Fixed in r1538044.