Hello,

I am trying to use POI to extract the text of some Word documents with the following code:

    StringWriter writer = new StringWriter();
    WordDocument doc = new WordDocument("C:\\arj\\pdf\\peer.doc");
    doc.openDoc();
    doc.writeAllText(writer);
    System.out.println(writer.toString());

Some Word files respond with the following exception:

    java.lang.NullPointerException
        at org.apache.poi.hdf.extractor.Utils.convertBytesToShort(Utils.java:47)
        at org.apache.poi.hdf.extractor.StyleSheet.doCHPOperation(StyleSheet.java:176)
        at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:685)
        at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:565)
        at org.apache.poi.hdf.extractor.WordDocument.addParagraphContent(WordDocument.java:1050)
        at org.apache.poi.hdf.extractor.WordDocument.createParagraph(WordDocument.java:942)
        at org.apache.poi.hdf.extractor.WordDocument.addBlockContent(WordDocument.java:876)
        at org.apache.poi.hdf.extractor.WordDocument.writeSection(WordDocument.java:681)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:211)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:186)
        at zb.sts.text.WordTester.main(WordTester.java:27)
    Exception in thread "main"

The text of other Word files is not completely extracted.
I've been rummaging through Bugzilla. Without a sample document that reproduces this bug, it will be impossible to fix.
This bug references a very old version of POI. As no new comments have been added in a long time, I'm assuming that this bug has now been fixed. If the bug still exists with the latest version of POI, please re-open the bug and add a comment indicating this, ideally also with a failing unit test.
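
For anyone re-testing, one way to find a sample document to attach is to run the extraction over a set of files and record which ones throw. The following is only a sketch under stated assumptions: `ExtractionCheck`, `TextExtractor`, and `findFailures` are hypothetical names, not part of POI, and the extractor used in `main` is a stand-in that simulates a failure. The real POI extraction call would be plugged in where the stand-in lambda is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractionCheck {

    /** Hypothetical hook: wrap whatever extraction call is being tested. */
    public interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /**
     * Runs the extractor over every file and returns the ones that threw,
     * mapped to the exception, in the order they were tried.
     */
    public static Map<Path, Exception> findFailures(List<Path> files, TextExtractor extractor) {
        Map<Path, Exception> failures = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                extractor.extract(file);
            } catch (Exception e) {
                // Keep the exception so its stack trace can be pasted into the bug.
                failures.put(file, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in extractor (assumption): fails for names starting with "bad".
        TextExtractor simulated = file -> {
            if (file.getFileName().toString().startsWith("bad")) {
                throw new NullPointerException("simulated extraction failure");
            }
            return "extracted text";
        };

        List<Path> files = List.of(Paths.get("good.doc"), Paths.get("bad.doc"));
        findFailures(files, simulated).forEach((file, error) ->
                System.out.println(file + " failed with " + error));
    }
}
```

Any file reported by such a harness is a candidate sample for the bug report, and the same call can be turned into a failing unit test by asserting that the returned map is empty.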