Bug 35045 - Extracting text from word files fails
Summary: Extracting text from word files fails
Status: RESOLVED LATER
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 2.5-FINAL
Hardware: PC Windows 2000
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-24 19:01 UTC by Robert Eberhardt
Modified: 2010-06-03 07:26 UTC



Description Robert Eberhardt 2005-05-24 19:01:07 UTC
Hello

I am trying to use POI to extract the text of some Word documents with the
following code:
StringWriter writer = new StringWriter();
WordDocument doc = new WordDocument("C:\\arj\\pdf\\peer.doc");
doc.openDoc();
doc.writeAllText(writer);
System.out.println(writer.toString());
Some Word files fail with the following exception:
java.lang.NullPointerException
	at org.apache.poi.hdf.extractor.Utils.convertBytesToShort(Utils.java:47)
	at org.apache.poi.hdf.extractor.StyleSheet.doCHPOperation(StyleSheet.java:176)
	at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:685)
	at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:565)
	at org.apache.poi.hdf.extractor.WordDocument.addParagraphContent(WordDocument.java:1050)
	at org.apache.poi.hdf.extractor.WordDocument.createParagraph(WordDocument.java:942)
	at org.apache.poi.hdf.extractor.WordDocument.addBlockContent(WordDocument.java:876)
	at org.apache.poi.hdf.extractor.WordDocument.writeSection(WordDocument.java:681)
	at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:211)
	at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:186)
	at zb.sts.text.WordTester.main(WordTester.java:27)
Exception in thread "main" 


For other Word files, the text is only partially extracted.
Comment 1 David Fisher 2010-04-30 18:50:06 UTC
I've been rummaging through Bugzilla.

Without a sample file that reproduces this bug, it will be impossible to fix.
Comment 2 Nick Burch 2010-06-03 07:26:06 UTC
This bug references a very old version of POI. As no new comments have been added in a long time, I'm assuming that this bug has now been fixed.

If the bug still exists with the latest version of POI, please re-open the bug and add a comment indicating this, ideally with a failing unit test as well.
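
For anyone who does re-open this: the top frame of the trace (Utils.convertBytesToShort) is consistent with a null byte array reaching a two-byte conversion, though without a sample document that cannot be confirmed. The sketch below is not POI's source. ShortReadSketch, readShortLE and tryReadShortLE are hypothetical names, and the little-endian layout is an assumption based on the Word binary format. It only shows how an unguarded read of this kind throws a NullPointerException and what a guarded variant might look like.

```java
import java.util.Optional;

// Hypothetical stand-in for a two-byte read; NOT the actual POI implementation.
public class ShortReadSketch {

    /** Unguarded little-endian read: throws NullPointerException if data is null. */
    public static short readShortLE(byte[] data, int offset) {
        return (short) ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8));
    }

    /** Guarded read: returns empty instead of throwing when the bytes are missing. */
    public static Optional<Short> tryReadShortLE(byte[] data, int offset) {
        if (data == null || offset < 0 || offset + 1 >= data.length) {
            return Optional.empty();
        }
        return Optional.of(readShortLE(data, offset));
    }

    public static void main(String[] args) {
        byte[] bytes = {0x34, 0x12};
        // Low byte first: 0x34 | (0x12 << 8) = 0x1234
        System.out.println(readShortLE(bytes, 0));
        // A missing array is reported as empty rather than crashing
        System.out.println(tryReadShortLE(null, 0).isPresent());
    }
}
```

A guard like this would only hide the symptom. A failing unit test built around a real document is still needed to find out why the property bytes are missing in the first place.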