Bug 35045

Summary: Extracting text from word files fails
Product: POI
Reporter: Robert Eberhardt <RobertEberhardt>
Component: POI Overall
Assignee: POI Developers List <dev>
Severity: critical    
Priority: P2    
Version: 2.5-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows 2000   

Description Robert Eberhardt 2005-05-24 19:01:07 UTC

I am trying to use POI to extract the text of some Word documents with the
following code:
StringWriter writer = new StringWriter();
WordDocument doc = new WordDocument("C:\\arj\\pdf\\peer.doc");
Some Word files fail with the following exception:
	at org.apache.poi.hdf.extractor.Utils.convertBytesToShort(Utils.java:47)
	at org.apache.poi.hdf.extractor.StyleSheet.doCHPOperation(StyleSheet.java:176)
	at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:685)
	at org.apache.poi.hdf.extractor.StyleSheet.uncompressProperty(StyleSheet.java:565)
	at org.apache.poi.hdf.extractor.WordDocument.createParagraph(WordDocument.java:942)
	at org.apache.poi.hdf.extractor.WordDocument.addBlockContent(WordDocument.java:876)
	at org.apache.poi.hdf.extractor.WordDocument.writeSection(WordDocument.java:681)
	at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:211)
	at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:186)
	at zb.sts.text.WordTester.main(WordTester.java:27)
Exception in thread "main" 

The text of other Word files is not completely extracted.
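
The top frame of the trace above is a byte-to-short conversion. The report does not include the exception type, so the actual cause is unknown; one possibility is a read past the end of a buffer. The following is a minimal sketch of a bounds-checked little-endian 16-bit read, written only to illustrate that kind of conversion. The class and method names are hypothetical, and this is not POI's `Utils.convertBytesToShort`.

```java
// Hypothetical helper for illustration only; not part of POI.
final class LittleEndianSketch {
    private LittleEndianSketch() {}

    /** Reads a signed 16-bit little-endian value starting at offset. */
    static short readShort(byte[] data, int offset) {
        if (data == null) {
            throw new IllegalArgumentException("data must not be null");
        }
        // Two bytes are needed, so the last valid offset is data.length - 2.
        if (offset < 0 || offset > data.length - 2) {
            throw new IllegalArgumentException(
                "cannot read 2 bytes at offset " + offset
                + " from a buffer of length " + data.length);
        }
        int low = data[offset] & 0xFF;       // mask to avoid sign extension
        int high = data[offset + 1] & 0xFF;
        return (short) (low | (high << 8));  // low byte first, then high byte
    }
}
```

For example, `LittleEndianSketch.readShort(new byte[] {0x34, 0x12}, 0)` yields `0x1234`, while a one-byte buffer is rejected with a descriptive message instead of failing deep inside the parser.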
Comment 1 David Fisher 2010-04-30 18:50:06 UTC
I've been rummaging through bugzilla.

Without a sample file that reproduces this bug, it will be impossible to fix.
Comment 2 Nick Burch 2010-06-03 07:26:06 UTC
This bug references a very old version of POI. As no new comments have been added in a long time, I'm assuming that this bug has now been fixed.

If the bug still exists with the latest version of POI, please re-open the bug and add a comment indicating this, ideally also with a failing unit test.