wordDocument.writeAllText() returns null when an MS Word doc file contains many tabs and indents. This happens whenever a WordDocument of this kind is processed. Please give me an email address or a URL where I can upload the doc file.
Below is a method that can be used to replicate the condition:

    public String convertToText(String theWordDoc) throws java.lang.Exception {
        StringWriter out = null;
        String result = null;
        try {
            // wordDocument is a field of the enclosing class
            wordDocument = new WordDocument(theWordDoc);
            out = new StringWriter();
            wordDocument.writeAllText(out);
            out.flush();
            result = out.getBuffer().toString();
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return result; // null is being returned
    }

I could send the Word doc as well. Please let me know if required.
Created attachment 7610 [details] this is the word document that will return null when parsed by wordDocument.writeAllText()
Created attachment 7611 [details] Contains program that would make this problem show up. Unzip and run java -jar Bug22014Replicator.jar <path-to-word-document>
When running the program (java -jar Bug22014Replicator.jar <path-to-word-document>) with the attached Word document, I get the following:

    Exception in thread "main" java.io.IOException: Invalid header signature; read 290763650945099227, expected -2226271756974174256
        at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:124)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:120)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:229)
        at org.apache.poi.hdf.extractor.WordDocument.<init>(WordDocument.java:222)
        at bug22014.ReplicateBug22014.replicate(ReplicateBug22014.java:39)
        at bug22014.ReplicateBug22014.main(ReplicateBug22014.java:30)

But the Word document is viewable using Word Viewer, Microsoft Word and OpenOffice.
This document is from Word 2.0. Next time you have this problem, open the file in Word and choose Save As; the format version shows up in the "Save as type" field. We don't support Word 2.0 and we have no plans to support it. Sorry.
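For anyone hitting the same "Invalid header signature" error, a rough sketch of a pre-check that can be run before handing a file to WordDocument. It only compares the first 8 bytes of the file with the OLE2 signature that POIFS expects (the "expected" value in the exception above). The class and method names here (Ole2SignatureCheck, readSignature, hasOle2Signature) are made up for illustration and are not part of POI.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of POI: checks for the OLE2 header signature.
final class Ole2SignatureCheck {

    // OLE2 magic bytes D0 CF 11 E0 A1 B1 1A E1 read as a little-endian long.
    // As a signed long this is -2226271756974174256, the "expected" value
    // in the exception message.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    // Reads the first 8 bytes of the stream as a little-endian long.
    // Throws EOFException if the stream holds fewer than 8 bytes.
    static long readSignature(InputStream in) throws IOException {
        byte[] header = new byte[8];
        new DataInputStream(in).readFully(header);
        long value = 0;
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (header[i] & 0xFFL);
        }
        return value;
    }

    // Returns true only if the stream starts with the OLE2 signature.
    static boolean hasOle2Signature(InputStream in) throws IOException {
        try {
            return readSignature(in) == OLE2_SIGNATURE;
        } catch (EOFException e) {
            return false; // too short to be an OLE2 file
        }
    }
}
```

Calling hasOle2Signature(new FileInputStream(path)) before constructing WordDocument would let the caller report an unsupported (for example pre-OLE2) format instead of failing with an IOException. Since the check consumes the stream, the file has to be reopened for the actual parse.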