Created attachment 26514
MS Word document on which the effect can be reproduced.

When an MS Word document containing Greek characters (please see the attachment) is passed to org.apache.poi.hdf.extractor.WordDocument, the writeAllText method returns an incomplete result. No exception is thrown to indicate the problem.

Steps to reproduce:
1. Use the MS Word document from the attachment.
2. Create an input stream for the document and then use this snippet:

    WordDocument wd = new WordDocument(inputStream);
    StringWriter docTextWriter = new StringWriter();
    PrintWriter pw = new PrintWriter(docTextWriter);
    wd.writeAllText(pw);
    result = docTextWriter.toString();

3. Expected result: a string containing "Process description document τεστ new".
4. Actual result: "Process description".
5. No internal error is indicated and no exception is thrown.

I would expect at least an exception to be thrown as an indicator that something went wrong.
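The capture-and-compare part of the reproduction can be sketched as a small self-checking harness, so that a silently truncated result shows up as a failure. This uses only the JDK; the class ExtractionCheck, the TextSource interface, and the method names are hypothetical stand-ins introduced for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class ExtractionCheck {

    /** Hypothetical stand-in for a call such as wd.writeAllText(pw). */
    interface TextSource {
        void writeTo(PrintWriter out) throws IOException;
    }

    /** Captures everything the source writes and returns it as a String. */
    static String capture(TextSource source) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        try {
            source.writeTo(pw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pw.flush(); // harmless here, but keeps the pattern safe for buffered writers
        return buffer.toString();
    }

    /** Throws if the captured text does not contain the expected fragment. */
    static String captureAndRequire(TextSource source, String expected) {
        String text = capture(source);
        if (!text.contains(expected)) {
            throw new IllegalStateException(
                "Extracted text is missing \"" + expected + "\"; got: \"" + text + "\"");
        }
        return text;
    }

    public static void main(String[] args) {
        // Simulated truncated extraction, mirroring the actual result in the report.
        TextSource truncated = out -> out.print("Process description");
        try {
            // "\u03c4\u03b5\u03c3\u03c4" is the Greek word from the attached document.
            captureAndRequire(truncated, "\u03c4\u03b5\u03c3\u03c4");
        } catch (IllegalStateException e) {
            System.out.println("Detected: " + e.getMessage());
        }
    }
}
```

With POI on the classpath, the lambda could presumably be replaced by pw -> wd.writeAllText(pw), assuming writeAllText declares at most an IOException.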
HDF is no longer supported and remains only for existing legacy users. Please try with HWPF instead.