Created attachment 31474
test word file

Hi POI team,

I use poi-3.8 to extract text from Word documents, but the numbered headings are not extracted correctly. For example:

Original text in Word:

一、 i like poi
1、 i like poi
2、 i like poi
二、 i like poi
三、 i like poi

Text returned by poi-3.8:

1、 i like poi
1、 i like poi
2、 i like poi
1、 i like poi
2、 i like poi

Java code:

    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToTextConverter;

    // Load the .doc file and convert it to plain text
    FileInputStream fis = new FileInputStream(new File("e:/test.doc"));
    HWPFDocument doc = new HWPFDocument(fis);
    WordToTextConverter wordToTextConverter = new WordToTextConverter();
    wordToTextConverter.processDocument(doc);
    System.out.println(wordToTextConverter.getText());
Sorry, a typo in my report: "world" should read "Word".
I don't believe that WordToTextConverter does any style-related things.

There are other converters which do, including ones in Apache POI, along with Apache Tika. I'd suggest you try one of those.
Hi Nick,

Thank you for your answer. I hope you can give me some suggestions about these converters, because I can't find the API for them.

(In reply to Nick Burch from comment #2)
> I don't believe that WordToTextConverter does any style related things
>
> There are other converters which do, including ones in Apache POI, along
> with Apache Tika, I'd suggest you try one of those
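
For readers looking for one of the other converters in Apache POI, the following is a minimal sketch using WordToHtmlConverter from the poi-scratchpad jar (package org.apache.poi.hwpf.converter). The class name DocToHtmlSketch and the helper methods serialize and convert are made up for illustration, and the default path is simply the one from the report. Whether this converter reproduces the 一、 二、 三、 labels for the attached file has not been checked here, so treat it as a starting point rather than a confirmed fix.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

// Hypothetical helper class: converts a .doc file to an HTML string.
public class DocToHtmlSketch {

    // Serializes a DOM tree to an HTML string using the JDK's transformer.
    static String serialize(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    // Reads a Word 97-2003 document and runs it through WordToHtmlConverter.
    static String convert(InputStream in) throws Exception {
        HWPFDocument doc = new HWPFDocument(in);
        // The converter writes its output into an empty DOM document.
        Document target = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        WordToHtmlConverter converter = new WordToHtmlConverter(target);
        converter.processDocument(doc);
        return serialize(converter.getDocument());
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the report; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "e:/test.doc";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println(convert(in));
        }
    }
}
```

Run it with the poi and poi-scratchpad jars on the classpath. Apache Tika is the other option mentioned in this thread; it is not shown here.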
The discussion here stalled at some point. If you still need more information about ways to extract text/style information, please start a discussion on StackOverflow or on the mailing lists.