Summary: | hidden characters not removed | ||
---|---|---|---|
Product: | POI | Reporter: | sebastian.a.aguirre |
Component: | HWPF | Assignee: | POI Developers List <dev> |
Status: | NEW --- | ||
Severity: | critical | CC: | hgobir |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: | sample doc file to test |
Created attachment 33442: sample doc file to test

After reading the file and converting it to a String, the hidden characters are not removed. This happens in XWPF as well.

To read the file I'm using a very simple method:

    File file = new File("file.doc");
    FileInputStream fis = new FileInputStream(file);
    HWPFDocument doc = new HWPFDocument(fis);
    WordExtractor ex = new WordExtractor(doc);
    String toReturn = ex.getText();

The same thing happens with XWPF, using equally simple code:

    XWPFDocument doc = new XWPFDocument(fis);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    String toReturn = ex.getText();

I'm attaching a file you can use as a sample. You can show/hide the hidden characters with Ctrl+Shift+8.

Thanks.
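Until the extractors handle this, one possible workaround is to post-process the String returned by `getText()`. The following is only a minimal sketch, and it rests on an assumption this report does not confirm: that the unwanted characters are ASCII control characters left in the extracted text. Text that is merely formatted as hidden would not be affected by it. The class name `ControlCharFilter` and the method `stripControlChars` are hypothetical and are not part of POI; the sketch uses only `java.util.regex`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: removes ASCII control characters
// from already-extracted text, keeping tab, line feed, and carriage return.
public class ControlCharFilter {

    // \p{Cntrl} matches ASCII control characters (0x00-0x1F and 0x7F);
    // the intersection excludes the three whitespace controls we keep.
    private static final Pattern CONTROL =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    public static String stripControlChars(String text) {
        return CONTROL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by ex.getText() in the report.
        String extracted = "Name\u0007Value\u0013 more\tcolumns\r\nNext line";
        System.out.println(stripControlChars(extracted));
    }
}
```

In the code from the report this would be applied as `String toReturn = ControlCharFilter.stripControlChars(ex.getText());`. Tabs and line breaks are kept on purpose so that the paragraph structure of the extracted text survives.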