Summary: | getText() of XWPFParagraph returns deleted text if in "review" mode | ||
---|---|---|---|
Product: | POI | Reporter: | femmer |
Component: | XWPF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | femmer |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Hardware: | Macintosh | ||
OS: | All | ||
Bug Depends on: | |||
Bug Blocks: | 61787 | ||
Attachments: |
A test file to reproduce the problem with
Patch |
Created attachment 32844 [details]
Patch
Here is a patch that checks whether a run has a deletion item associated with it before its text is added. I'm not sure which other items could contain such a deletion, so I only checked XWPFRun instances.
The fix is a simple check:

 if (run instanceof XWPFRun) {
+    XWPFRun xRun = (XWPFRun) run;
+    if (xRun.getCTR().getRsidDel() == null) {
+        out.append(xRun.toString());
+    }
+}

Here is the output:

bffvalidator c:\temp\58061good.xls
BFFValidator: "c:\temp\58061good.xls" FAILED at 06/22/15 16:42:09
Log at: c:\temp\58061good.xls.bffvalidator.06-22-15_16-42-09.xml
See: http://msdn.microsoft.com/en-us/library/A6FFF2B4-470A-463D-A6E9-9DAD9676CD44 for more information

bffvalidator c:\temp\58061corrupt.xls
BFFValidator: "c:\temp\58061corrupt.xls" NOT RECOGNIZED (The Microsoft Office Binary File Format Validator encountered an error reading the file you specified, OR The Microsoft Office Binary File Format Validator supports Word, Excel, and PowerPoint binary file formats only. The file you specified is an unsupported file type.) at 06/22/15 16:42:14
Log at: c:\temp\58061corrupt.xls.bffvalidator.06-22-15_16-42-14.xml

sorry, wrong bug!
Created attachment 32843 [details]
A test file to reproduce the problem with

Dear all,

I’m looking for a simple solution to parse only the newest version of an XWPF file (as if all changes are accepted or so). As far as I could google and browse through the javadoc there is no such functionality in apache poi, is that correct?

I.e.:
- Open a MS Word document
- Track changes
- Remove text from the document (in tracked-mode)
- Save. (see file attached)
- Open file with apache POI
- iterate through paragraphs
- call getText() on the paragraphs

Outcome: The removed text is returned.
Expected: Only text of the "final version" of the document is returned.

Best,
Henning
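
To illustrate the expected behaviour, here is a minimal sketch that uses only the JDK's XML parser, not POI. It assumes that Word stores a tracked deletion as a `w:del` element wrapping runs whose text sits in `w:delText`, and that the "final version" of a paragraph is simply the text of every `w:t` element outside such a wrapper. The class name `FinalTextSketch`, the method `finalText`, and the sample XML are hypothetical and made up for this example; this is not the code from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
final class FinalTextSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the paragraph text as if all tracked deletions were accepted.
    static String finalText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()/getNamespaceURI()
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(paragraphXml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(node.getNamespaceURI())) {
            String name = node.getLocalName();
            if ("del".equals(name)) {
                return; // tracked deletion: skip the whole subtree
            }
            if ("t".equals(name)) {
                out.append(node.getTextContent()); // regular run text
                return;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        // Made-up paragraph: "cruel " was removed while change tracking was on.
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
                + "<w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>"
                + "<w:del w:id=\"1\" w:author=\"someone\">"
                + "<w:r><w:delText>cruel </w:delText></w:r></w:del>"
                + "<w:r><w:t>world</w:t></w:r>"
                + "</w:p>";
        System.out.println(finalText(xml)); // prints "Hello world"
    }
}
```

Text inside `w:ins` is kept, since an accepted insertion is part of the final version. Other revision markup, such as moved text, is not handled in this sketch.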