Bug 50972 - XWPFWordExtractor ignores <w:br/> entries
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.8-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-25 12:40 UTC by Igor Rogov
Modified: 2011-03-25 13:00 UTC

Attachments
Test document (9.93 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2011-03-25 12:40 UTC, Igor Rogov

Description Igor Rogov 2011-03-25 12:40:58 UTC
Created attachment 26797
Test document

Two words separated by a line break character are glued together.

I debugged the issue and found this code in the XWPFRun.toString() method:

if (o instanceof CTEmpty) {
   // Some inline text elements get returned not as
   //  themselves, but as CTEmpty, owing to some odd
   //  definitions around line 5642 of the XSDs
   String tagName = o.getDomNode().getNodeName();
   if ("w:tab".equals(tagName)) {
      text.append("\t");
   }
   if ("w:br".equals(tagName)) {
      text.append("\n");
   }
   <...>
}

The issue is that "o" is an instance of CTBrImpl, not CTEmpty, so the instanceof check fails and the <w:br/> element is ignored.

Attached a test document.
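
For illustration only, here is a minimal sketch of the same idea using plain JDK DOM classes rather than POI or the ooxml-schemas bindings. It decides what to append from the element name alone, so it does not matter which binding class a schema set would produce for <w:br/>. The class name RunTextSketch and the method runText are hypothetical, and this is not the change made in POI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: extracts text from a single <w:r> run.
class RunTextSketch {
    // WordprocessingML main namespace, normally bound to the "w" prefix.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String runText(String runXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(runXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse run XML", e);
        }

        StringBuilder text = new StringBuilder();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text nodes, comments, and elements from other namespaces.
            if (child.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(child.getNamespaceURI())) {
                continue;
            }
            // Dispatch on the element name, not on the Java type of the node.
            String name = child.getLocalName();
            if ("t".equals(name)) {
                text.append(child.getTextContent());
            } else if ("tab".equals(name)) {
                text.append('\t');
            } else if ("br".equals(name)) {
                text.append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String run = "<w:r xmlns:w=\"" + W_NS + "\">"
            + "<w:t>first</w:t><w:br/><w:t>second</w:t></w:r>";
        // The two words come out on separate lines instead of glued together.
        System.out.println(runText(run));
    }
}
```

With this approach, a run holding two words separated by <w:br/> yields the words separated by "\n", which is the behaviour the extractor was expected to have.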
Comment 1 Nick Burch 2011-03-25 13:00:38 UTC
Ah, looks like someone fixed the code for one set of ooxml-schemas, but not the other.

Fixed in r1085471.