Bug 66260 - XWPF should have a getNumberOfTextRuns() method
Summary: XWPF should have a getNumberOfTextRuns() method
Status: REOPENED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 5.2.2-FINAL
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-14 01:57 UTC by PJ Fanning
Modified: 2022-12-26 09:47 UTC
0 users



Description PJ Fanning 2022-09-14 01:57:38 UTC
XWPFRun.getText(int n) returns the text of TextRun n (the n-th `w:t` element in the run), but there is no public API to find out how many TextRuns an XWPFRun instance contains.
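A minimal workaround sketch until such a method exists, assuming POI 5.2.x (poi-ooxml) on the classpath. `countTextElements` is a hypothetical helper, not part of the POI API: it reads the number of `w:t` children from the run's underlying `CTR` bean via `XWPFRun.getCTR().sizeOfTArray()` and then reads each one with `getText(i)`. The class name `RunTextCount` is also only for illustration.

```java
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class RunTextCount {

    // Hypothetical helper: POI has no public getNumberOfTextRuns(),
    // so read the count of <w:t> children from the underlying CTR bean.
    static int countTextElements(XWPFRun run) {
        return run.getCTR().sizeOfTArray();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFParagraph paragraph = doc.createParagraph();
            XWPFRun run = paragraph.createRun();
            // In POI 5.2.x each setText(String) call appends a new <w:t>
            // to the same <w:r>, so this run ends up with two of them.
            run.setText("This is");
            run.addBreak();
            run.setText(" a simple sentence.");

            int n = countTextElements(run);
            System.out.println("w:t elements in run: " + n);
            // Without a count, callers cannot know when to stop calling getText(i).
            for (int i = 0; i < n; i++) {
                System.out.println(i + ": [" + run.getText(i) + "]");
            }
        }
    }
}
```

Calling only getText(0) on this run would return "This is" and silently skip the text after the break.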
Comment 1 PJ Fanning 2022-09-14 19:40:05 UTC
According to https://stackoverflow.com/questions/73691624/how-to-remove-line-breaks-with-apache-poi/73692616#comment130179141_73692616, we only seem to need getText(0); it isn't necessary to call getText(1), etc.

I don't really use the XWPF APIs, but I find them confusing. It seems that we should have a getText() that does the same as getText(0), and deprecate the latter; likewise for the setText method.
Comment 2 Axel Richter 2022-09-15 04:48:47 UTC
Sorry. That SO comment was misleading. There really can be multiple `w:t` elements in one `w:r`. See the example in

ECMA-376-1:2016
Office Open XML File Formats — Fundamentals
and Markup Language Reference
October 2016

page 333:

... However, if a text wrapping break character (a typical line break) were inserted after the word "is", as follows:

<w:r>
 <w:t>This is</w:t>
 <w:br/>
 <w:t xml:space="preserve"> a simple sentence.</w:t>
</w:r>

But I have never seen Word itself do that. It always uses multiple runs instead:

<w:r>
 <w:t>This is</w:t>
</w:r>
<w:r>
 <w:br/>
 <w:t xml:space="preserve"> a simple sentence.</w:t>
</w:r>
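To make the difference between the two forms above concrete, here is a JDK-only sketch (no POI involved) that parses each fragment and counts the `w:t` children of every `w:r`. The wrapping `w:p` root element, the class `TextElementsPerRun` and the method `countTextPerRun` are illustrative assumptions; the fragments themselves are the ones quoted above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TextElementsPerRun {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns, for each <w:r> in the fragment, how many <w:t> children it has.
    static List<Integer> countTextPerRun(String runsXml) throws Exception {
        // Wrap the bare runs in a root element that declares the w: prefix.
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">" + runsXml + "</w:p>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<Integer> counts = new ArrayList<>();
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            counts.add(run.getElementsByTagNameNS(W_NS, "t").getLength());
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // ECMA-376 example: one run, two w:t elements around a w:br.
        String singleRun =
            "<w:r><w:t>This is</w:t><w:br/>"
            + "<w:t xml:space=\"preserve\"> a simple sentence.</w:t></w:r>";
        // Form reported above for Word: the break starts a second run.
        String twoRuns =
            "<w:r><w:t>This is</w:t></w:r>"
            + "<w:r><w:br/><w:t xml:space=\"preserve\"> a simple sentence.</w:t></w:r>";

        System.out.println(countTextPerRun(singleRun)); // [2]
        System.out.println(countTextPerRun(twoRuns));   // [1, 1]
    }
}
```

A getText() that only returns getText(0) would be enough for the second form but would drop " a simple sentence." from the first.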
Comment 3 Nick Burch 2022-09-15 05:24:17 UTC
The XWPF APIs have largely grown by community contribution, without some of the overall review that the Excel format code has had. If someone has time to do a full review and suggest tweaks, that would be amazing!