Bug 56350 - XWPFRun separating text in the same format
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: unspecified
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2014-04-04 20:01 UTC by Navaneeth Anantharaman
Modified: 2014-04-05 16:34 UTC

the template word Document used in program. (12.43 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-04-04 20:01 UTC, Navaneeth Anantharaman

Description Navaneeth Anantharaman 2014-04-04 20:01:47 UTC
Created attachment 31477
the template word Document used in program.

I am trying to execute the code below

var doc = new XWPFDocument(OPCPackage.open("D:\\template.docx"));

for (p in doc.getParagraphs()) {
  for (r in p.getRuns()) {
    var text = r.toString();
    if (text.contains("$ClaimNumber")) {
      text = text.replace("$ClaimNumber", claimRes.ClaimNumber);
    }
    if (text.contains("$NoOfExposures")) {
      text = text.replace("$NoOfExposures", claimRes.Exposures.length);
    }
    // (the rest of the loop body is missing from the report)
  }
}

try {
  outStream = new FileOutputStream("D:\\templateoutput.docx");
} catch (e : FileNotFoundException) {
  // (handler body is missing from the report)
}

try {
  // (presumably the document is written to outStream here; missing from the report)
  print("File updated successfully")
} catch (e : FileNotFoundException) {
} catch (e : IOException) {
}

The output is 

The Claim Number is $
There are $
File updated successfully

The problem appears to be in how XWPFRun reads the text: it does not split it the way I expected. I am trying to replace $ClaimNumber with the actual claim number.
Because the $ and the word ClaimNumber end up in two separate runs, the placeholder is never matched and so is not replaced in the output.
The document itself is simple and straightforward, and as far as I can see the entire paragraph uses the same formatting.
I would therefore expect the paragraph to be a single run, but the output shows that it is not. Please let me know how to resolve this.

PS:
The code above is Gosu.
However, it is easily converted to Java, and the uses of the variable claimRes can be replaced with any text.
Comment 1 Nick Burch 2014-04-05 16:34:11 UTC
That's not a bug in Apache POI; it's just a quirk of how Word writes the file. For some reason, Word has decided that the $ and the rest of your text have/had/may-have-had different formatting, so it split them into two runs.

You'll need to tweak your logic to detect and cope with the case where the search text spans several runs, then decide which run to put the replacement in and which runs to delete.
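
As a rough illustration of that advice, here is a minimal Java sketch that works on plain strings rather than on POI objects. It assumes the text of each run in a paragraph has already been collected into a list (for example via XWPFRun.getText(0)). The class name RunReplacer and the method replaceAcrossRuns are hypothetical names made up for this sketch, not part of Apache POI. The replacement value is placed in the first run that overlaps the match, and the matched characters are removed from every later run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI: it models each run as a String.
final class RunReplacer {

    // Returns a copy of the run texts in which every occurrence of
    // placeholder is replaced by value, even when the placeholder
    // is split across several consecutive runs.
    static List<String> replaceAcrossRuns(List<String> runs, String placeholder, String value) {
        List<String> out = new ArrayList<>(runs);
        if (placeholder.isEmpty()) {
            return out;
        }
        int searchFrom = 0;
        while (true) {
            // Search the concatenated paragraph text, not the individual runs.
            String joined = String.join("", out);
            int start = joined.indexOf(placeholder, searchFrom);
            if (start < 0) {
                return out;
            }
            int end = start + placeholder.length();
            int offset = 0;
            boolean inserted = false;
            for (int i = 0; i < out.size(); i++) {
                String run = out.get(i);
                int runStart = offset;
                int runEnd = offset + run.length();
                offset = runEnd;
                if (runEnd <= start || runStart >= end) {
                    continue; // this run does not overlap the match
                }
                int from = Math.max(start, runStart) - runStart;
                int to = Math.min(end, runEnd) - runStart;
                // Only the first overlapping run receives the replacement value.
                String replacement = inserted ? "" : value;
                inserted = true;
                out.set(i, run.substring(0, from) + replacement + run.substring(to));
            }
            // Continue after the inserted value so it is never rescanned.
            searchFrom = start + value.length();
        }
    }
}
```

With the run texts from the report, "The Claim Number is $" followed by "ClaimNumber", this yields "The Claim Number is 12345" and an empty second run for the value "12345". In a real document the results would still have to be written back, for instance with XWPFRun.setText(text, 0), and runs left empty could be dropped with XWPFParagraph.removeRun(index). Whether to keep the formatting of the first run or of a later one is a choice the sketch makes arbitrarily.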