Bug 39177

Summary: Bug: TextRun.setText sets incorrect length of text in the underlying StyleTextPropAtom.
Product: POI Reporter: Yegor Kozlov <yegor>
Component: HSLFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: major    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: Other   
OS: All   
Attachments: Code which fixes the bug

Description Yegor Kozlov 2006-04-01 10:10:10 UTC
Nick, it seems there is a bug with stylings. I'm implementing TextBox shape and
get corrupted files.

Description:
 If change text via TextRun.setText and save the resulting ppt the output has
correct
structure but when opening it in PowerPoint it looks corrupted.

Test case:

    public void testChangeText() throws Exception {
        String dirname = System.getProperty("HSLF.testdata.path");
        
        String filename = dirname + "/with_textbox.ppt";
        HSLFSlideShow hss = new HSLFSlideShow(filename);
        SlideShow ppt = new SlideShow(hss);
        
        Slide sl = ppt.getSlides()[0];
        TextRun[] txt = sl.getTextRuns();
        for (int i = 0; i < txt.length; i++) {
          String text = txt[i].getText();

          //everything should remain the same. Existing data structures must be
preserved.
          txt[i].setText(text);
        }
         
        FileOutputStream out = new FileOutputStream("text_poi.ppt");
        ppt.write(out);
        out.close();

        //open it and see that it is corrupted
    }

If you compare the original file and the re-saved one you will see thast the
differences are
only in StyleTextPropAtom.
For example, for "Plain Text" string:

                   before           after
paragraph.length   12               11
characters.length  12               11

for "This is Times New Roman":

                   before           after
paragraph.length   24               23
characters.length  24               23

That is the length of text in stylings is always text.length while it should be
text.length+1.
If text consists of multiple RichTextRun object then the total size of all text
runs must be rawtext.length+1.

Quick fix:
Lines 277 and 288 in TextRun.setText should be changed as follows:

  pCol.updateTextSize(s.length()+1);
  cCol.updateTextSize(s.length()+1);

 With this fix everything works right. The re-saved ppt have the same structure
as the original file.


Regards, Yegor
Comment 1 Yegor Kozlov 2006-04-01 10:11:21 UTC
Created attachment 18012 [details]
Code which fixes the bug
Comment 2 Yegor Kozlov 2006-04-04 15:00:25 UTC
Hi Nick,

I got why the sum of lengths must be rawtext.length + 1.
Extra character controls the format of new text which is appended to the end.
For example, consider a string "Hello, World!" with 2 format runs:

<b>Hello,</b><i> World!</i>
The first part is bold, the second is italic. 

Total length of the string is 13 so the sum of lengths must be 13 + 1 = 14.
Extra character must be included either in  <b>Hello,</b> or to <i> World!</i>
fragment.

In first case if you edit the text in PowerPoint and append something to the end
it will be bold, in the second case - italic. 

Sounds tricky. Isn't it? I think we should keep it simple and automatically add
the extra character 
to the last format run. Any objections? 

Regards, Yegor
Comment 3 Nick Burch 2006-04-04 15:05:21 UTC
Ah, that's looking better. I was a bit worried about randomly just adding one
more onto the last length!

I'll probably run some tests on several files, to make sure that adding 1 to the
last text run is the right thing to do (it sounds right to me, but you never
know with PPT!) and then commit it.

Cheers for figuring this out :)
Comment 4 Nick Burch 2006-04-12 19:01:33 UTC
We also need a tweak to the last RichTextRun, to make that 1 longer too

I've committed a fix that covers both, along with some tests to ensure we don't
break things on re-writes.
Comment 5 Yegor Kozlov 2006-04-12 19:21:27 UTC
(In reply to comment #4)
> We also need a tweak to the last RichTextRun, to make that 1 longer too
> 
> I've committed a fix that covers both, along with some tests to ensure we don't
> break things on re-writes.

Hi Nick,

I was writing this post when you commited the fix. 
It looks like the problem is fixed. 

Below is an extract from what I was going to post, just to repeat it once more:

The sum of length of all text runs must equal to total length of the text plus 1.
The last text run always contains an extra character which defines styling of a
new text 
which is appended to the end. This rule works both for character and paragraph
styles. I.e. 
the sum of length of all character styles must be equal to rawtext.length +1
and the sum of length of all paragraph styles must be equal to rawtext.length +1.

It makes sense to reflect it in the documentation. Text styling is one of the
most weird things in PPT. Even if you look at the working code it's extremely
hard to figure out why it has the extra character at the end.

Best Regards, Yegor
Comment 6 Nick Burch 2006-04-12 19:58:42 UTC
Ah, I'd been wondering about why there was that extra 1 character on the end.

I'm going to need to do another review of the code, to make sure when we build
up the RichTextRuns, we're properly taking account of this extra character.

When it comes to writing it back out, I want to get my head straight on how it
interplays with both the reading in, and paragraph stylings (not all the final
paragraphs seem to have this extra char, so some are 1 shorter than the char
one). I have a feeling that the current code isn't always putting an extra 1
onto the paragraph styling length, so I want to know if we do need it or not,
and then get it consistent.

When I've got that all sorted, I can try to write some comments that make sense
to others!
Comment 7 Nick Burch 2006-06-13 15:02:52 UTC
OK, the latest re-attack of the RichTextCode should hopefully have this
completely correctly sorted. As a bonus, it's also got it documented, and
hopefully much easier to see why we're doing it :)