Bug 54925

Summary: [PATCH] StyleTextPropAtom incorrect text size causes StringIndexOutOfBoundsException in RichTextRun.getText()
Product: POI Reporter: virtuald <dustin>
Component: HSLFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.10-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Patch that truncates text size if it's larger than the parent

Description virtuald 2013-05-03 20:35:37 UTC
Created attachment 30253 [details]
Patch that truncates text size if it's larger than the parent

In some of my PPT files (which unfortunately I can't post), the StyleTextPropAtom contains paragraph and character runs which have incorrect text sizes in them (e.g., they are much larger than the parent text atom). A warning is logged ("Problem reading paragraph style runs: ... "), but because the incorrect text size is stored in the text properties, a StringIndexOutOfBoundsException is thrown when calling RichTextRun.getText(). Examination shows that it is calling substring on the parent text without validating the actual text length. 

I wasn't able to create a standalone file with this type of problem in it. The files that I do have all validate correctly with Microsoft's binary file format checker, and Microsoft Office is able to open them also without any complaints. The files do have additional warnings saying "Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: followed by 4002", but I don't feel this is related. 

The attached patch fixes this problem by modifying StyleTextPropAtom to truncate the text size if it is larger than the parent text size. This fixes my problem, passes all current unit tests.
Comment 1 Nick Burch 2013-06-25 15:41:28 UTC
4002 is a BaseTextPropAtom, which we don't currently have support for. As of r1496518 we should no longer give a warning for this. 

If you have a file handy that showed this, any chance you could share it, or at least the list of records around the BaseTextPropAtom? I'm interested to know if it goes in instead of a StyleTextPropAtom, or before it, as that would affect our detection logic
Comment 2 Nick Burch 2013-06-25 15:50:10 UTC
Hopefully fixed in r1496520, but without a unit test it's hard to be sure - could you confirm please?
Comment 3 Dominik Stadler 2013-09-29 20:30:31 UTC
Should be fixed, but unconfirmed for some time, therefore setting to RESOLVED, please reopen if this is still a problem with current builds.