Bug 54925 - [PATCH] StyleTextPropAtom incorrect text size causes StringIndexOutOfBoundsException in RichTextRun.getText()
Summary: [PATCH] StyleTextPropAtom incorrect text size causes StringIndexOutOfBoundsEx...
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF (show other bugs)
Version: 3.10-dev
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2013-05-03 20:35 UTC by virtuald
Modified: 2013-09-29 20:30 UTC (History)
0 users

Patch that truncates text size if it's larger than the parent (886 bytes, patch)
2013-05-03 20:35 UTC, virtuald
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description virtuald 2013-05-03 20:35:37 UTC
Created attachment 30253 [details]
Patch that truncates text size if it's larger than the parent

In some of my PPT files (which unfortunately I can't post), the StyleTextPropAtom contains paragraph and character runs which have incorrect text sizes in them (e.g., they are much larger than the parent text atom). A warning is logged ("Problem reading paragraph style runs: ... "), but because the incorrect text size is stored in the text properties, a StringIndexOutOfBoundsException is thrown when calling RichTextRun.getText(). Examination shows that it is calling substring on the parent text without validating the actual text length. 

I wasn't able to create a standalone file with this type of problem in it. The files that I do have all validate correctly with Microsoft's binary file format checker, and Microsoft Office is able to open them also without any complaints. The files do have additional warnings saying "Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: followed by 4002", but I don't feel this is related. 

The attached patch fixes this problem by modifying StyleTextPropAtom to truncate the text size if it is larger than the parent text size. This fixes my problem, passes all current unit tests.
Comment 1 Nick Burch 2013-06-25 15:41:28 UTC
4002 is a BaseTextPropAtom, which we don't currently have support for. As of r1496518 we should no longer give a warning for this. 

If you have a file handy that showed this, any chance you could share it, or at least the list of records around the BaseTextPropAtom? I'm interested to know if it goes in instead of a StyleTextPropAtom, or before it, as that would affect our detection logic
Comment 2 Nick Burch 2013-06-25 15:50:10 UTC
Hopefully fixed in r1496520, but without a unit test it's hard to be sure - could you confirm please?
Comment 3 Dominik Stadler 2013-09-29 20:30:31 UTC
Should be fixed, but unconfirmed for some time, therefore setting to RESOLVED, please reopen if this is still a problem with current builds.