Invoking org.apache.poi.hslf.usermodel.RichTextRun.getText() throws a StringIndexOutOfBoundsException. The length member of this RichTextRun instance has a very large value (1572863), while the getText() method of the TextRun returns a much shorter String (length 55).
Created attachment 19158: ppt example
Nick, it looks like the list of potential paragraph properties in StyleTextPropAtom was incomplete. I added the following property to the end: new TextProp(2, 0x200000, "para_unknown_7"). With this change everything works right. I don't know what the property means; for now we just read it and can make sense of it later. The patch is attached. Regards, Yegor
Created attachment 19172: The patch with the fix
Created attachment 19173: ppt to add to src\scratchpad\testcases\org\apache\poi\hslf\data
Created attachment 19174: Modified test case
Created attachment 19175: Modified StyleTextPropAtom
The data format used in StyleTextPropAtom is so stupid and brittle I'm amazed we haven't had one of these before... If we don't know about all the different properties (especially the ones at the end), we'll think we're done with one set of properties when there's still data left for it, since nothing in the format tells us where a set ends. We'll then try to treat the next bit of data as the start of a new set of properties, even though it's still the data for the last one. Thus we end up with really silly values for text length, because the length field sits near the start of each set :( Good spot, cheers for the patch. I've committed it.
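To make the failure mode concrete, here is a minimal sketch in plain Java. It is not the HSLF code: the Prop class, the readLengths method, the 8-byte [length][mask] header, the 0x0001 property and every byte in DATA are invented for illustration, and the real record layout differs. Only the 2-byte 0x200000 entry mirrors the TextProp added by the patch. The bytes are chosen so that the misread value lands on the 1572863 from the report; they are not taken from the attached file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaskedPropsSketch {
    /** Hypothetical stand-in for a property definition: value size in bytes and its mask bit. */
    static final class Prop {
        final int size;
        final int mask;
        final String name;

        Prop(int size, int mask, String name) {
            this.size = size;
            this.mask = mask;
            this.name = name;
        }
    }

    /** Reads a little-endian 32-bit integer at the given offset. */
    static int readInt(byte[] d, int off) {
        return (d[off] & 0xFF)
                | ((d[off + 1] & 0xFF) << 8)
                | ((d[off + 2] & 0xFF) << 16)
                | ((d[off + 3] & 0xFF) << 24);
    }

    /**
     * Walks simplified property sets laid out as [int length][int mask][values...].
     * A value is skipped only if its mask bit belongs to a known property, so an
     * unknown bit leaves its value bytes in the stream and shifts every later read.
     */
    static List<Integer> readLengths(byte[] data, List<Prop> known) {
        List<Integer> lengths = new ArrayList<>();
        int off = 0;
        while (off + 8 <= data.length) {
            lengths.add(readInt(data, off));
            int mask = readInt(data, off + 4);
            off += 8;
            for (Prop p : known) {
                if ((mask & p.mask) != 0) {
                    off += p.size;
                }
            }
        }
        return lengths;
    }

    // Made-up data: two property sets, the first with mask bits 0x0001 and 0x200000 set.
    static final byte[] DATA = {
        0x37, 0x00, 0x00, 0x00,   // set 1: text length 55
        0x01, 0x00, 0x20, 0x00,   // set 1: mask 0x00200001
        0x03, 0x00,               // value for the 0x0001 property
        (byte) 0xFF, (byte) 0xFF, // value for the 0x200000 property
        0x17, 0x00, 0x00, 0x00,   // set 2: text length 23
        0x00, 0x00, 0x00, 0x00    // set 2: mask 0 (no values)
    };

    // Property list without the 0x200000 entry (the 0x0001 entry is hypothetical).
    static final List<Prop> BEFORE = Arrays.asList(
            new Prop(2, 0x0001, "hypothetical_flag"));

    // Same list with the entry from the patch appended.
    static final List<Prop> AFTER = Arrays.asList(
            new Prop(2, 0x0001, "hypothetical_flag"),
            new Prop(2, 0x200000, "para_unknown_7"));

    public static void main(String[] args) {
        System.out.println(readLengths(DATA, BEFORE));
        System.out.println(readLengths(DATA, AFTER));
    }
}
```

With the shorter list the sketch prints [55, 1572863]: the two unread value bytes (FF FF) become the low half of the next length field. With the 0x200000 entry included it prints [55, 23].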