I've discovered that some PPT files (e.g. data/PictureTypeZero.ppt in SVN) have a really crazy first SlideListWithText. In theory, there should be 2 or 3 of these: the first with MainMasters, the second with Slides, and the optional third with Notes. However, in files like these, the first one contains both MainMasters and Slides! I've updated the code to throw a CorruptPowerPointFileException on these docs (previously we had a class cast exception), but we need to figure out what (if anything) we can do for them.
Created attachment 19303
This file causes the exception

As requested, here is an example file that causes HSLF to throw the exception.
Nick,

This is quite normal. The error happens when the first SLWT has a link to a Title Master.

How to reproduce:
- Open PowerPoint and create a presentation.
- Menu View / Slide Master.
- Menu Insert / New Title Master. After this you should have two masters: a slide master and a title master.
- Save.
- Try to open it in HSLF and get the error.

I attached a sample file.

PowerPoint supports two types of slide masters:
* Slide Master. The data is in a MainMaster container.
* Title Master. The data is in a Slide container.

Weird? I think so. I don't know why pages with a title layout use a different master. I guess that if it is missing, the normal MainMaster is used, but I haven't researched it yet.

I think your code in SlideShow.buildSlidesAndNotes has extra checks that can be omitted. Just keep in mind that whatever you find in the first SLWT is about masters. For now we handle only references to MainMasters. Later we will add support for Title, Note and other masters. I won't be surprised if we find references to other exotic containers in the first SLWT.

See how I changed SlideShow.buildSlidesAndNotes.

Regards,
Yegor
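To make the rule above concrete, here is a minimal sketch of "everything in the first SLWT is about masters, and only MainMaster references are handled for now". The types `Kind` and `Entry` and the method `mainMasterRefs` are hypothetical stand-ins invented for this illustration; they are not the HSLF record classes, and this is not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the HSLF classes.
class MasterRefSketch {

    // The kind of container an SLWT entry points at.
    enum Kind { MAIN_MASTER, SLIDE, NOTES }

    // One entry of a SlideListWithText: target container kind plus a reference id.
    static final class Entry {
        final Kind kind;
        final int refId;

        Entry(Kind kind, int refId) {
            this.kind = kind;
            this.refId = refId;
        }
    }

    // Collects the reference ids of MainMaster entries in the first SLWT.
    // Other entries (for example a title master stored in a Slide container)
    // are skipped rather than treated as corruption.
    static List<Integer> mainMasterRefs(List<Entry> firstSlwt) {
        List<Integer> refs = new ArrayList<>();
        for (Entry e : firstSlwt) {
            if (e.kind == Kind.MAIN_MASTER) {
                refs.add(e.refId);
            }
            // Anything else is some other kind of master: unsupported for now, so ignored.
        }
        return refs;
    }

    public static void main(String[] args) {
        // First SLWT of a file with a slide master and a title master.
        List<Entry> first = Arrays.asList(
                new Entry(Kind.MAIN_MASTER, 1),
                new Entry(Kind.SLIDE, 2));
        System.out.println(mainMasterRefs(first));
    }
}
```

Skipping unknown entries, instead of casting every entry to a main master, is what avoids the class cast exception mentioned at the top of this report.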
Created attachment 19380
ppt with title master
Created attachment 19381
Improved SlideShow.buildSlidesAndNotes
So, if a PPT file has the first SLWT with
  MainMaster
  Slide
  ....
and the second SLWT with
  Slide
  Slide
then all's fine: we ignore (for now) anything other than MainMaster in the first SLWT, and grab the slides from the second one?

If we have a ppt with
  MainMaster
  Slide
  ....
and no second SLWT, we throw a CorruptPowerPointFileException?
>> Then all's fine, we ignore (for now) anything other than MainMaster in the first
>> SLWT, and grab the slides from the second one?

Yes. We always read master info from the first SLWT and slides from the second one. Just ignore what we don't support.

>> If we have a ppt with
>>   MainMaster
>>   Slide
>>   ....
>> and no second SLWT, we throw a CorruptPowerPointFileException?

Yes. There MUST be two SLWTs. If either is missing, it means the ppt is corrupted (or the MS guys have put another level of complexity into it :)).

Yegor
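The positional rule agreed above (masters from the first SLWT, slides from the second, notes from an optional third, and an error when fewer than two are present) could be sketched roughly as follows. `Layout`, `split` and `CorruptFileSketchException` are hypothetical names made up for this example; the exception merely stands in for CorruptPowerPointFileException so the sketch compiles without POI.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are HSLF classes.
class SlwtLayoutSketch {

    // Stand-in for CorruptPowerPointFileException.
    static final class CorruptFileSketchException extends RuntimeException {
        CorruptFileSketchException(String message) {
            super(message);
        }
    }

    // The SLWTs of one document, assigned a role purely by position.
    static final class Layout<T> {
        final T masters; // first SLWT: master references only
        final T slides;  // second SLWT: the actual slides
        final T notes;   // optional third SLWT, null when absent

        Layout(T masters, T slides, T notes) {
            this.masters = masters;
            this.slides = slides;
            this.notes = notes;
        }
    }

    // Splits the SLWT list by position and rejects files with fewer than two SLWTs.
    static <T> Layout<T> split(List<T> slwts) {
        if (slwts.size() < 2) {
            throw new CorruptFileSketchException(
                    "Expected at least 2 SlideListWithText records, found " + slwts.size());
        }
        T notes = slwts.size() > 2 ? slwts.get(2) : null;
        return new Layout<>(slwts.get(0), slwts.get(1), notes);
    }

    public static void main(String[] args) {
        Layout<String> ok = split(Arrays.asList("masters", "slides"));
        System.out.println(ok.slides + ", notes present: " + (ok.notes != null));
        try {
            split(Arrays.asList("masters"));
        } catch (CorruptFileSketchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the role is decided by position alone, a Slide entry inside the first SLWT never gets mistaken for a real slide.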
OK, based on your findings (thanks for those!), I've applied your patch, and then tidied the code up a little more (and gave a few variables more sensible names). Hopefully this scheme will be a good starting position for supporting future kinds of masters. I don't yet have anything that throws CorruptPowerPointFileException when there are not enough SLWTs; do you think we should have that check?