PPT files created in PowerPoint 2007 return the headers and footers as text runs from Slide.getTextRuns(). PPT files created in PowerPoint 2003 don't. Both return the headers and footers via getHeadersFooters(), of course.
This means that if you want to extract all the text, you have to choose between two equally bad options:
(a) Don't use getHeadersFooters(), and then files in the earlier format miss some of the text.
(b) Do use getHeadersFooters(), and then files in the later format get some of the text doubled up.
It would be nice if either the text runs which are part of the header or footer were automatically omitted, or the older format had additional text runs inserted, so that both formats could be treated identically. The usermodel doesn't appear to offer any API to distinguish header/footer text runs from ordinary ones, which makes it difficult to come up with a workaround.
Can you provide sample files and sample code that show the difference, so that it can be reproduced without having to install both of those versions of PowerPoint?
Created attachment 32911: Sample from 2003
Created attachment 32912: Sample from 2007
Attaching some contrasting examples.
I've added a convenience method isHeaderOrFooter() to TextParagraph, so it's easy to distinguish between normal paragraphs and header/footer paragraphs. The same applies to XSLF, i.e. the method is available through the Common SL interface. Applied via r1743769.
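For anyone landing here later, a minimal sketch of how the new flag could be used to avoid both the missing and the doubled-up text. To keep it runnable without POI on the classpath, it uses a hypothetical stand-in interface, Paragraph, with a hypothetical getText() accessor; only isHeaderOrFooter() is the method named above. In real code the paragraphs would come from the slide's text shapes rather than from the para() helper, which is also just an illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderFooterFilter {

    // Hypothetical stand-in for a slide text paragraph. Only
    // isHeaderOrFooter() mirrors the method added in this bug;
    // getText() is an assumption made for the sketch.
    interface Paragraph {
        String getText();
        boolean isHeaderOrFooter();
    }

    // Illustrative helper that builds a paragraph with a fixed flag.
    static Paragraph para(final String text, final boolean headerOrFooter) {
        return new Paragraph() {
            @Override public String getText() { return text; }
            @Override public boolean isHeaderOrFooter() { return headerOrFooter; }
        };
    }

    // Collects the text of every paragraph that is NOT a header or footer,
    // so header/footer text can be added exactly once from another source.
    static List<String> bodyText(List<? extends Paragraph> paragraphs) {
        List<String> out = new ArrayList<>();
        for (Paragraph p : paragraphs) {
            if (!p.isHeaderOrFooter()) {
                out.add(p.getText());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Paragraph> slide = Arrays.asList(
                para("Quarterly results", false),
                para("Revenue grew", false),
                para("Confidential", true)); // footer placeholder text
        System.out.println(bodyText(slide)); // [Quarterly results, Revenue grew]
    }
}
```

With the flagged paragraphs skipped, the header and footer text can be taken from getHeadersFooters() alone, so the 2003 and 2007 samples should be treatable the same way.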