The text contained in the header and footer of a PowerPoint 2007 (.pptx) document is not extracted. The attachment contains the JUnit test class and the documents used for testing; we expected the word "testdoc" to be extracted. Notes on the attached documents:
- "Header_1.pptx" contains the word "testdoc" in the header.
- "Footer_1.pptx" contains the word "testdoc" in the footer.
- "TestUnitPoi35Filter.java" is the JUnit class.
Created attachment 22365: contains the JUnit test class and the documents used for testing. The attachment is a ZIP file.
At least in the latest version, 3.13, this works fine if you request the notes and master-slide text through the getText() overloads that take boolean parameters. These overloads are only available on XSLFPowerPointExtractor, so you will need to cast the extractor to that class. I will add a verifying unit test once SVN stops choking on me.
For example, use something like text = ((XSLFPowerPointExtractor) extr).getText(false, true); to get the notes text, and text = ((XSLFPowerPointExtractor) extr).getText(false, false, true); to get the text from the master slide.
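A minimal, self-contained sketch of the two calls above, assuming Apache POI 3.x (poi-ooxml on the classpath, e.g. 3.13) where `XSLFPowerPointExtractor` still exposes the boolean `getText` overloads. The class name `HeaderFooterTextSketch` and method `notesAndMasterText` are hypothetical. Later POI releases replaced this extractor, so the code may not compile against them.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.extractor.ExtractorFactory;
import org.apache.poi.extractor.POITextExtractor;
import org.apache.poi.xslf.extractor.XSLFPowerPointExtractor;

// Hypothetical sketch, assuming Apache POI 3.x (poi-ooxml on the classpath).
public class HeaderFooterTextSketch {

    /** Returns the notes text followed by the master-slide text of a .pptx file. */
    static String notesAndMasterText(File pptx) throws Exception {
        // The stream-based factory reads the whole package into memory,
        // so closing the stream here does not invalidate the extractor.
        try (InputStream in = new FileInputStream(pptx)) {
            POITextExtractor extr = ExtractorFactory.createExtractor(in);
            // The boolean getText(...) overloads exist only on the XSLF-specific class.
            XSLFPowerPointExtractor xslf = (XSLFPowerPointExtractor) extr;
            String notes = xslf.getText(false, true);          // notes text only
            String master = xslf.getText(false, false, true);  // master-slide text only
            return notes + "\n" + master;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HeaderFooterTextSketch Header_1.pptx
        String text = notesAndMasterText(new File(args[0]));
        System.out.println(text);
        System.out.println("contains \"testdoc\": " + text.contains("testdoc"));
    }
}
```

Run against the attached "Header_1.pptx" and "Footer_1.pptx", the last line should report whether "testdoc" was found; the cast will fail with a ClassCastException if the file is not an OOXML presentation.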