Bug 45541 - Content from the header and footer of an Office 2007 pptx document is not extracted.
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: XSLF
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-04 08:42 UTC by xtrim
Modified: 2015-09-28 19:46 UTC



Attachments
Contains JUnit test class and documents used for testing. (564.61 KB, application/x-zip-compressed)
2008-08-04 08:43 UTC, xtrim

Description xtrim 2008-08-04 08:42:29 UTC
The text contained in the header and footer of a PowerPoint 2007 document is not extracted.
The attachment contains the JUnit test class and the documents used for testing.
We expected the word "testdoc" to be extracted.

Notes on the attached documents:

- the document "Header_1.pptx" contains the word "testdoc" in the header.

- the document "Footer_1.pptx" contains the word "testdoc" in the footer.


"TestUnitPoi35Filter.java" is the JUnit class.
Comment 1 xtrim 2008-08-04 08:43:29 UTC
Created attachment 22365
Contains JUnit test class and documents used for testing.

The attachment is a ZIP file.
Comment 2 Dominik Stadler 2015-09-28 19:45:08 UTC
At least in the latest version, 3.13, this works fine if you request the notes and master data through the extended getText() overloads that take boolean parameters. You will need to cast to XSLFPowerPointExtractor for these overloads to be available.

I will add a verifying unit test after SVN stops choking on me...
Comment 3 Dominik Stadler 2015-09-28 19:46:06 UTC
For example, use something like

       text = ((XSLFPowerPointExtractor)extr).getText(false, true);

to get the notes-data and 

       text = ((XSLFPowerPointExtractor)extr).getText(false, false, true);

to get the data from the master slide.
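
When deciding which of those boolean flags to set, it can help to check which part of the .pptx package actually holds the expected word. The following is a minimal sketch that does not use POI at all: it treats the file as a plain ZIP archive and lists every XML part whose raw content contains the word. The class name PptxPartScan and the method findParts are hypothetical names chosen for this illustration, and the sketch assumes the word is stored in a single text run (it will miss a word that PowerPoint has split across runs).

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper, not part of POI.
final class PptxPartScan {

    /** Returns the names of all .xml parts in the package whose raw content contains the word. */
    static List<String> findParts(InputStream pptx, String word) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue; // skip directories, relationship files, media and so on
                }
                // Read the current entry fully; read() returns -1 at the end of the entry.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: the parts are UTF-8 encoded, which is what PowerPoint normally writes.
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                if (xml.contains(word)) {
                    hits.add(entry.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: PptxPartScan <file.pptx> <word>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String part : findParts(in, args[1])) {
                System.out.println(part);
            }
        }
    }
}
```

Run against "Footer_1.pptx" or "Header_1.pptx" with the word "testdoc", a hit under a conventional part name such as ppt/slideMasters/ or ppt/slideLayouts/ would suggest the master flag is the one needed, while a hit under ppt/notesSlides/ or ppt/notesMasters/ would point towards the notes flag. Those directory names are the ones PowerPoint conventionally writes, not something this bug report confirms for the attached files.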