Bug 58159 - getHeaderText() and getFooterText() duplicate text in sheet.getTextRuns()
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: 3.9-FINAL
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Blocks: 56570
Reported: 2015-07-21 07:08 UTC by Luke Quinane
Modified: 2015-12-31 22:12 UTC

sample where text is duplicated (136.00 KB, application/vnd.ms-powerpoint)
2015-07-21 07:08 UTC, Luke Quinane
Adding common placeholder getter (68.72 KB, application/zip)
2015-12-20 17:31 UTC, Andreas Beeker

Description Luke Quinane 2015-07-21 07:08:32 UTC
Created attachment 32917
sample where text is duplicated

We are trying to write a text extractor which will convert a PPT to text. We've noticed that if we only take the text from the sheet's text runs (sheet.getTextRuns()), header and footer content is sometimes missing. If we add calls to getHeaderText() and getFooterText(), then for some items the header/footer text is duplicated, because it is already present in the text runs.

Can we change this behaviour to always return the header/footer text in the runs, or to remove the duplication?
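
Until the behaviour changes, one possible workaround on the extractor side is to append the header/footer text only when no text run already carries it. The following is a minimal sketch, not POI code: the class HeaderFooterDedup and the method mergeText are hypothetical names, and it assumes the caller has already collected the run texts and the getHeaderText()/getFooterText() results as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of POI): merges run text with header/footer
// text while skipping header/footer strings that a run already contains.
public class HeaderFooterDedup {

    public static List<String> mergeText(List<String> runTexts, String header, String footer) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();

        // Keep every run as-is and remember its normalized form.
        for (String text : runTexts) {
            if (text == null) {
                continue;
            }
            result.add(text);
            seen.add(normalize(text));
        }

        // Append header and footer only if no run already carries the same text.
        for (String extra : new String[] { header, footer }) {
            if (extra == null) {
                continue;
            }
            String key = normalize(extra);
            if (!key.isEmpty() && seen.add(key)) {
                result.add(extra);
            }
        }
        return result;
    }

    // Assumption: whitespace-only differences do not make two strings distinct.
    private static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```

Comparing whole strings is a deliberate simplification. It would not catch a footer that is embedded inside a longer run, and it would wrongly drop a header that happens to equal unrelated body text.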

Comment 1 Nick Burch 2015-07-21 07:29:58 UTC
3.9 is rather old. What happens if you try with 3.12, or better yet the 3.13 beta 1 release, which is currently syncing out to all the mirrors?
Comment 2 Luke Quinane 2015-08-28 04:55:12 UTC
Hi Nick,

We've retested with 3.13-beta1-20150723 and it has the same problem.

Cheers, Luke
Comment 3 Andreas Beeker 2015-12-20 17:31:22 UTC
The patch adds a getter/setter for Placeholder, so duplicate text shapes can be easily identified.
Apart from that, it also contains (a lot of) related changes, which I've fixed in this go, i.e. ...
- an HSLF-specific Escher client data record, for easier retrieval of child
- a RecordTypes enum, to minimize ambiguities of RecordTypes and actual Record
- the fix for #56570

I'll apply it after POI 3.14-Beta1 is out
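
To illustrate how a placeholder getter helps an extractor, here is a rough sketch that skips shapes marked as header/footer placeholders when collecting text. Everything in it is a hypothetical stand-in: the Kind enum, the TextShape class and the contentText method are not the types introduced by the patch, and the real POI API may look different.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not POI's API: shows filtering text shapes by placeholder kind.
public class PlaceholderFilter {

    /** Hypothetical placeholder kinds; a stand-in, not POI's own enum. */
    public enum Kind { NONE, TITLE, BODY, HEADER, FOOTER, DATE, SLIDE_NUMBER }

    /** Hypothetical text shape that exposes its placeholder kind via a getter. */
    public static final class TextShape {
        private final Kind kind;
        private final String text;

        public TextShape(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        public Kind getPlaceholder() { return kind; }
        public String getText() { return text; }
    }

    // Kinds treated in this sketch as header/footer content.
    private static final Set<Kind> HEADER_FOOTER_KINDS =
            EnumSet.of(Kind.HEADER, Kind.FOOTER, Kind.DATE, Kind.SLIDE_NUMBER);

    /** Returns the text of all shapes that are not header/footer placeholders. */
    public static List<String> contentText(List<TextShape> shapes) {
        List<String> out = new ArrayList<>();
        for (TextShape shape : shapes) {
            if (!HEADER_FOOTER_KINDS.contains(shape.getPlaceholder())) {
                out.add(shape.getText());
            }
        }
        return out;
    }
}
```

With the header/footer shapes filtered out of the run text, the header and footer strings can then be added exactly once from their dedicated accessors.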
Comment 4 Andreas Beeker 2015-12-20 17:31:25 UTC
Created attachment 33366
Adding common placeholder getter
Comment 5 Andreas Beeker 2015-12-31 22:12:14 UTC
Patch applied via r1722476