Bug 58159

Summary: getHeaderText() and getFooterText() duplicate text in sheet.getTextRuns()
Product: POI
Reporter: Luke Quinane <luke.quinane>
Component: HSLF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: normal
CC: david.sitsky
Priority: P2
Keywords: PatchAvailable
Version: 3.9-FINAL   
Target Milestone: ---   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 56570    
Attachments:
- sample where text is duplicated (attachment 32917)
- Adding common placeholder getter (attachment 33366)

Description Luke Quinane 2015-07-21 07:08:32 UTC
Created attachment 32917
sample where text is duplicated

We are trying to write a text extractor that converts a PPT to text. We've noticed that if we only take the text from the sheet's text runs, header and footer content is sometimes missing. If we add calls to getHeaderText() and getFooterText(), then for some items the text is duplicated, because it already appears in the text runs.

Can we change this behaviour to always return the header/footer text in the runs, or to remove the duplication?

Thanks!
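
As a stopgap on the extractor side, the duplication described above could be filtered out after the fact. The following is only a minimal sketch under assumptions: it takes the text of the runs and the header/footer strings as plain Java strings that the caller has already obtained (for example from sheet.getTextRuns(), getHeaderText() and getFooterText()), and the class and method names (HeaderFooterDedup, mergeText) are hypothetical, not part of POI. It treats two strings as duplicates when they are equal after trimming, which may be too strict or too loose for real files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of POI: merges run text with header/footer text. */
public class HeaderFooterDedup {

    /**
     * Returns the run texts in their original order, followed by the header
     * and footer text, but only where that text is not already present in
     * the runs (compared after trimming whitespace).
     */
    public static List<String> mergeText(List<String> runTexts, String header, String footer) {
        Set<String> seen = new HashSet<>();
        for (String t : runTexts) {
            seen.add(normalize(t));
        }
        List<String> out = new ArrayList<>(runTexts);
        addIfMissing(out, seen, header);
        addIfMissing(out, seen, footer);
        return out;
    }

    private static void addIfMissing(List<String> out, Set<String> seen, String text) {
        if (text == null) {
            return; // no header/footer on this sheet
        }
        String key = normalize(text);
        if (key.isEmpty()) {
            return; // nothing worth extracting
        }
        if (seen.add(key)) { // add() returns false when the text was already seen
            out.add(text);
        }
    }

    // Assumption: trimming is enough to match a duplicated header/footer.
    private static String normalize(String s) {
        return s == null ? "" : s.trim();
    }
}
```

This only hides the symptom in the extractor; it does not change what the library returns.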
Comment 1 Nick Burch 2015-07-21 07:29:58 UTC
3.9 is rather old. What happens if you try with 3.12, or better yet the 3.13 beta 1 release, which is currently syncing out to all the mirrors?
Comment 2 Luke Quinane 2015-08-28 04:55:12 UTC
Hi Nick,

We've retested with 3.13-beta1-20150723 and it has the same problem.

Cheers, Luke
Comment 3 Andreas Beeker 2015-12-20 17:31:22 UTC
The patch adds a getter/setter for Placeholder, so duplicate text shapes can be 
easily identified.
Apart from that, it also contains (a lot of) related changes, which I've fixed 
in the same go, i.e.:
- an HSLF-specific Escher client data record, for easier retrieval of child 
records
- a RecordTypes enum, to minimize ambiguities between RecordTypes and the actual Record
- the fix for #56570

I'll apply it after POI 3.14-Beta1 is out
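
To illustrate how a placeholder getter might let an extractor identify the duplicate shapes, here is a rough sketch. It deliberately does not use POI classes: Kind and TextShape below are hypothetical stand-ins for a placeholder type and a text shape, and the actual names and signatures in the patch may differ. The idea is simply that shapes flagged as header or footer placeholders can be skipped when their text is read separately.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model, not POI code: filters text shapes by placeholder kind. */
public class PlaceholderFilter {

    /** Stand-in for a placeholder type; the real enum in the patch may differ. */
    public enum Kind { NONE, TITLE, BODY, HEADER, FOOTER }

    /** Stand-in for a text shape that exposes its placeholder kind. */
    public static final class TextShape {
        private final String text;
        private final Kind placeholder;

        public TextShape(String text, Kind placeholder) {
            this.text = text;
            this.placeholder = placeholder;
        }

        public String getText() { return text; }

        public Kind getPlaceholder() { return placeholder; }
    }

    /**
     * Collects the text of all shapes except header and footer placeholders,
     * so that header/footer text fetched separately is not emitted twice.
     */
    public static List<String> textWithoutHeaderFooter(List<TextShape> shapes) {
        List<String> out = new ArrayList<>();
        for (TextShape shape : shapes) {
            Kind kind = shape.getPlaceholder();
            if (kind == Kind.HEADER || kind == Kind.FOOTER) {
                continue; // assumed to be covered by the header/footer getters
            }
            out.add(shape.getText());
        }
        return out;
    }
}
```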
Comment 4 Andreas Beeker 2015-12-20 17:31:25 UTC
Created attachment 33366
Adding common placeholder getter
Comment 5 Andreas Beeker 2015-12-31 22:12:14 UTC
Patch applied via r1722476