|Summary:||getHeaderText() and getFooterText() duplicate text in sheet.getTextRuns()|
|Product:||POI||Reporter:||Luke Quinane <luke.quinane>|
|Component:||HSLF||Assignee:||POI Developers List <dev>|
|Attachments:|
sample where text is duplicated (attachment 32917)
Adding common placeholder getter (attachment 33366)
Description Luke Quinane 2015-07-21 07:08:32 UTC
Created attachment 32917: sample where text is duplicated
We are trying to write a text extractor that converts a PPT to text. We've noticed that if we only take the text from the sheet's text runs, header and footer content is sometimes missing. If we add calls to getHeaderText() and getFooterText(), then for some items the text is duplicated, because it already appears in the text runs. Can this behaviour be changed so that the header/footer text is always returned in the runs, or so that the duplication is removed? Thanks!
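Until the behaviour changes, one possible caller-side workaround is to append the header and footer text only when the text runs do not already contain it. The sketch below is a minimal illustration of that idea, not POI code: `SlideTextMerger`, `normalize` and `merge` are hypothetical names, and the caller is assumed to have already collected the run texts and the getHeaderText()/getFooterText() results as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of POI): adds header/footer text only if the runs lack it. */
final class SlideTextMerger {

    private SlideTextMerger() {}

    /** Trims the text and collapses whitespace runs so near-identical strings compare equal. */
    static String normalize(String text) {
        return text == null ? "" : text.trim().replaceAll("\\s+", " ");
    }

    /**
     * Returns the run texts in order, followed by the header and footer,
     * skipping a header/footer whose normalized text already appears in a run.
     */
    static List<String> merge(List<String> runTexts, String header, String footer) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String run : runTexts) {
            String text = normalize(run);
            if (!text.isEmpty()) {
                result.add(text); // keep every run, even if two runs repeat each other
                seen.add(text);
            }
        }
        for (String extra : new String[] {header, footer}) {
            String text = normalize(extra);
            // Set.add returns false when the text was already seen in a run
            if (!text.isEmpty() && seen.add(text)) {
                result.add(text);
            }
        }
        return result;
    }
}
```

Comparing by text is only a heuristic: a header or footer that happens to match an unrelated run would also be dropped, which is why identifying the shapes themselves is the more reliable fix.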
Comment 1 Nick Burch 2015-07-21 07:29:58 UTC
3.9 is rather old. What happens if you try with 3.12 or, better yet, the 3.13 beta 1 release, which is currently syncing out to all the mirrors?
Comment 2 Luke Quinane 2015-08-28 04:55:12 UTC
Hi Nick, We've retested with 3.13-beta1-20150723 and it has the same problem. Cheers, Luke
Comment 3 Andreas Beeker 2015-12-20 17:31:22 UTC
The patch adds a getter/setter for Placeholder, so duplicate text shapes can be easily identified. Apart from that, it also contains (a lot of) related changes, which I've fixed in one go:
- an HSLF-specific Escher client data record, for easier retrieval of child records
- a RecordTypes enum, to minimize ambiguities between RecordTypes and the actual Record
- the fix for #56570
I'll apply it after POI 3.14-Beta1 is out.
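To illustrate why a placeholder getter helps, here is a small self-contained sketch of filtering shapes by placeholder kind. Everything in it is a stand-in: `PlaceholderFilter`, `Kind` and `TextShape` are hypothetical types invented for this example and do not reflect the classes or method names in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; these types are stand-ins, not POI classes. */
final class PlaceholderFilter {

    private PlaceholderFilter() {}

    /** Hypothetical placeholder kinds a text shape might report. */
    enum Kind { NONE, TITLE, BODY, HEADER, FOOTER }

    /** Hypothetical text shape: its text plus the placeholder kind it represents. */
    static final class TextShape {
        final String text;
        final Kind kind;

        TextShape(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    /** Returns the text of every shape that is not a header or footer placeholder. */
    static List<String> bodyText(List<TextShape> shapes) {
        List<String> out = new ArrayList<>();
        for (TextShape shape : shapes) {
            if (shape.kind != Kind.HEADER && shape.kind != Kind.FOOTER) {
                out.add(shape.text);
            }
        }
        return out;
    }
}
```

With the shapes filtered this way, an extractor could take header and footer text from one source only, instead of guessing at duplicates by comparing strings.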
Comment 4 Andreas Beeker 2015-12-20 17:31:25 UTC
Created attachment 33366: Adding common placeholder getter