Summary: | getHeaderText() and getFooterText() duplicate text in sheet.getTextRuns() | ||
---|---|---|---|
Product: | POI | Reporter: | Luke Quinane <luke.quinane> |
Component: | HSLF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | david.sitsky |
Priority: | P2 | Keywords: | PatchAvailable |
Version: | 3.9-FINAL | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | |||
Bug Blocks: | 56570 | ||
Attachments: | sample where text is duplicated; Adding common placeholder getter |
3.9 is rather old. What happens if you try with 3.12, or better yet the 3.13 beta 1 release, which is currently syncing out to all the mirrors?

Hi Nick,

We've retested with 3.13-beta1-20150723 and it has the same problem.

Cheers,
Luke

The patch adds a getter/setter for Placeholder, so duplicate text shapes can be easily identified. Apart from that, it also contains (a lot of) related changes, which I've fixed in this go, i.e.:
- an HSLF-specific escher client data record, for easier retrieval of child records
- a RecordTypes enum, to minimize ambiguities between RecordTypes and the actual Record
- the fix for #56570

I'll apply it after POI 3.14-Beta1 is out.

Created attachment 33366
Adding common placeholder getter
|
Created attachment 32917
sample where text is duplicated

We are trying to write a text extractor which will convert a PPT to text, and we've noticed that if we only get the text from the sheet's text runs, header and footer content is sometimes missing. If we add calls to getHeaderText() and getFooterText(), then for some items the text is duplicated in the text runs.

Can we change this behaviour to always return the header/footer text in the runs, or to remove the duplication?

Thanks!
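
For anyone hitting the same duplication in their own extractor, a caller-side workaround might look roughly like the sketch below. It deliberately uses plain strings rather than POI types: `runTexts` stands in for the text collected from sheet.getTextRuns(), and `header`/`footer` stand in for the values from getHeaderText() and getFooterText(). The class `HeaderFooterDedup` and its methods are hypothetical helpers, not part of POI, and matching on whitespace-normalized equality is an assumption that may not suit every deck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a POI class): adds header/footer text to the
// extracted run text only when it is not already present in the runs.
class HeaderFooterDedup {

    // Trim and collapse whitespace so trailing line breaks or double
    // spaces do not hide a duplicate. Null is treated as empty.
    static String normalize(String s) {
        return s == null ? "" : s.trim().replaceAll("\\s+", " ");
    }

    // True if any run text equals the candidate after normalization.
    static boolean containsText(List<String> runTexts, String candidate) {
        String wanted = normalize(candidate);
        for (String run : runTexts) {
            if (normalize(run).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Returns header (if missing from the runs), then all runs unchanged,
    // then footer (if missing from the runs). Empty or null header/footer
    // values are skipped.
    static List<String> mergeText(List<String> runTexts, String header, String footer) {
        List<String> out = new ArrayList<>();
        if (!normalize(header).isEmpty() && !containsText(runTexts, header)) {
            out.add(header);
        }
        out.addAll(runTexts);
        if (!normalize(footer).isEmpty() && !containsText(runTexts, footer)) {
            out.add(footer);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data: the footer already appears among the runs,
        // the header does not.
        List<String> runs = Arrays.asList("Slide title", "My footer");
        System.out.println(mergeText(runs, "My header", "My footer"));
    }
}
```

This only avoids adding header/footer text twice; it never removes anything from the runs themselves, so body text that happens to match a footer is still kept once.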