Bug 57598

Summary: No way to associate shape with its textbox text range
Product: POI
Reporter: sbuberl
Component: HWPF
Assignee: POI Developers List <dev>
Severity: major
CC: sbuberl
Priority: P5
Version: 3.8-FINAL
Target Milestone: ---
Hardware: Macintosh
OS: All

Description sbuberl 2015-02-18 21:59:50 UTC
I am trying to match up multiple shapes (OfficeDrawings) with their text in the getMainTextBoxRange. I found no text, and nothing that looked like an offset into the range, in the main textbox range paragraphs. It looks like each shape's text usually ends with an empty paragraph in the text range, but if the shape's text itself contains a blank line, that is no longer helpful. Would it be possible to add a new API to help with this?
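To show why the "empty paragraph ends a shape" observation is not a reliable separator, here is a minimal, self-contained sketch in plain Java (no POI calls). It assumes the textbox text has already been pulled out as one string whose paragraphs end in '\r' (Word's paragraph mark), and that each shape's text is followed by an empty paragraph. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TextboxSplitHeuristic {

    // Hypothetical heuristic (not POI API): treat an empty paragraph as the
    // boundary between two shapes' text. Paragraphs are assumed to be
    // terminated by '\r'; lines within one shape are rejoined with '\n'.
    static List<String> splitOnEmptyParagraphs(String blob) {
        List<String> shapes = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String para : blob.split("\r", -1)) {
            if (para.isEmpty()) {
                // Empty paragraph: close the shape collected so far, if any.
                if (current.length() > 0) {
                    shapes.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(para);
            }
        }
        if (current.length() > 0) {
            shapes.add(current.toString());
        }
        return shapes;
    }

    public static void main(String[] args) {
        // Two shapes, each followed by an empty paragraph: the heuristic works.
        String ok = "Shape A\r\rShape B\r\r";
        System.out.println(splitOnEmptyParagraphs(ok)); // [Shape A, Shape B]

        // Shape A itself contains a blank line: the heuristic now reports three shapes.
        String broken = "Shape A line 1\r\rShape A line 3\r\rShape B\r\r";
        System.out.println(splitOnEmptyParagraphs(broken)); // [Shape A line 1, Shape A line 3, Shape B]
    }
}
```

The second input has only two shapes, but the split yields three pieces, which is exactly the ambiguity described above: from the concatenated text alone, a blank line inside a shape looks the same as the gap between two shapes.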

Comment 1 Dominik Stadler 2016-04-05 12:23:20 UTC
Can you attach a sample file that can be used for looking at this?
Comment 2 sbuberl 2016-04-06 04:19:59 UTC
This was 14 months ago, when I was still working at my last job (and cared about this issue), so I may have forgotten something. If I remember correctly, there was an API method/class that returned all the shapes' text as one big range/block of text. Each shape's text was terminated by a newline (and, I think, ordered from top to bottom). That might have been extremely helpful, except that shapes (especially comments, which I think were also in there) can contain multiple lines of their own. That means there is no reliable way to know where one shape ends and the next begins using this blob. What would have been more helpful is if each diagram/shape knew its own text and exposed it through its own API, rather than putting all shape text in one big blob. Or, if you want to keep the blob, each shape could know its own text offset and length within the blob.

We were trying to find all text in documents, whether in paragraphs, shapes, or anything else, and order it in something resembling top-to-bottom order. We tried every API method we could find until we discovered the API was lacking things we needed, so we did most of the work in the OOXML itself and found a solution that fit our needs.
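The offset-and-length idea from the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not POI code: `ShapeTextSpan` and `textByShape` are hypothetical names, and the offsets are hardcoded, whereas a real implementation would have to read them from the document's drawing/textbox data, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShapeTextSpans {

    // Hypothetical per-shape pointer into the concatenated textbox text.
    // Neither this class nor its fields exist in POI; it only illustrates the request.
    static final class ShapeTextSpan {
        final int shapeId;
        final int offset; // start index of this shape's text in the blob
        final int length; // number of characters belonging to this shape

        ShapeTextSpan(int shapeId, int offset, int length) {
            this.shapeId = shapeId;
            this.offset = offset;
            this.length = length;
        }

        String textIn(String blob) {
            return blob.substring(offset, offset + length);
        }
    }

    // Map each shape id to its own text; blank lines inside a shape no longer matter,
    // because boundaries come from the spans rather than from the text content.
    static Map<Integer, String> textByShape(String blob, List<ShapeTextSpan> spans) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (ShapeTextSpan span : spans) {
            result.put(span.shapeId, span.textIn(blob));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shape 1 has a blank line in the middle; shape 2 is a single paragraph.
        String blob = "Line 1\r\rLine 3\rShape B\r";
        List<ShapeTextSpan> spans = new ArrayList<>();
        spans.add(new ShapeTextSpan(1, 0, 15)); // "Line 1\r\rLine 3\r"
        spans.add(new ShapeTextSpan(2, 15, 8)); // "Shape B\r"
        System.out.println(textByShape(blob, spans).size()); // 2
    }
}
```

Keeping the single blob and adding spans would leave any existing text-extraction code untouched while still letting a caller go from a shape to its own text.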
Comment 3 Dominik Stadler 2016-11-25 21:16:42 UTC
Unfortunately, it is very hard to reproduce or work on this without any sample code or sample file, so I am closing this for now. Please provide more information, especially a sample file, if you are having this issue.
Comment 4 sbuberl 2016-11-25 22:19:49 UTC
Did anyone actually read this? It's not a bug. There is nothing to really reproduce. It was a question to see whether your API could do something, and maybe a request to have it added. Make a new document with a handful of shapes, some with text and some without. All I wanted to do was, given any shape in the document, find its text (if it had any). The getMainTextBoxRange was the only way to get shape text, but it is just one big concatenation of all the shapes' text into a single string/range, with no way to tell which part of the text belonged to which shape. We had to work around this by reading the OOXML itself, but it would have been helpful to have this in the main API. That's it.

I left that job long ago and don't care anymore, but we had to do a lot of OOXML manipulation across all your APIs for things they were lacking, like this.
Comment 5 Dominik Stadler 2016-11-26 07:42:55 UTC
The source of POI is available, so the best course of action would have been to add the necessary API/functionality and to contribute it to the project. 

That's how volunteer-driven open source projects can grow and improve without depending on the limited time of a few non-paid volunteers.