Bug 57596 - OfficeDrawing doesn't return diagram escher records
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.8-FINAL
Hardware: PC All
Importance: P5 major
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2015-02-18 21:39 UTC by sbuberl
Modified: 2017-06-08 15:32 UTC

Description sbuberl 2015-02-18 21:39:15 UTC
I'm trying to find all shapes, pictures, diagrams, etc. in a Word .doc file. I can find shapes using the escher records from my OfficeDrawing. However, the file also contains a SmartArt diagram, and for that the method returns null because the parent is not an SpContainer. Can you fix this method to handle both cases, or add a new one that works with diagrams?

Comment 1 Dominik Stadler 2016-04-05 12:26:39 UTC
Can you attach a sample file and the code that you tried? Ideally as a self-contained unit test.
Comment 2 sbuberl 2016-04-06 04:07:53 UTC
(In reply to Dominik Stadler from comment #1)
> Can you attach a sample file and code that you tried? Ideally as
> self-sufficient unit-test.

This was 14 months ago, when I was still working at my last job (and cared about this issue). I have forgotten the exact details. Either when getting the diagram from OfficeDiagrams, or when getting the picture data or something similar, it would only allow diagrams whose parents were SpContainers. That doesn't apply to SmartArt, so whichever method it was returned null.

We were trying to find all text in documents, whether in a paragraph, a shape, or anything else, and order it in something resembling top to bottom. We tried every API method until we discovered the API was lacking things we needed, so we did most of the work in the OOXML itself and found a solution that fit our needs.
Comment 3 Oytun 2017-06-08 15:14:48 UTC
I think this may be about the XSLFRelation class not providing a DIAGRAM relation. In PowerPoint files, SmartArt is kept under `./ppt/diagrams`, and there is no relation to that in this class.
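
To check whether a given file actually carries SmartArt parts, the package can be inspected directly as a ZIP archive. The following is only a minimal sketch under stated assumptions: `DiagramPartLister` and `listParts` are hypothetical names, not part of POI or Tika, and the code uses nothing but `java.util.zip` from the JDK. It simply lists the entries under a directory prefix such as `ppt/diagrams/`; it does not resolve relationships or parse the diagram XML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for illustration only; not part of POI or Tika. */
final class DiagramPartLister {

    private DiagramPartLister() {
    }

    /**
     * Returns the names of all non-directory entries in the OOXML package
     * whose name starts with the given prefix, for example "ppt/diagrams/".
     * Entries are returned in the order they appear in the archive.
     */
    static List<String> listParts(InputStream ooxmlPackage, String prefix) throws IOException {
        List<String> parts = new ArrayList<>();
        // An OOXML file (.pptx, .docx) is a ZIP archive, so plain ZIP reading is enough here.
        try (ZipInputStream zip = new ZipInputStream(ooxmlPackage)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().startsWith(prefix)) {
                    parts.add(entry.getName());
                }
            }
        }
        return parts;
    }
}
```

Calling `DiagramPartLister.listParts(new FileInputStream("sample.pptx"), "ppt/diagrams/")` on a presentation (the file name is a placeholder) would return the diagram parts, if any. Passing `word/diagrams/` for a DOCX file is an assumption about where Word stores the equivalent parts and should be verified against a real sample.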

Initially, for us, Tika was not parsing the `./diagrams` directory. Following the trail, I came here, and this may be the reason Tika does not support `diagrams`.

Tika issue: https://issues.apache.org/jira/browse/TIKA-1945
Comment 4 Nick Burch 2017-06-08 15:32:26 UTC
If someone could create very small sample PPTX and DOC/DOCX files containing these kinds of escher drawings, we can use those to fix the problem and add unit tests to verify the fix.

(Quite possibly it may need two fixes, one for PowerPoint and one for Word)