I'm trying to find all shapes, pictures, diagrams, etc. in a Word .doc file. I can find shapes using the Escher records from my OfficeDrawing, but there is also a SmartArt diagram, and for that one the method returns null because its parent is not an SpContainer. Can you fix this method to handle both cases, or add a new one that works with diagrams? Thanks.
Can you attach a sample file and the code that you tried? Ideally as a self-contained unit test.
(In reply to Dominik Stadler from comment #1)
> Can you attach a sample file and code that you tried? Idealy as
> self-sufficient unit-test.

This was 14 months ago, when I was still working at my last job (and cared about this issue). I have forgotten the exact details. Either when getting the diagram from OfficeDiagrams or when getting the picture data (or something similar), the API only accepted diagrams whose parents were SpContainers. That doesn't apply to SmartArt, so whichever method it was returned null. We were trying to find all text in a document, whether it sat in a paragraph, a shape or anything else, and sort it into something resembling top-to-bottom order. We tried every API method we could find until we discovered the API lacked what we needed, so we did most of the work in the OOXML itself and found a solution that fit our needs.
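For anyone taking the same "work in the OOXML itself" route, here is a rough sketch that reads SmartArt text straight out of a DOCX/PPTX package with only the JDK, bypassing POI. It assumes the usual part naming (`word/diagrams/data1.xml`, `ppt/diagrams/data1.xml`, ...) and that the visible text lives in DrawingML `<a:t>` elements of those data parts. `DiagramText` and `extract` are hypothetical names, not POI API. It does not apply to binary .doc files, and it makes no attempt at the top-to-bottom ordering described above.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: collects SmartArt text from an OOXML package without POI. */
final class DiagramText {
    // DrawingML main namespace; SmartArt data parts keep their text in <a:t> elements.
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    static List<String> extract(String packagePath) throws Exception {
        List<String> texts = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        try (ZipFile zip = new ZipFile(packagePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: diagram data parts are named like "<root>/diagrams/dataN.xml".
                if (!name.contains("/diagrams/data") || !name.endsWith(".xml")) {
                    continue;
                }
                Document doc;
                try (InputStream in = zip.getInputStream(entry)) {
                    doc = factory.newDocumentBuilder().parse(in);
                }
                // Every <a:t> element in document order, across all nodes of the diagram.
                NodeList runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                for (int i = 0; i < runs.getLength(); i++) {
                    texts.add(runs.item(i).getTextContent());
                }
            }
        }
        return texts;
    }
}
```

Calling `DiagramText.extract("sample.docx")` on a file with one SmartArt diagram should return its node texts in the order they appear in the data part, which is not necessarily the visual order.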
I think this may be caused by the XSLFRelation class not providing a DIAGRAM relation. In PowerPoint files, SmartArt is stored under `./ppt/diagrams`, and there is no relation for it in this class. Initially, for us, Tika was not parsing the `./diagrams` directory. Following the trail, I ended up here, and this may be why Tika does not support `diagrams`. Tika issue: https://issues.apache.org/jira/browse/TIKA-1945
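To check whether a given PPTX really links SmartArt parts from its slides, independent of what XSLFRelation knows about, a JDK-only sketch can scan the slide relationship parts directly. It assumes slides live under `ppt/slides/` and that the relevant relationship types end in `diagramData`, `diagramLayout`, `diagramQuickStyle`, `diagramColors` or `diagramDrawing`; `SlideDiagramRels` is a hypothetical helper, not POI or Tika API.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists slide-to-SmartArt relationships by reading raw .rels parts. */
final class SlideDiagramRels {
    // Namespace used by every *.rels part in an OPC package.
    static final String RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships";

    /** Returns "slide rels part -> target" for each relationship whose type names a diagram part. */
    static List<String> find(String pptxPath) throws Exception {
        List<String> found = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try (ZipFile zip = new ZipFile(pptxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith("ppt/slides/_rels/") || !name.endsWith(".rels")) {
                    continue;
                }
                NodeList rels;
                try (InputStream in = zip.getInputStream(entry)) {
                    rels = factory.newDocumentBuilder().parse(in)
                            .getElementsByTagNameNS(RELS_NS, "Relationship");
                }
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    String type = rel.getAttribute("Type");
                    // Last path segment of the type URI, e.g. "diagramData" or "image".
                    String shortType = type.substring(type.lastIndexOf('/') + 1);
                    if (shortType.startsWith("diagram")) {
                        found.add(name + " -> " + rel.getAttribute("Target"));
                    }
                }
            }
        }
        return found;
    }
}
```

A non-empty result for a file where POI/Tika yields no SmartArt text would make a reasonable regression sample for this issue.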
If someone could create very small sample PPTX and DOC/DOCX files containing these kinds of Escher drawings, we can use them to fix the problem and add unit tests to verify the fix. (Quite possibly it will need two fixes, one for PowerPoint and one for Word.)