Bug 45561 - poi-3.5-beta1-20080718.jar - content from a SmartArt (diagram) object of a 2007 docx document is not extracted.
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-05 07:38 UTC by xtrim
Modified: 2008-09-15 06:36 UTC
0 users



Attachments
Contains JUnit test class and documents used for testing. (44.58 KB, application/x-zip-compressed)
2008-08-05 07:38 UTC, xtrim

Description xtrim 2008-08-05 07:38:10 UTC
Created attachment 22384
Contains JUnit test class and documents used for testing.

Text contained in a SmartArt object inserted in a Word 2007 document is not extracted.
The attachment contains the JUnit test class and the document used for testing.
We expected the extracted text to contain the words "List1", "process2", "Cycle1", "Pyramid1", "relationship3".

Notes on the attached documents:

- the document "classic_TextInSmartArt.docx" contains the words "List1", "process2", "Cycle1", "Pyramid1", "relationship3" in its SmartArt objects.
- "TestUnitPoi35Filter.java" is the JUnit test class.
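The attached TestUnitPoi35Filter.java is not reproduced here. As a hedged sketch, the expectation stated above boils down to checking that every word typed into the SmartArt objects occurs somewhere in the extracted text. `ExpectedWordsCheck` and `missingWords` are hypothetical names, and the extractor output is stubbed with a fixed string rather than produced by POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the attached TestUnitPoi35Filter.java.
class ExpectedWordsCheck {

    // Words the reporter typed into the SmartArt objects of classic_TextInSmartArt.docx
    static final List<String> EXPECTED =
            List.of("List1", "process2", "Cycle1", "Pyramid1", "relationship3");

    /** Returns, in order, the expected words that do not occur anywhere in the extracted text. */
    static List<String> missingWords(String extractedText, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String word : expected) {
            if (!extractedText.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for an extractor's output that, as reported, lacks the SmartArt text
        String extracted = "Body text of the document only";
        System.out.println("Missing: " + missingWords(extracted, EXPECTED));
        // prints "Missing: [List1, process2, Cycle1, Pyramid1, relationship3]"
    }
}
```

In a JUnit test the same idea becomes a single assertion that `missingWords(...)` is empty for the text the extractor returns.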
Comment 1 Nick Burch 2008-08-05 15:52:16 UTC
I'm not sure if we want to be going that far down into graphics objects by default.

If you'd like to submit a patch to extract the text, along with a flag to toggle the behaviour on/off, I'll happily apply it to svn :)
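A minimal sketch of what such a patch might do, under stated assumptions: in a .docx, SmartArt text is not stored in word/document.xml but in separate diagram data parts (in files saved by Word, typically word/diagrams/data1.xml, data2.xml, ...), where each text run is a DrawingML `a:t` element. The sketch below reads the package with JDK classes only and does not use or imitate POI's API; `SmartArtTextSketch`, `extractSmartArtText` and the `includeSmartArt` flag are hypothetical names, and matching parts by file name is an assumption (a robust version would resolve them through the package relationships).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch working on the raw OOXML package with JDK classes only (no POI API).
class SmartArtTextSketch {

    // DrawingML main namespace: SmartArt data parts store their text in a:t elements
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /**
     * Returns the text runs found in the SmartArt data parts of a .docx package.
     * Assumes Word's usual part names (word/diagrams/data1.xml, data2.xml, ...).
     * includeSmartArt is a hypothetical on/off flag, as suggested in comment 1.
     */
    static List<String> extractSmartArtText(InputStream docx, boolean includeSmartArt)
            throws IOException, ParserConfigurationException, SAXException {
        List<String> texts = new ArrayList<>();
        if (!includeSmartArt) {
            return texts; // behaviour switched off: diagram parts are not read at all
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted documents cannot pull in external entities
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Only data*.xml: later Office versions also cache a drawing*.xml copy of the text
                if (name.startsWith("word/diagrams/data") && name.endsWith(".xml")) {
                    // Buffer the entry so the parser cannot close the shared zip stream
                    byte[] bytes = zip.readAllBytes();
                    Document part = builder.parse(new ByteArrayInputStream(bytes));
                    NodeList runs = part.getElementsByTagNameNS(DRAWINGML_NS, "t");
                    for (int i = 0; i < runs.getLength(); i++) {
                        texts.add(runs.item(i).getTextContent());
                    }
                }
            }
        }
        return texts;
    }
}
```

With `includeSmartArt` set to `false` nothing changes compared with the current behaviour, so the extra extraction stays opt-in; in a real patch the flag would more naturally be a setting on the extractor than a per-call argument.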
Comment 2 xtrim 2008-09-15 06:36:09 UTC
(In reply to comment #1)
> I'm not sure if we want to be going that far down into graphics objects by
> default.
> If you'd like to submit a patch to extract the text, along with a flag to
> toggle the behaviour on/off, I'll happily apply it to svn :)


Hi,

Thanks for your comment.
I don't think SmartArt objects are really graphics objects: they are formatted objects that let the user enter text.
In Office 2003, SmartArt objects were called "diagrams".
Text entered in a diagram in an Office 2003 Word document is extracted correctly. Can we hope that SmartArt text will also be extracted in the future?

Regards,
Bénédicte