Bug 45549 - poi-3.5-beta1-20080718.jar - content from an embedded document (docx, xlsx or pptx) of a 2007 pptx document is not extracted.
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC Windows Server 2003
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Duplicates: 45553 45554
Depends on:
Blocks:
Reported: 2008-08-05 05:21 UTC by xtrim
Modified: 2008-08-05 06:09 UTC
CC List: 0 users



Attachments
Contains JUnit test class and documents used for testing. (559.76 KB, application/x-zip-compressed)
2008-08-05 05:21 UTC, xtrim

Description xtrim 2008-08-05 05:21:22 UTC
Created attachment 22372
Contains JUnit test class and documents used for testing.

Text contained in a document embedded in a PowerPoint 2007 (.pptx) presentation is not extracted, whether the embedded document is a .docx, .xlsx or .pptx file.
The attachment contains the JUnit test class and the documents used for testing.
We expected the extracted text to contain the words "testdoc" and "test phrase".

Notes on the attached documents:

- the document "EmbeddedObject_word.pptx" contains the words "testdoc" and "test phrase" in the embedded docx document.

- the document "EmbeddedObject_excel.pptx" contains the words "testdoc" and "test phrase" in the embedded xlsx document.

- the document "EmbeddedObject_ppt.pptx" contains the words "testdoc" and "test phrase" in the embedded pptx document.

"TestUnitPoi35Filter.java" is the JUnit class.
Comment 1 Nick Burch 2008-08-05 05:33:58 UTC
POI doesn't recurse into embedded documents automatically. You'll need to iterate through them yourself and extract each embedded stream individually.
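A minimal sketch of "iterating through them yourself", using only the JDK rather than any POI API. It relies on the fact that .pptx, .docx and .xlsx files are ZIP packages, and it assumes that embedded objects are stored under a `ppt/embeddings/`, `word/embeddings/` or `xl/embeddings/` folder, which is where Office 2007 normally places them. The class name `EmbeddedPartWalker` and the method `collectEmbedded` are hypothetical. A more robust version would follow the package's relationship parts instead of relying on folder names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/**
 * Hypothetical helper: collects every embedded part of an OOXML package and
 * descends into embedded OOXML documents (docx/xlsx/pptx are ZIP containers).
 * Assumption: embedded objects live under "<app>/embeddings/" folders.
 */
public class EmbeddedPartWalker {

    /** Maps entry path to raw bytes; nested paths are joined with "!/". */
    public static Map<String, byte[]> collectEmbedded(byte[] ooxmlPackage) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        walk(ooxmlPackage, "", result);
        return result;
    }

    private static void walk(byte[] pkg, String prefix, Map<String, byte[]> out) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Skip folders and anything that is not an embedded object.
                if (entry.isDirectory() || !isEmbedding(name)) {
                    continue;
                }
                byte[] data = readAll(zip);
                String path = prefix + name;
                out.put(path, data);
                // An embedded docx/xlsx/pptx is itself a ZIP package: recurse into it.
                if (isOoxml(name)) {
                    walk(data, path + "!/", out);
                }
            }
        }
    }

    private static boolean isEmbedding(String name) {
        return name.startsWith("ppt/embeddings/")
            || name.startsWith("word/embeddings/")
            || name.startsWith("xl/embeddings/");
    }

    private static boolean isOoxml(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".docx") || lower.endsWith(".xlsx") || lower.endsWith(".pptx");
    }

    /** Reads the current ZIP entry to its end (ZipInputStream returns -1 at entry end). */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```

Each returned byte array could then be wrapped in a `ByteArrayInputStream` and handed to the matching POI text extractor, so that words such as "testdoc" and "test phrase" in the embedded document are recovered one stream at a time.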
Comment 2 Nick Burch 2008-08-05 06:08:39 UTC
*** Bug 45553 has been marked as a duplicate of this bug. ***
Comment 3 Nick Burch 2008-08-05 06:09:23 UTC
*** Bug 45554 has been marked as a duplicate of this bug. ***