Bug 43247

Summary: [PATCH] Support for getting OLE object data from slide show
Product: POI
Reporter: Trejkaz (pen name) <trejkaz>
Component: HSLF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: enhancement
Keywords: PatchAvailable
Priority: P2    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: Other   
OS: other   
Attachments: Proposed patch
ole2-embedding-2003.ppt
TestOleEmbedding.java
Reviewed patch

Description Trejkaz (pen name) 2007-08-29 21:38:22 UTC
I did a bit of investigation on this and ended up implementing a couple of
things I really didn't need to implement. We only need the binary, but I
implemented some of the embedding side of it too, before I had figured out
where the binary actually was.

New functionality:
  HSLFSlideShow.getEmbeddedObjects, returns ObjectData[] similar to getPictures.

New record types:
  ExEmbed
  ExEmbedAtom
  ExOleObjAtom
  ExOleObjStg
Comment 1 Trejkaz (pen name) 2007-08-29 21:38:39 UTC
Created attachment 20735
Proposed patch
Comment 2 Trejkaz (pen name) 2007-08-29 21:39:59 UTC
Created attachment 20736
ole2-embedding-2003.ppt

Attaching my test file.
Comment 3 Trejkaz (pen name) 2007-08-29 21:41:51 UTC
Created attachment 20737
TestOleEmbedding.java

Attaching a simple unit test.
Comment 4 Trejkaz (pen name) 2007-08-30 18:06:31 UTC
Created attachment 20748
Reviewed patch

Various fixes to the patch, spotted through code review by a second person.
Comment 5 Yegor Kozlov 2007-09-08 09:13:42 UTC
Patch applied. Thanks for it.

However, I will close this bug only when I see unit tests for low-level record
classes:

src/scratchpad/src/org/apache/poi/hslf/record/ExOleObjStg.java
src/scratchpad/src/org/apache/poi/hslf/record/ExOleObjAtom.java
src/scratchpad/src/org/apache/poi/hslf/record/ExEmbedAtom.java
src/scratchpad/src/org/apache/poi/hslf/record/ExEmbed.java

A minimal unit test should take reference data from a ppt file and verify
the getters/setters against it.
See unit tests in src/scratchpad/testcases/org/apache/poi/hslf/record and follow
the pattern.

Ideas for further development:
 (1) It should be possible to access the OLE object properties contained in
the ExEmbed container.
Did you figure out how to link an ExOleObjStg to the corresponding ExEmbed? My
guess is that the order of the ExEmbed records in Document.ExObjList
corresponds to the order of the ExOleObjStg records. That is, for the 1st
ExOleObjStg we take the 1st Document.ExObjList.ExEmbed, and so on.

Any thoughts?  
 
 (2)  I think OLE shapes should be instances of 
 OLEObjectShape extends SimpleShape. 

User code may look something like this:

Shape[] shape = slide.getShapes();
for (int i = 0; i < shape.length; i++) {
  if(shape[i] instanceof OLEObjectShape){
    OLEObjectShape obj = (OLEObjectShape)shape[i];
    ObjectData data = obj.getObjectData();

    // should be able to access object properties
    String clipboardName = obj.getClipboardName();
    if(clipboardName.equals("Microsoft Office Excel Worksheet")){
       //do something with the data
    }    

  }
}

 (3) Can we construct a workbook or a presentation given the binary data
retrieved from ObjectData.getData()?


Regards,
Yegor
Comment 6 Trejkaz (pen name) 2007-09-08 22:51:01 UTC
I'll see if I can get some time to make unit tests for those.  I'm not sure I actually implemented setters, 
so in theory I would only need to check that the getters return the expected values.  The only problem is 
that I'm currently stuck working on something more important.  I do all this stuff during office hours, so 
I can't easily choose to work on unit testing something we don't use ourselves, since we only actually 
ended up using the method to get all the objects.

As for the other suggestions...

(1) I'm pretty sure you're right about the IDs, since the IDs are small numbers and there is no other 
obvious way for them to be referenced.  The OLE properties are inside the storage itself so any API that 
goes as far as to allow access to them would need to load the whole filesystem from there.

(2) Hmm... might not be a bad idea.  I was just following the code done for pictures for this stuff, but in 
the event where the same object is embedded twice (I'm sure it must be possible since they're 
referenced by ID) this would allow us to see where they are and eventually render them perhaps.  
Although with regards to rendering, what I have found is that in the document there is also an EMF 
snapshot of the OLE object embedded as an ordinary picture.  So it may be that it already renders 
properly if anyone has written a renderer...

(3) Yep.  Passing that InputStream straight into a POIFSFileSystem results in a working filesystem, which 
can then be passed into whatever constructor is needed (although what we're doing is writing it to a 
temporary location first, so that we can potentially read from it multiple times without having to re-get 
the input stream.)  However what I have noticed is that in some cases, saving the InputStream to a file 
doesn't allow the file to be opened in the actual Office application, even if POI's classes have no 
problems accessing the contents.
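
A minimal sketch of the temporary-file approach described in (3), using only the JDK. The class and method names (EmbeddedObjectSpool, spoolToTempFile) are hypothetical and not part of the patch, and the byte array in main is a placeholder for whatever stream the embedded object provides. No POI classes are used here, so this only illustrates how the data can be re-read without re-getting the input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of POI: spools a one-shot stream to disk.
final class EmbeddedObjectSpool {

    // Copies the whole stream into a fresh temporary file and closes the stream.
    // The caller is responsible for deleting the returned file when done.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("ole-object-", ".bin");
        try (InputStream source = in) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for the embedded object's data.
        byte[] placeholder = {10, 20, 30, 40};
        Path tmp = spoolToTempFile(new ByteArrayInputStream(placeholder));
        try {
            // The file can be opened as many times as needed.
            try (InputStream first = Files.newInputStream(tmp)) {
                System.out.println("first read, first byte: " + first.read());
            }
            try (InputStream second = Files.newInputStream(tmp)) {
                System.out.println("second read, first byte: " + second.read());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Each stream opened from the temporary file could then be handed to whatever constructor is needed, as described above. Whether the resulting file opens in the actual Office application is a separate question that this sketch does not address.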
Comment 7 Yegor Kozlov 2007-09-08 23:17:55 UTC
>I'm currently stuck working on something more important

No rush. Just put it in your TODO list. 

Yegor
Comment 8 Yegor Kozlov 2008-04-16 04:50:46 UTC
Finally I can resolve it.
I implemented OLEShape which extends Picture and can be used to retrieve the OLE data and some basic properties (progID, short and full names).

The usage is something like this:

Shape[] shape = slide.getShapes();
for (int i = 0; i < shape.length; i++) {
    if (shape[i] instanceof OLEShape) {
        OLEShape ole = (OLEShape) shape[i];
        ObjectData data = ole.getObjectData();
        String name = ole.getInstanceName();
        if ("Worksheet".equals(name)) {
            HSSFWorkbook wb = new HSSFWorkbook(data.getData());
        } else if ("Document".equals(name)) {
            HWPFDocument doc = new HWPFDocument(data.getData());
        }
    }
}

Regards,
Yegor