Bug 45738

Summary: [PATCH] New code for reading Office Art shapes
Product: POI
Reporter: dnapoletano <domenico.napoletano>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: enhancement
CC: domenico.napoletano
Priority: P2    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Document with 2 shapes (the second is a group)

Description dnapoletano 2008-09-04 02:08:46 UTC
Created attachment 22524
Document with 2 shapes (the second is a group)

Office Art shapes (rectangles, circles, arrows, shape groups, etc.) are stored in Word documents as a PlexOfCps in the table stream. It starts at offset fib.getFcPlcspaMom(), is fib.getLcbPlcspaMom() bytes long (this is a length, not an end offset) and is made up of 26-byte structures. I've written some patch code to access this data by:

1) adding the following class to the "usermodel" package:

package org.apache.poi.hwpf.usermodel;

import org.apache.poi.hwpf.model.GenericPropertyNode;
import org.apache.poi.util.LittleEndian;

public class Shape {
	int _id, _left, _right, _top, _bottom;
	boolean _inDoc; // true if the shape's bounds lie within the document (for example, false if the image's left corner is outside the document, as with embedded documents)
	
	public Shape(GenericPropertyNode nodo) {
		// One 26-byte record: a 4-byte shape id followed by the bounding rectangle as little-endian ints (left, top, right, bottom)
		byte [] contenuto = nodo.getBytes();
		_id = LittleEndian.getInt(contenuto);
		_left = LittleEndian.getInt(contenuto, 4);
		_top = LittleEndian.getInt(contenuto, 8);
		_right = LittleEndian.getInt(contenuto, 12);
		_bottom = LittleEndian.getInt(contenuto, 16);
		// The shape counts as within the document only if none of its coordinates is negative
		_inDoc = (_left >= 0 && _right >= 0 && _top >= 0 && _bottom >= 0);
	}
	
	public int getId() {
		return _id;
	}
	
	public int getLeft() {
		return _left;
	}
	
	public int getRight() {
		return _right;
	}
	
	public int getTop() {
		return _top;
	}
	
	public int getBottom() {
		return _bottom;
	}
	
	public int getWidth() {
		return _right - _left + 1;
	}
	
	public int getHeight() {
		return _bottom - _top + 1;
	}
	
	public boolean isWithinDocument() {
		return _inDoc;
	}
}
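For anyone who wants to check the record layout this constructor relies on without POI at hand, here is a minimal JDK-only sketch. It is not part of the patch: the class name FspaSketch is hypothetical, and it assumes (exactly as the constructor above does) that each 26-byte record starts with a 4-byte shape id followed by four little-endian 4-byte coordinates in the order left, top, right, bottom. The remaining 6 bytes, which the [MS-DOC] FSPA description lists as flags and a text-box count, are left undecoded.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: decodes the first 20 bytes of one 26-byte shape record like Shape does. */
class FspaSketch {
    static final int FSPA_SIZE = 26; // size of one record in the shape PLC

    final int spid, xaLeft, yaTop, xaRight, yaBottom;

    FspaSketch(byte[] record) {
        if (record.length < FSPA_SIZE) {
            throw new IllegalArgumentException("record must be " + FSPA_SIZE + " bytes");
        }
        ByteBuffer bb = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        spid = bb.getInt(0);      // shape id
        xaLeft = bb.getInt(4);    // left edge
        yaTop = bb.getInt(8);     // top edge
        xaRight = bb.getInt(12);  // right edge
        yaBottom = bb.getInt(16); // bottom edge
        // Bytes 20..25 (flags and text-box count, per the spec) are ignored here.
    }

    /** Same rule as Shape.isWithinDocument(): no coordinate may be negative. */
    boolean isWithinDocument() {
        return xaLeft >= 0 && yaTop >= 0 && xaRight >= 0 && yaBottom >= 0;
    }

    public static void main(String[] args) {
        // Build a synthetic record: id 1025, rectangle (100,200)-(1540,920), trailing 6 bytes zero
        ByteBuffer bb = ByteBuffer.allocate(FSPA_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(1025).putInt(100).putInt(200).putInt(1540).putInt(920);
        FspaSketch s = new FspaSketch(bb.array());
        System.out.println(s.spid + " " + s.xaLeft + "," + s.yaTop + " -> "
                + s.xaRight + "," + s.yaBottom + " within=" + s.isWithinDocument());
    }
}
```

A record with a negative left coordinate, as the patch describes for embedded documents, makes isWithinDocument() return false.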

2) adding the following class to the "model" package:

package org.apache.poi.hwpf.model;

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.usermodel.Shape;

public class ShapesTable {
	private List<Shape> _shapes;
	private List<Shape> _shapesVisibili;  // holds only the shapes within the document
	
	public ShapesTable(byte [] tblStream, FileInformationBlock fib) {
		// The shape PLC starts at offset fcPlcspaMom in the table stream, is lcbPlcspaMom bytes long and holds 26-byte structures
		PlexOfCps binTable = new PlexOfCps(tblStream, fib.getFcPlcspaMom(), fib.getLcbPlcspaMom(), 26);
		
		_shapes = new ArrayList<Shape>();
		_shapesVisibili = new ArrayList<Shape>();
		
		for(int i = 0; i < binTable.length(); i++) {
			GenericPropertyNode nodo = binTable.getProperty(i);
			
			Shape sh = new Shape(nodo);
			_shapes.add(sh);
			if(sh.isWithinDocument())
				_shapesVisibili.add(sh);
		}
	}
	
	public List<Shape> getAllShapes() {
		return _shapes;
	}
	
	public List<Shape> getVisibleShapes() {
		return _shapesVisibili;
	}
}
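The PlexOfCps call above depends on the generic PLC layout as I understand it from the [MS-DOC] specification: an array of n+1 four-byte CPs followed by n structures of cbStruct bytes each, so a PLC of cb bytes holds n = (cb - 4) / (4 + cbStruct) structures. The JDK-only sketch below (hypothetical class PlcSketch, not POI code) shows that arithmetic; fc and cb play the roles of fib.getFcPlcspaMom() and fib.getLcbPlcspaMom().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch of the PLC layout: n+1 four-byte CPs, then n structures of cbStruct bytes. */
class PlcSketch {
    /** Number of structures in a PLC that is cb bytes long. */
    static int count(int cb, int cbStruct) {
        return (cb - 4) / (4 + cbStruct);
    }

    /** Returns a copy of structure i of the PLC that starts at offset fc in buf. */
    static byte[] struct(byte[] buf, int fc, int cb, int cbStruct, int i) {
        int n = count(cb, cbStruct);
        if (i < 0 || i >= n) {
            throw new IndexOutOfBoundsException("structure " + i + " of " + n);
        }
        int offset = fc + 4 * (n + 1) + i * cbStruct; // skip the whole CP array first
        return Arrays.copyOfRange(buf, offset, offset + cbStruct);
    }

    /** Returns CP number i (0..n) as a little-endian 32-bit value. */
    static int cp(byte[] buf, int fc, int i) {
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getInt(fc + 4 * i);
    }

    public static void main(String[] args) {
        int cbStruct = 26, n = 2;
        int cb = 4 * (n + 1) + n * cbStruct;  // 12 bytes of CPs + 52 bytes of structures
        ByteBuffer bb = ByteBuffer.allocate(cb).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0).putInt(5).putInt(9);     // three CPs
        bb.putInt(12, 1025);                  // first structure starts with shape id 1025
        bb.putInt(12 + cbStruct, 1026);       // second structure starts with shape id 1026
        byte[] buf = bb.array();

        for (int i = 0; i < count(cb, cbStruct); i++) {
            int spid = ByteBuffer.wrap(struct(buf, 0, cb, cbStruct, i))
                    .order(ByteOrder.LITTLE_ENDIAN).getInt(0);
            System.out.println("structure " + i + ": CP " + cp(buf, 0, i) + ", shape id " + spid);
        }
    }
}
```

This is also why the third constructor argument must be a byte count: passing an end offset instead would inflate count() for any PLC that does not start at offset 0.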

3) editing the HWPFDocument class by adding the field

/** Holds Office Art objects */
protected ShapesTable _officeArts;

then, in the constructor

public HWPFDocument(DirectoryNode directory, POIFSFileSystem pfilesystem) throws IOException

the line (for example, after the line that initializes _pictures)

_officeArts = new ShapesTable(_tableStream, _fib);

which reads the shape data from the document's table stream, and finally an accessor for it, which the test method below calls

public ShapesTable getShapesTable() {
	return _officeArts;
}

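4) I've also written a simple main method to test this:

// Imports needed by this method
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Shape;

public static void main(String[] args)
{
	try
	{
		// Let the user pick a .doc file
		JFileChooser jfc = new JFileChooser();
		int esito = jfc.showOpenDialog(null);

		if(esito != JFileChooser.APPROVE_OPTION)
		{
			JOptionPane.showMessageDialog(null, "No file selected");
		}
		else
		{
			String path = jfc.getSelectedFile().getAbsolutePath();

			// Open the document; the stream is closed automatically
			try(InputStream in = new FileInputStream(path))
			{
				HWPFDocument doc = new HWPFDocument(in);
				List<Shape> shapes = doc.getShapesTable().getAllShapes();

				// Print each shape's size and whether it lies within the document
				for(Shape shp: shapes)
					System.out.println("SHAPE " + shp.getWidth() + "x" + shp.getHeight() + ", WITHIN DOC=" + shp.isWithinDocument());
			}
		}
	}
	catch(Exception er)
	{
		er.printStackTrace();
	}
}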

which can be run against the attached document.
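The sizes printed by the test are raw coordinate differences. Assuming they are twips, as the [MS-DOC] description of the FSPA rectangle states (1 point = 20 twips, 1 inch = 1440 twips), a small helper like the hypothetical TwipsSketch below makes them easier to read. Note that it computes the width as right - left, whereas Shape.getWidth() adds 1 to that difference.

```java
import java.util.Locale;

/** Hypothetical helper: converts twip values (assumed unit of the shape rectangle) to points and inches. */
class TwipsSketch {
    static final double TWIPS_PER_POINT = 20.0;
    static final double TWIPS_PER_INCH = 1440.0;

    static double toPoints(int twips) {
        return twips / TWIPS_PER_POINT;
    }

    static double toInches(int twips) {
        return twips / TWIPS_PER_INCH;
    }

    public static void main(String[] args) {
        // Hypothetical shape bounds: a 2 inch x 1 inch rectangle
        int left = 0, right = 2880, top = 0, bottom = 1440;
        int width = right - left, height = bottom - top;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale
        System.out.println(String.format(Locale.ROOT, "%.1f x %.1f pt (%.2f x %.2f in)",
                toPoints(width), toPoints(height), toInches(width), toInches(height)));
        // prints "144.0 x 72.0 pt (2.00 x 1.00 in)"
    }
}
```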

5) Perhaps Microsoft's Open Specification Promise initiative could help? Microsoft has published the specifications for all the Office document formats.

Hope this helps :D
Comment 1 Nick Burch 2008-09-07 11:35:06 UTC
Thanks for this patch. I've applied it to svn, along with a simple unit test.

As for the OSP docs: yes, they are useful, and they make it much quicker to add new basic features. Alas, what we now lack is enough developers to make use of it all.