Bug 45738 - [PATCH] New code for reading Office Art shapes
Summary: [PATCH] New code for reading Office Art shapes
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.0-dev
Hardware: PC Linux
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2008-09-04 02:08 UTC by dnapoletano
Modified: 2008-09-07 11:35 UTC
CC List: 1 user

Document with 2 shapes (the second is a group) (52.50 KB, application/octet-stream)
2008-09-04 02:08 UTC, dnapoletano

Description dnapoletano 2008-09-04 02:08:46 UTC
Created attachment 22524
Document with 2 shapes (the second is a group)

Office Art shapes (rectangles, circles, arrows, shape groups, etc.) are stored in Word documents as a PlexOfCps in the table stream. The PlexOfCps starts at offset fib.getFcPlcspaMom(), is fib.getLcbPlcspaMom() bytes long and is made of 26-byte structures. I've written some patch code to access this data by:

1) adding the following class to the "usermodel" package

package org.apache.poi.hwpf.usermodel;

import org.apache.poi.hwpf.model.GenericPropertyNode;
import org.apache.poi.util.LittleEndian;

public class Shape {
	int _id, _left, _right, _top, _bottom;
	boolean _inDoc; // true if the Shape bounds are within the document (false, for example, if the image's left corner lies outside the document, as with embedded documents)

	public Shape(GenericPropertyNode nodo) {
		byte [] contenuto = nodo.getBytes();
		// Shape record: id, then left, top, right and bottom, each a little-endian 32-bit int
		_id = LittleEndian.getInt(contenuto);
		_left = LittleEndian.getInt(contenuto, 4);
		_top = LittleEndian.getInt(contenuto, 8);
		_right = LittleEndian.getInt(contenuto, 12);
		_bottom = LittleEndian.getInt(contenuto, 16);
		_inDoc = (_left >= 0 && _right >= 0 && _top >= 0 && _bottom >= 0);
	}

	public int getId() {
		return _id;
	}

	public int getLeft() {
		return _left;
	}

	public int getRight() {
		return _right;
	}

	public int getTop() {
		return _top;
	}

	public int getBottom() {
		return _bottom;
	}

	public int getWidth() {
		return _right - _left + 1;
	}

	public int getHeight() {
		return _bottom - _top + 1;
	}

	public boolean isWithinDocument() {
		return _inDoc;
	}
}

2) adding the following class to the "model" package

package org.apache.poi.hwpf.model;

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.usermodel.Shape;

public class ShapesTable {
	private List<Shape> _shapes;
	private List<Shape> _shapesVisibili;  // holds visible shapes

	public ShapesTable(byte [] tblStream, FileInformationBlock fib) {
		// PlcspaMom: starts at fcPlcspaMom, is lcbPlcspaMom bytes long, holds 26-byte records
		PlexOfCps binTable = new PlexOfCps(tblStream, fib.getFcPlcspaMom(), fib.getLcbPlcspaMom(), 26);
		_shapes = new ArrayList<Shape>();
		_shapesVisibili = new ArrayList<Shape>();

		for(int i = 0; i < binTable.length(); i++) {
			GenericPropertyNode nodo = binTable.getProperty(i);
			Shape sh = new Shape(nodo);
			_shapes.add(sh);
			if(sh.isWithinDocument())
				_shapesVisibili.add(sh);
		}
	}

	public List<Shape> getAllShapes() {
		return _shapes;
	}

	public List<Shape> getVisibleShapes() {
		return _shapesVisibili;
	}
}

3) editing the HWPFDocument class, adding the field declaration

/** Holds Office Art objects */
protected ShapesTable _officeArts;

then, in the constructor

public HWPFDocument(DirectoryNode directory, POIFSFileSystem pfilesystem) throws IOException

the line (for example, after the line that reads the _pictures stream)

_officeArts = new ShapesTable(_tableStream, _fib);

which reads the shape data from the document's table stream, and finally the accessor used by the test in step 4

public ShapesTable getShapesTable() {
	return _officeArts;
}

4) I've also written a simple main method to test this

// Needs: java.io.FileInputStream, java.util.List, javax.swing.JFileChooser,
// javax.swing.JOptionPane and org.apache.poi.hwpf.usermodel.Shape
public static void main(String[] args)
{
	try
	{
		JFileChooser jfc = new JFileChooser();
		int esito = jfc.showOpenDialog(null);

		if(esito != JFileChooser.APPROVE_OPTION)
		{
			JOptionPane.showMessageDialog(null, "No file selected");
		}
		else
		{
			String path = jfc.getSelectedFile().getAbsolutePath();
			HWPFDocument doc = new HWPFDocument(new FileInputStream(path));
			List<Shape> shapes = doc.getShapesTable().getAllShapes();
			for(Shape shp: shapes)
				System.out.println("SHAPE " + shp.getWidth() + "x" + shp.getHeight() + ", WITHIN DOC=" + shp.isWithinDocument());
		}
	}
	catch(Exception er)
	{
		er.printStackTrace();
	}
}

which can be tried with the attached document.

5) Perhaps Microsoft's Open Specification Promise (OSP) initiative could help? They have published the specifications for all the Office document formats.

Hope this helps :D
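Since the PlexOfCps and LittleEndian plumbing in steps 1 and 2 can be hard to follow, here is a self-contained sketch (plain java.nio, no POI) of what ShapesTable and Shape compute. It assumes the usual PLC layout, as I understand it from the Word binary format and from POI's PlexOfCps: a PLC with n entries stores n + 1 four-byte character positions followed by n fixed-size structures, so n = (lcb - 4) / (4 + 26). The field offsets and the width/height formulas are copied from the Shape class above; PlcShapeSketch, Bounds, entryCount and readAll are hypothetical names, not POI API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, POI-free sketch of what ShapesTable and Shape do:
 * walk a PLC of 26-byte shape records and decode the bounds of each one.
 */
public class PlcShapeSketch {
    static final int CP_SIZE = 4;      // each character position (CP) is a 32-bit int
    static final int RECORD_SIZE = 26; // bytes per shape record, as stated in the report

    /** A PLC of n entries holds (n + 1) CPs followed by n records. */
    static int entryCount(int lcb) {
        return (lcb - CP_SIZE) / (CP_SIZE + RECORD_SIZE);
    }

    /** Decoded bounds, using the same offsets and formulas as the Shape class. */
    static final class Bounds {
        final int id, left, top, right, bottom;

        Bounds(ByteBuffer bb, int offset) {
            id = bb.getInt(offset);
            left = bb.getInt(offset + 4);
            top = bb.getInt(offset + 8);
            right = bb.getInt(offset + 12);
            bottom = bb.getInt(offset + 16); // bytes 20-25 are not decoded here
        }

        int width()  { return right - left + 1; }  // same formula as Shape.getWidth()
        int height() { return bottom - top + 1; }  // same formula as Shape.getHeight()
        boolean withinDocument() { return left >= 0 && right >= 0 && top >= 0 && bottom >= 0; }
    }

    /** Reads every record of the PLC that starts at fc and is lcb bytes long. */
    static List<Bounds> readAll(byte[] tableStream, int fc, int lcb) {
        ByteBuffer bb = ByteBuffer.wrap(tableStream).order(ByteOrder.LITTLE_ENDIAN);
        int n = entryCount(lcb);
        int recordsStart = fc + (n + 1) * CP_SIZE; // skip the n + 1 CPs
        List<Bounds> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new Bounds(bb, recordsStart + i * RECORD_SIZE));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up table stream: 8 bytes of padding, then a PLC holding two shapes
        int lcb = 3 * CP_SIZE + 2 * RECORD_SIZE;
        ByteBuffer bb = ByteBuffer.allocate(8 + lcb).order(ByteOrder.LITTLE_ENDIAN);
        bb.position(8);
        bb.putInt(0).putInt(1).putInt(2); // three CPs (values arbitrary here)
        // id, left, top, right, bottom, then 6 trailing bytes left as zero
        bb.putInt(1025).putInt(0).putInt(0).putInt(1439).putInt(719).putShort((short) 0).putInt(0);
        bb.putInt(1026).putInt(-200).putInt(100).putInt(99).putInt(399).putShort((short) 0).putInt(0);

        for (Bounds b : readAll(bb.array(), 8, lcb)) {
            System.out.println("SHAPE " + b.width() + "x" + b.height()
                    + ", WITHIN DOC=" + b.withinDocument());
        }
        // prints:
        // SHAPE 1440x720, WITHIN DOC=true
        // SHAPE 300x300, WITHIN DOC=false
    }
}
```

The sample bytes in main are invented; they only exercise the offset arithmetic, and the second shape shows how a negative left coordinate makes a shape count as outside the document.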
Comment 1 Nick Burch 2008-09-07 11:35:06 UTC
Thanks for this patch; I've applied it to svn, along with a simple unit test.

As for the OSP docs, yes, they are useful. They do make it much quicker to add new basic features now. Alas, what we now lack is enough developers to make use of them all.