Bug 51683 - [HSSF] Improve support for Shapes and Shape Groups
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: unspecified
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Blocks: 53010
Reported: 2011-08-18 21:38 UTC by Hannes Erven
Modified: 2012-08-12 11:35 UTC

First shot at loading and decoding Shape Groups and all their children (8.79 KB, application/octet-stream)
2011-08-18 21:38 UTC, Hannes Erven

Description Hannes Erven 2011-08-18 21:38:09 UTC
Created attachment 27407 [details]
First shot at loading and decoding Shape Groups and all their children

At the moment (Rev#6041), support for Shape Groups, and Shapes in general, is limited in HSSF: for example, only the top-level Shape objects are decoded, and any shape contained in a group is (silently!) skipped.

I've tried to rework the EscherAggregate.convertUserRecordsToModel() method to recursively parse down all the Shapes and their children. Due to my limited knowledge of the Escher file format and my lack of understanding of POI internals, the attempt more or less failed: although POI seems to load the Shape tree correctly, it loses many of the shapes' properties and saves only damaged XLS files.
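
To illustrate the kind of recursive descent meant here, the following is a minimal Java sketch. It is not the attached patch and does not use POI's Escher records: the classes Shape and ShapeGroup and the helper sampleTree() are hypothetical stand-ins, invented only to show how shapes nested inside groups can be reached instead of being skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class ShapeTreeWalk {
    /** Hypothetical stand-in for a decoded shape; NOT a POI class. */
    static class Shape {
        final String name;
        Shape(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a group that may hold shapes and further groups. */
    static class ShapeGroup extends Shape {
        final List<Shape> children = new ArrayList<>();
        ShapeGroup(String name) { super(name); }
        ShapeGroup add(Shape child) { children.add(child); return this; }
    }

    /** Returns the names of all shapes in depth-first order, descending into groups. */
    static List<String> flatten(List<Shape> topLevel) {
        List<String> out = new ArrayList<>();
        for (Shape shape : topLevel) {
            collect(shape, out);
        }
        return out;
    }

    private static void collect(Shape shape, List<String> out) {
        out.add(shape.name);
        if (shape instanceof ShapeGroup) {
            // Recurse so that children of nested groups are not silently dropped.
            for (Shape child : ((ShapeGroup) shape).children) {
                collect(child, out);
            }
        }
    }

    /** Illustrative data: one plain shape and one group containing a nested group. */
    static List<Shape> sampleTree() {
        ShapeGroup inner = new ShapeGroup("inner").add(new Shape("oval"));
        ShapeGroup outer = new ShapeGroup("outer").add(new Shape("line")).add(inner);
        List<Shape> top = new ArrayList<>();
        top.add(new Shape("rect"));
        top.add(outer);
        return top;
    }

    public static void main(String[] args) {
        System.out.println(flatten(sampleTree())); // prints [rect, outer, line, inner, oval]
    }
}
```

A walk that only iterates over the top-level list would report just "rect" and "outer" for this data, which mirrors the behaviour described above.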

I'll attach my patch attempt and would be glad for any feedback. I'll also be glad to hack along and/or contribute in other ways. Thanks.
Comment 1 Yegor Kozlov 2011-09-11 11:04:39 UTC
Thanks for the patch. The code looks good, but I'm reluctant to apply it without a unit test. 

Any chance you could create one or more tests that create a worksheet with a drawing group, write it out, read it back, and assert the following:

 - shapes from the top-level group are decoded
 - shapes from nested groups are decoded

I see that you commented out clearEscherRecords(). Can you explain why, or add a test that justifies this change:

+		// Now, clear any trace of what records make up
+		//  the patriarch
+		// Otherwise, everything will go horribly wrong
+		//  when we try to write out again....
+//		clearEscherRecords();
+		drawingManager.getDgg().setFileIdClusters(new EscherDggRecord.FileIdCluster[0]);

Comment 2 Hannes Erven 2011-09-11 19:30:13 UTC
I have to confess that I solved the issue that made me investigate this by other means.
Anyways, I'm of course willing to try to prepare a test case. The problem is that a patched version can read nested shape groups, but always produces corrupt files on output. I suspect that this is due to incomplete loading of the shapes' properties, but I can only guess.

Regarding the clearEscher() comments, I seem to have overlooked this change when creating the patch. It is just a formatting change that my IDE seems to have automatically performed... :-/
Comment 3 Yegor Kozlov 2011-09-12 11:49:41 UTC
Corrupted output on read-write-read of workbooks with drawings is a weak spot in HSSF. The current code is geared towards creating new drawings from scratch, and read-modify-write often results in corruption. In any case, it would be best if you uploaded a test case for the "read only" case; we will then check your code into svn.
Comment 4 Evgeniy Berlog 2012-08-12 11:35:52 UTC
This problem should be fixed in trunk.

Please try with a nightly build - see download links on http://poi.apache.org/
or build yourself from SVN trunk, see http://poi.apache.org/subversion.html