Bug 51683

Summary: [HSSF] Improve support for Shapes and Shape Groups
Product: POI
Reporter: Hannes Erven <hannes>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 53010    
Attachments: First shot at loading and decoding Shape Groups and all their children

Description Hannes Erven 2011-08-18 21:38:09 UTC
Created attachment 27407
First shot at loading and decoding Shape Groups and all their children

At the moment (Rev#6041), HSSF's support for Shape Groups and Shapes in general is limited; e.g. only the top-level Shape objects are decoded, and any shape contained in a group is (silently!) skipped.

I've tried to rework the EscherAggregate.convertUserRecordsToModel() method to recursively parse down all the Shapes and their children. Due to my limited knowledge of the Escher file format and lack of understanding of POI internals, the attempt more or less failed: although POI seems to load the Shape tree correctly, it loses a lot of the shapes' properties and saves only damaged XLS files.
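
To illustrate the difference between decoding only the top level and walking the whole tree, here is a minimal, self-contained Java sketch. It does not use POI or the Escher records at all: the Node type and the methods decodeTopLevelOnly() and decodeRecursively() are hypothetical names made up for this illustration, and the sketch ignores shape properties entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShapeTreeSketch {

    // Hypothetical stand-in for a drawing record; this is NOT a POI class.
    static final class Node {
        final String name;
        final boolean group;
        final List<Node> children;

        private Node(String name, boolean group, List<Node> children) {
            this.name = name;
            this.group = group;
            this.children = children;
        }

        // A leaf shape has no children.
        static Node shape(String name) {
            return new Node(name, false, new ArrayList<Node>());
        }

        // A group holds shapes and/or further groups.
        static Node group(String name, Node... children) {
            return new Node(name, true, Arrays.asList(children));
        }
    }

    // Mimics the behaviour described above: a group found at the top
    // level is skipped together with everything it contains.
    static List<String> decodeTopLevelOnly(List<Node> topLevel) {
        List<String> decoded = new ArrayList<String>();
        for (Node node : topLevel) {
            if (!node.group) {
                decoded.add(node.name);
            }
        }
        return decoded;
    }

    // Depth-first walk that descends into every group, however deeply nested.
    static List<String> decodeRecursively(List<Node> nodes) {
        List<String> decoded = new ArrayList<String>();
        collect(nodes, decoded);
        return decoded;
    }

    private static void collect(List<Node> nodes, List<String> decoded) {
        for (Node node : nodes) {
            if (node.group) {
                collect(node.children, decoded);
            } else {
                decoded.add(node.name);
            }
        }
    }

    public static void main(String[] args) {
        List<Node> topLevel = Arrays.asList(
                Node.shape("rect"),
                Node.group("outer",
                        Node.shape("oval"),
                        Node.group("inner", Node.shape("line"))));

        System.out.println(decodeTopLevelOnly(topLevel));
        System.out.println(decodeRecursively(topLevel));
    }
}
```

In this toy model the top-level-only pass sees just one shape, while the recursive walk also reaches the shapes inside the outer and inner groups. A real implementation would additionally have to carry over each shape's properties and anchors, which is where the patch runs into trouble.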

I'll attach my patch attempt and would be glad for any feedback. I'll also be glad to hack along and/or contribute in other ways. Thanks.
Comment 1 Yegor Kozlov 2011-09-11 11:04:39 UTC
Thanks for the patch. The code looks good, but I'm reluctant to apply it without a unit test. 

Any chance you could create one or more tests that create a worksheet with a drawing group, write it out, read it back and assert the following:

 - shapes from the top-level group are decoded
 - shapes from nested groups are decoded

I see that you commented out the call to clearEscherRecords(). Can you explain why, or add a test that justifies this change:

+		// Now, clear any trace of what records make up
+		//  the patriarch
+		// Otherwise, everything will go horribly wrong
+		//  when we try to write out again....
+//		clearEscherRecords();
+		drawingManager.getDgg().setFileIdClusters(new EscherDggRecord.FileIdCluster[0]);


Regards,
Yegor
Comment 2 Hannes Erven 2011-09-11 19:30:13 UTC
I have to confess that I solved the issue that made me investigate this by other means.
Anyway, I'm of course willing to try to prepare a test case. The problem is that a patched version will read nested shape groups, but always produces corrupt files on output. I suspect that this is due to incomplete loading of the shapes' properties, but I can only guess.

Regarding the commented-out clearEscherRecords() call, I seem to have overlooked this change when creating the patch. It is just a formatting change that my IDE seems to have performed automatically... :-/
Comment 3 Yegor Kozlov 2011-09-12 11:49:41 UTC
Corrupted output on read-write-read of workbooks with drawings is a weak spot of HSSF. The current code is geared toward creating new drawings from scratch, and read-modify-write often results in corruption. In any case, it is better that you upload a test case for the "read only" case, and we will check your code in to svn.
Comment 4 Evgeniy Berlog 2012-08-12 11:35:52 UTC
This problem should be fixed in trunk.

Please try with a nightly build - see download links on http://poi.apache.org/
or build yourself from SVN trunk, see http://poi.apache.org/subversion.html