Bug 53058

Summary: [PATCH] Utility for representing drawings contained in a binary Excel file as a XML tree
Product: POI Reporter: Evgeniy Berlog <superrubiroyd>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: superrubiroyd
Priority: P2    
Version: 3.9-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Patch
Source xls file to show the structure of drawing records
Target file which represents structure of workbook.xls in xml format
Updated patch
Updated target file which represents structure of workbook.xls in xml format
Updated patch
The copy of workbook.xls made by POI
Xml file which represents structure of target.xls
Xml file which represents structure of workbook.xls
Xml file which represents structure of target.xls
Updated patch

Description Evgeniy Berlog 2012-04-10 22:02:43 UTC
Created attachment 28574 [details]
Patch
Comment 1 Evgeniy Berlog 2012-04-10 22:05:20 UTC
Created attachment 28575 [details]
Source xls file to show the structure of drawing records
Comment 2 Evgeniy Berlog 2012-04-10 22:06:30 UTC
Created attachment 28576 [details]
Target file which represents structure of workbook.xls in xml format
Comment 3 Evgeniy Berlog 2012-04-10 22:12:31 UTC
Hi, I have created this utility as part of my GSoC proposal.
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/evgeniy44/1

I look forward to your feedback.
Comment 4 Yegor Kozlov 2012-04-12 08:57:02 UTC
Hi Evgeniy,

A very good start! I see that for now only a few records override toXML(): EscherSplitMenuColorsRecord, EscherBSERecord and EscherDggRecord.
Do you plan to implement toXML for the rest of DDF ? I think the answer is 'yes', just want to confirm it. 

For EscherOptRecord it would be great to show properties as child nodes, something like this:  

<EscherOptRecord>
  <Property id="127" value="32440321" name="protection.lockagainstgrouping"/>
  <Property id="128" value="809474328" name="text.textid"/>
  <Property id="191" value="262148"  name="text.sizetexttofitshape"/>
  <Property id="447" value="1114113"  name="fill.nofillhittest"/>
  <Property id="511" value="1638417" name="linestyle.nolinedrawdash"/>
  <Property id="831" value="524288"  name="shape.backgroundshape"/>
  <Property id="959" value="131072" name="groupshape.print"/>
</EscherOptRecord>

Some records show recordId as decimals, others as hex. It should be unified, I think hex everywhere would be OK. 

Records thar are not supported (UnknownEscherRecord) should print their data in hex form:
<UnknownEscherRecord>
  50 4B 03 04 14 00 06 00 08 00 00 00 21 00 98 28 26
</UnknownEscherRecord>

Other than that, very cool! 

Looking forward to working with you on GSoC!

Regards,
Yegor
Comment 5 Evgeniy Berlog 2012-04-12 22:39:29 UTC
Created attachment 28597 [details]
Updated patch
Comment 6 Evgeniy Berlog 2012-04-12 22:41:47 UTC
Created attachment 28598 [details]
Updated target file which represents structure of workbook.xls in xml format
Comment 7 Evgeniy Berlog 2012-04-12 22:44:08 UTC
Hi, I have updated patch according to your advices. 
Also I have updated target xml file.

I look forward to your feedback.
With best regards, Evgeniy Berlog
Comment 8 Yegor Kozlov 2012-04-14 10:14:47 UTC
Very cool. It is nearly complete in my opinion, I only have two things to say:

1. Complete the use case : modify the source workbook with POI. Call sheet.getDrawingPatriarch() for each sheet, save the result and run BiffDrawingToXml. We expect that the output will be identical to the source but I guess it will be not. 

2.  RecordId, Version and Instance are record attributes (as opposed to the record body) and they  ought to be XML attributes instead of child nodes:

<EscherClientAnchorRecord recordId="0xF010" version"0x00" instance="0x00">
...
</EscherClientAnchorRecord>

This way we clearly separate record attributes (i.e record header) from the record body.

Other than that, very cool.

Regards,
Yegor

(In reply to comment #7)
> Hi, I have updated patch according to your advices. 
> Also I have updated target xml file.
> 
> I look forward to your feedback.
> With best regards, Evgeniy Berlog
Comment 9 Evgeniy Berlog 2012-04-15 15:39:38 UTC
Created attachment 28609 [details]
Updated patch
Comment 10 Evgeniy Berlog 2012-04-15 15:45:06 UTC
Created attachment 28610 [details]
The copy of workbook.xls made by POI

Code for copying workbook.xls into target.xls:
    public static void main(String[] args) {
        String input = "workbook.xls";
        String output = "target.xls";
        try{
            POIFSFileSystem fs = new POIFSFileSystem(new
                    FileInputStream(input));
            HSSFWorkbook workbook = new HSSFWorkbook(fs);
            HSSFSheet sheet = workbook.getSheetAt(0);

            HSSFPatriarch patriarch = sheet.createDrawingPatriarch();

            FileOutputStream fileOut = new FileOutputStream(output);
            workbook.write(fileOut);
            fileOut.close();

        } catch(IOException e){
            System.out.println("Can't open file "+input+" or "+output);
            System.exit(1);
        }
    }
Comment 11 Evgeniy Berlog 2012-04-15 15:46:20 UTC
Created attachment 28611 [details]
Xml file which represents structure of target.xls
Comment 12 Evgeniy Berlog 2012-04-15 15:52:31 UTC
Created attachment 28612 [details]
Xml file which represents structure of workbook.xls
Comment 13 Evgeniy Berlog 2012-04-15 16:01:43 UTC
Hi Yegor,
Thanks much for your advices!
I have added 'recordId', 'version' and 'instance' fields as record attributes.
Also I copied workbook.xls into target.xls using the code above to compare their structures.
You were right. The stuctures were different, what you can see while comparing target.xml and workbook.xml. 
Please, check if any more changes are required or not.
My GSoC proposal includes resolving of this problem.


With best regards, Evgeniy Berlog
Comment 14 Evgeniy Berlog 2012-04-17 07:41:15 UTC
Created attachment 28622 [details]
Xml file which represents structure of target.xls
Comment 15 Yegor Kozlov 2012-04-28 10:40:52 UTC
Evgeniy,

this is nearly complete. The final touch will be a command-line interface. Something like this:

BiffDrawingToXml -sheet <sheetId> <inputFileName>
or
BiffDrawingToXml -sheet "sheetName" <inputFileName>

The name of the output xml file should be derived from the input file name, i.e. xmlFileName = inputFileName.replace(".xls", ".xml")

Please complete these changes and I will check this utility in SVN.

Yegor
Comment 16 Evgeniy Berlog 2012-04-30 14:34:16 UTC
Created attachment 28698 [details]
Updated patch

Hi, I have added command line interface to this utility. 
Here is utility description:

Synopsis:
BiffDrawingToXml [Options] <Source file>

Options:
-exclude-workbook  (excludes records which belong to the workbook)
-sheet-name "sheetName" (addes to xml file records of the sheet with specified name)
-sheet-indexes <indexesList> (addes to xml file records of the sheets with specified indexes. Indexes must be separated by comma)

If no sheets is specified, utility converts all of them into xml file

Thanks for your advices.
With best regards, Evgeniy Berlog