Bug 49423 - Data got lost when trying to get EscherContainerRecord
Summary: Data got lost when trying to get EscherContainerRecord
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.6-FINAL
Hardware: PC Windows XP
Importance: P1 normal with 1 vote
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-10 12:26 UTC by Jerry
Modified: 2015-03-22 17:24 UTC
CC List: 2 users



Attachments
Class to reproduce the problem (4.33 KB, application/octet-stream)
2010-09-09 15:39 UTC, Oleg Kuryan
Test file to use to repro problem (657.00 KB, application/vnd.ms-excel)
2010-09-09 15:41 UTC, Oleg Kuryan

Description Jerry 2010-06-10 12:26:16 UTC
One of my projects needs to read images from an Excel file (.xls). Besides reading the image data, I also need to read each image's position (row and column) from the file. While doing this, I ran into a bug when trying to get the EscherContainerRecord.
Here was my plan: read the EscherAggregate from the HSSFSheet, then retrieve the list of EscherRecords from the EscherAggregate. By going through this list of EscherRecords and their children, I can get the EscherClientAnchorRecords, which contain the row and column numbers.
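The plan above amounts to a depth-first walk over a tree of records. The sketch below illustrates only that walk, using two hypothetical stand-in classes, Node (in place of EscherRecord/EscherContainerRecord) and Anchor (in place of EscherClientAnchorRecord); it does not call POI. With POI, the top-level list would come from the EscherAggregate and each record's children from its getChildRecords() list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for POI's EscherRecord and EscherClientAnchorRecord;
// the real classes carry far more state. This sketch does not call POI.
public class AnchorWalk {

    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static class Anchor extends Node {
        final int row1;
        final int col1;

        Anchor(int row1, int col1) {
            this.row1 = row1;
            this.col1 = col1;
        }
    }

    // Depth-first: an anchor may sit several container levels below the top.
    static void collectAnchors(Node node, List<Anchor> out) {
        if (node instanceof Anchor) {
            out.add((Anchor) node);
        }
        for (Node child : node.children) {
            collectAnchors(child, out);
        }
    }

    static List<Anchor> collectAnchors(List<Node> topLevel) {
        List<Anchor> out = new ArrayList<>();
        for (Node node : topLevel) {
            collectAnchors(node, out);
        }
        return out;
    }

    public static void main(String[] args) {
        // One drawing container holding a shape, plus a shape nested inside a group.
        Node drawing = new Node()
                .add(new Node().add(new Anchor(2, 1)))
                .add(new Node().add(new Node().add(new Anchor(10, 4))));
        for (Anchor a : collectAnchors(Collections.singletonList(drawing))) {
            System.out.println("row=" + a.row1 + ", col=" + a.col1);
        }
    }
}
```

Running it prints "row=2, col=1" and then "row=10, col=4"; recursing rather than scanning only the top level is what picks up the anchor nested inside the group.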

But every time I process a file that has more than about 50 or 60 images, I get a warning:
 WARNING: xxxx bytes remaining but no space left
 WARNING: xxxx bytes remaining but no space left

From the output, I can see that only 30-odd EscherClientAnchorRecords are retrieved. I believe the EscherClientAnchorRecord is a child of an EscherContainerRecord in this case, and something goes wrong while that EscherContainerRecord is being created. I checked the source code and found that the only place that emits this warning is org/apache/poi/ddf/EscherContainerRecord.java, in the following method:
    public int fillFields(byte[] data, int pOffset, EscherRecordFactory recordFactory) {
        // readHeader() returns the number of bytes the container's header declares for its children
        int bytesRemaining = readHeader(data, pOffset);
        int bytesWritten = 8;
        int offset = pOffset + 8;
        // Parse children until the declared length is consumed or the array ends
        while (bytesRemaining > 0 && offset < data.length) {
            EscherRecord child = recordFactory.createRecord(data, offset);
            int childBytesWritten = child.fillFields(data, offset, recordFactory);
            bytesWritten += childBytesWritten;
            offset += childBytesWritten;
            bytesRemaining -= childBytesWritten;
            addChildRecord(child);
            // The array ended before the container's declared length was reached
            if (offset >= data.length && bytesRemaining > 0) {
                System.out.println("WARNING: " + bytesRemaining + " bytes remaining but no space left");
            }
        }
        return bytesWritten;
    }
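To see where the number in the warning comes from, here is a standalone, simplified model of that loop. It does not use POI, and it assumes every child is a leaf atom that consumes its 8-byte header plus its declared payload, clamped to the bytes left in the array. The class name ContainerOverrun, the method bytesStillOwed and the sample buffer are made up for illustration; the header layout (little-endian 16-bit options, 16-bit record id, 32-bit length) follows the Escher record format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Simplified, POI-free model of the loop in EscherContainerRecord.fillFields.
// Assumption: every child is a leaf atom that consumes its 8-byte header plus
// its declared payload, clamped to the bytes actually left in the array.
public class ContainerOverrun {

    static final int HEADER_SIZE = 8;

    // The 32-bit payload length sits at bytes 4..7 of each header (little-endian).
    static long declaredLength(byte[] data, int offset) {
        int raw = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset + 4);
        return Integer.toUnsignedLong(raw);
    }

    // Returns the count the warning would report, or 0 if the container fits.
    static long bytesStillOwed(byte[] data, int containerOffset) {
        long bytesRemaining = declaredLength(data, containerOffset);
        int offset = containerOffset + HEADER_SIZE;
        while (bytesRemaining > 0 && offset < data.length) {
            int available = data.length - offset;
            int childSize = available < HEADER_SIZE
                    ? available  // truncated child header: swallow the tail
                    : (int) Math.min(HEADER_SIZE + declaredLength(data, offset), available);
            offset += childSize;
            bytesRemaining -= childSize;
        }
        return (offset >= data.length && bytesRemaining > 0) ? bytesRemaining : 0;
    }

    public static void main(String[] args) {
        // Container header claims 100 bytes, but only one 18-byte client anchor follows.
        ByteBuffer sample = ByteBuffer.allocate(34).order(ByteOrder.LITTLE_ENDIAN);
        sample.putShort((short) 0x000F).putShort((short) 0xF004).putInt(100);
        sample.putShort((short) 0x0000).putShort((short) 0xF010).putInt(18);

        long owed = bytesStillOwed(sample.array(), 0);
        if (owed > 0) {
            System.out.println("WARNING: " + owed + " bytes remaining but no space left");
        }
    }
}
```

With this sample the container's header promises 100 bytes but only 26 follow it, so the sketch prints "WARNING: 74 bytes remaining but no space left". In other words, the warning reports how much of a container's declared length was never present in the byte array that fillFields was given.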
Comment 1 Oleg Kuryan 2010-09-09 15:39:44 UTC
Created attachment 26009
Class to reproduce the problem

Attaching a Java class to reproduce the problem.
Comment 2 Oleg Kuryan 2010-09-09 15:41:15 UTC
Created attachment 26010
Test file to use to repro problem

Adding a test file to make the problem easier to reproduce. It should be passed to the attached Main.java class.
Comment 3 Oleg Kuryan 2010-09-09 15:42:50 UTC
Setting the priority because this blocks development of a production task.
Comment 4 Nick Burch 2010-09-09 16:33:35 UTC
If this is a blocker problem for you, then you'll need to either investigate it more yourself, or pay someone who provides POI consultancy to do so for you. POI is a volunteer project!

If you want to look into this yourself, you'll need to read up on the Microsoft specifications for the file format, then manually decode the escher records by hand. Somewhere along the way, you'll hopefully spot the place where POI makes an incorrect assumption about one of the escher records. Once you've found that, it will hopefully be quite a quick job to patch it; the hard bit is discovering where our assumptions about the file format differ from what actually crops up in some files.
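As a starting point for decoding the records by hand, a small dumper can list every Escher header in a buffer of drawing bytes. The sketch below is a hypothetical helper (EscherDump is not part of POI). It assumes the Escher bytes have already been pulled out of the .xls file, treats any record whose version nibble is 0xF as a container, and flags records whose declared length runs past the range available to them. The sample buffer is synthetic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of POI): lists Escher record headers in a byte
// range, descending into containers and flagging records that overrun their range.
public class EscherDump {

    static final int HEADER_SIZE = 8;

    static void dump(byte[] data, int start, int end, int depth, List<String> out) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int offset = start;
        // Stop once fewer than 8 bytes are left: no complete header can start there.
        while (offset + HEADER_SIZE <= end) {
            int options = buf.getShort(offset) & 0xFFFF;
            int recordId = buf.getShort(offset + 2) & 0xFFFF;
            long length = Integer.toUnsignedLong(buf.getInt(offset + 4));
            int version = options & 0x000F;   // low 4 bits
            int instance = options >>> 4;     // high 12 bits
            long recordEnd = offset + HEADER_SIZE + length;

            String indent = String.join("", Collections.nCopies(depth, "  "));
            String line = String.format("%s0x%04X ver=%d inst=%d len=%d at %d",
                    indent, recordId, version, instance, length, offset);
            if (recordEnd > end) {
                line += " TRUNCATED (" + (recordEnd - end) + " bytes missing)";
            }
            out.add(line);

            int bodyEnd = (int) Math.min(recordEnd, end);
            if (version == 0xF) {
                // Containers hold child records directly after their header.
                dump(data, offset + HEADER_SIZE, bodyEnd, depth + 1, out);
            }
            offset = bodyEnd;
        }
    }

    public static void main(String[] args) {
        // Synthetic sample: an SpContainer (0xF004) that claims 100 bytes but only
        // holds one 18-byte client anchor (0xF010) before the buffer ends.
        ByteBuffer sample = ByteBuffer.allocate(34).order(ByteOrder.LITTLE_ENDIAN);
        sample.putShort((short) 0x000F).putShort((short) 0xF004).putInt(100);
        sample.putShort((short) 0x0000).putShort((short) 0xF010).putInt(18);

        List<String> lines = new ArrayList<>();
        dump(sample.array(), 0, sample.capacity(), 0, lines);
        lines.forEach(System.out::println);
    }
}
```

For the sample it prints two lines, "0xF004 ver=15 inst=0 len=100 at 0 TRUNCATED (74 bytes missing)" and "  0xF010 ver=0 inst=0 len=18 at 8", which is the kind of listing you would compare against the specification when hunting for a wrong assumption.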
Comment 5 Oleg Kuryan 2010-09-10 02:48:51 UTC
Hi Nick,

Thanks for your comments. Just a quick question: do you know of another way to get an image's position? I believe it is a common task that would be good to have in the POI library.
Comment 6 Yegor Kozlov 2011-03-28 03:51:16 UTC
HSSF provides a way to iterate over shapes and read their positions. The following code works fine for me:

        import java.io.FileInputStream;
        import java.io.IOException;
        import java.util.List;

        import org.apache.poi.hssf.usermodel.*;

        public class PrintShapeAnchors {
            public static void main(String[] args) throws IOException {
                HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(args[0]));
                int numSheets = workbook.getNumberOfSheets();
                for (int i = 0; i < numSheets; i++) {
                    HSSFSheet sheet = workbook.getSheetAt(i);
                    HSSFPatriarch patriarch = sheet.getDrawingPatriarch();
                    if (patriarch == null) {
                        continue; // sheet has no drawing layer, so there are no shapes to report
                    }
                    List<HSSFShape> shapes = patriarch.getChildren();
                    for (HSSFShape shape : shapes) {
                        HSSFAnchor anchor = shape.getAnchor();
                        if (anchor instanceof HSSFClientAnchor) {
                            // top-level shape: anchored to sheet cells
                            HSSFClientAnchor clientAnchor = (HSSFClientAnchor) anchor;
                            System.out.println(clientAnchor.getRow1() + "," + clientAnchor.getRow2());
                        } else if (anchor instanceof HSSFChildAnchor) {
                            // shape is grouped and the anchor is expressed in the coordinate system of the group
                            HSSFChildAnchor childAnchor = (HSSFChildAnchor) anchor;
                            System.out.println(childAnchor.getDy1() + "," + childAnchor.getDy2());
                        }
                    }
                }
            }
        }

If this code misses some images, please attach the problem file and a JUnit test demonstrating exactly what is missing.

Yegor
Comment 7 Oleg Kuryan 2011-08-25 18:55:27 UTC
Yegor,

I have to reopen it. Please try your code with the Excel file I attached to this issue. I got the following warnings:

WARNING: 9940 bytes remaining but no space left
WARNING: 9940 bytes remaining but no space left
Comment 8 Oleg Kuryan 2011-08-25 18:56:48 UTC
Forgot to mention that NO image coordinates are printed out at all
Comment 9 Dominik Stadler 2015-03-22 17:24:33 UTC
Seems to be fixed. I tried to reproduce this, and testing as described in the previous comments now works with the test file attached here.