One of my projects needs to read images from an Excel file (.xls). Besides the image data, I also need each image's position (row and column) in the sheet. While doing this, I ran into what looks like a bug when retrieving the EscherContainerRecord.

My plan was to read the EscherAggregate from the HSSFSheet and then retrieve its list of EscherRecords. By walking this list and the records' children, I can reach the EscherClientAnchorRecord, which contains the row and column numbers.

However, whenever I process a file that has more than about 50 or 60 images, I get these warnings:

    WARNING: xxxx bytes remaining but no space left
    WARNING: xxxx bytes remaining but no space left

From the output I can see that only 30-odd EscherClientAnchorRecords are retrieved. I believe the EscherClientAnchorRecord is a child of the EscherContainerRecord in this case, and that something goes wrong while that EscherContainerRecord is being created.

I looked through the source code, and the only place that produces this warning message is the following method in org/apache/poi/ddf/EscherContainerRecord.java:

    public int fillFields(byte[] data, int pOffset, EscherRecordFactory recordFactory) {
        int bytesRemaining = readHeader(data, pOffset);
        int bytesWritten = 8;
        int offset = pOffset + 8;
        while (bytesRemaining > 0 && offset < data.length) {
            EscherRecord child = recordFactory.createRecord(data, offset);
            int childBytesWritten = child.fillFields(data, offset, recordFactory);
            bytesWritten += childBytesWritten;
            offset += childBytesWritten;
            bytesRemaining -= childBytesWritten;
            addChildRecord(child);
            if (offset >= data.length && bytesRemaining > 0) {
                System.out.println("WARNING: " + bytesRemaining + " bytes remaining but no space left");
            }
        }
        return bytesWritten;
    }
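To make the condition behind that warning concrete, here is a minimal standalone sketch (it does not use POI). It assumes the usual 8-byte Escher record header (2 bytes of options, 2 bytes of record id, 4 bytes of little-endian length) and simply compares the length a header declares with the bytes actually present in the buffer. The class name EscherLengthCheck, the method shortfall and the sample bytes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EscherLengthCheck {
    /** Assumed Escher header size: 2 bytes options, 2 bytes record id, 4 bytes length. */
    static final int HEADER_SIZE = 8;

    /**
     * Returns how many bytes the record at offset declares beyond what the
     * buffer actually holds after its header, or 0 if the record fits.
     */
    static int shortfall(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int declared = buf.getInt(offset + 4);               // payload length from the header
        int available = data.length - offset - HEADER_SIZE;  // bytes present after the header
        return Math.max(0, declared - available);
    }

    public static void main(String[] args) {
        // Hypothetical container header that declares 100 bytes of children,
        // placed in a buffer that only carries 16 bytes after the header.
        byte[] data = new byte[HEADER_SIZE + 16];
        ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) 0x000F)   // options: low nibble 0xF marks a container
                .putShort((short) 0xF002)   // record id (illustrative)
                .putInt(100);               // declared payload length
        System.out.println(shortfall(data, 0) + " bytes declared but not present");
    }
}
```

If the children consume exactly the bytes that are present, the number printed by the warning in fillFields should correspond to this shortfall. A non-zero value would suggest that the buffer handed to the container holds only part of the drawing data, though this sketch cannot say why.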
Created attachment 26009: Class to reproduce the problem
Attaching a Java class to reproduce the problem.
Created attachment 26010: Test file to use to repro problem
Adding a test file to make it easier to reproduce the problem. It should be passed to the attached Main.java class.
Setting priority as it blocks development of the production task
If this is a blocker problem for you, then you'll need to either investigate it more yourself, or pay someone who provides POI consultancy to do so for you. POI is a volunteer project!

If you want to look into this yourself, you'll need to read up on the Microsoft specifications for the file format, then decode the escher records by hand. Somewhere along the way, you'll hopefully spot the place where POI makes an incorrect assumption about one of the escher records. Once you've found that, it will hopefully be quite a quick job to patch it. The hard bit is discovering where our assumptions about the file format differ from what actually crops up in some files.
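As a starting point for decoding the records by hand, the following standalone sketch (not POI code) walks a byte array of escher data and lists each record header, indenting children under their container. It assumes the 8-byte header layout (options, record id, little-endian length) and that a low options nibble of 0xF marks a container. The class EscherHeaderDump and its methods dump, walk and sample are hypothetical names, and the sample bytes are invented; real data would have to be extracted from the problem file first.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class EscherHeaderDump {
    static final int HEADER_SIZE = 8; // assumed: 2 bytes options, 2 bytes id, 4 bytes length

    /** Returns one line per record found in data, children indented under containers. */
    static List<String> dump(byte[] data) {
        List<String> out = new ArrayList<>();
        walk(data, 0, data.length, 0, out);
        return out;
    }

    /** Walks the records in data[start, end) and appends a description of each to out. */
    static void walk(byte[] data, int start, int end, int depth, List<String> out) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int offset = start;
        while (offset + HEADER_SIZE <= end) {
            int options = buf.getShort(offset) & 0xFFFF;
            int recordId = buf.getShort(offset + 2) & 0xFFFF;
            int length = buf.getInt(offset + 4);
            int bodyStart = offset + HEADER_SIZE;
            long declaredEnd = (long) bodyStart + length;
            // A record is flagged when its declared length runs past the enclosing range.
            boolean truncated = length < 0 || declaredEnd > end;
            int bodyEnd = truncated ? end : (int) declaredEnd;

            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(String.format("0x%04X len=%d", recordId, length));
            if (truncated) {
                line.append(" TRUNCATED");
            }
            out.add(line.toString());

            if ((options & 0x000F) == 0x000F) {
                walk(data, bodyStart, bodyEnd, depth + 1, out); // containers hold child records
            }
            offset = bodyEnd;
        }
    }

    /** Invented sample: one container (0xF004) holding one 4-byte child record (0xF010). */
    static byte[] sample(int containerLength) {
        byte[] data = new byte[20];
        ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) 0x000F).putShort((short) 0xF004).putInt(containerLength)
                .putShort((short) 0x0000).putShort((short) 0xF010).putInt(4)
                .put(new byte[4]); // payload size is arbitrary for this demo
        return data;
    }

    public static void main(String[] args) {
        for (String line : dump(sample(12))) {
            System.out.println(line);
        }
    }
}
```

Calling sample(100) instead of sample(12) makes the container declare more bytes than the buffer holds, and the walker marks it TRUNCATED. Comparing such a listing against the specification is one way to look for the record where the parser and the file disagree.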
Hi Nick,

Thanks for your comments. Just a quick question: do you know of another way to get the image position? I believe this is a common task that would be good to have in the POI library.
HSSF provides a way to iterate over shapes and read their positions. The following code works fine for me:

    HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(file));
    int numSheets = workbook.getNumberOfSheets();
    for (int i = 0; i < numSheets; i++) {
        HSSFSheet sheet = workbook.getSheetAt(i);
        List<HSSFShape> shapes = sheet.getDrawingPatriarch().getChildren();
        for (HSSFShape shape : shapes) {
            HSSFAnchor anchor = shape.getAnchor();
            if (anchor instanceof HSSFClientAnchor) {
                // absolute coordinates
                HSSFClientAnchor clientAnchor = (HSSFClientAnchor) anchor;
                System.out.println(clientAnchor.getRow1() + "," + clientAnchor.getRow2());
            } else if (anchor instanceof HSSFChildAnchor) {
                // shape is grouped and the anchor is expressed in the coordinate system of the group
                HSSFChildAnchor childAnchor = (HSSFChildAnchor) anchor;
                System.out.println(childAnchor.getDy1() + "," + childAnchor.getDy2());
            }
        }
    }

If this code misses some images, then please attach the problem file and a JUnit test demonstrating what in particular is missing.

Yegor
Yegor, I have to reopen this. Please try your code with the Excel file that I attached to this issue. I got the following warnings:

    WARNING: 9940 bytes remaining but no space left
    WARNING: 9940 bytes remaining but no space left
Forgot to mention that NO image coordinates are printed out at all
This seems to be fixed. I tried to reproduce it, but testing as described in the previous comments now works with the test file attached here.