1. Create a XLS with Excel 2000 and add 58 checkbox controls 2. Load the file with HSSFWorkbook(), and save with HSSFWorkbook#write() 3. Open the saved file with Excel, appears an alert "Uable to read file" This problem is seems same as: http://issues.apache.org/bugzilla/show_bug.cgi?id=39512
Created attachment 21134 [details] xls has 58 controls
Created attachment 21135 [details] saved xls
Can you confirm if 58 controls is the point where the problem kicks in? (i.e. does a sheet with 57 controls save and re-open just fine, or is it some smaller number where that's the case?)
Yes, I confirmed that no problem in some case about 30 to 57 controls. But another confirmer in a japanese communitiy said: Less than 59 controls: fine 59 controls: problem (however no evidence)
I think to be able to debug this, we'll want 4 very closely related files: * file generated in excel, with 57 controls * same file, saved by poi, which excel can read in again * re-open excel file, add in 58 controls, do save as * that 58 controller file, saved by poi, which excel can't read They ought to be created something like that, so that pretty much the only differences will be the number of controls. That will help us narrow down the differences
Created attachment 21156 [details] 4 related files of the sequence Well, I've created 4 related files of the sequence that you said: 1. file generated in excel, with 57 controls 2. same file, saved by poi, which excel can read in again 3. re-open excel file, add in 58 controls, do save as 4. that 58 controller file, saved by poi, which excel can't read See new an attatched zip file.
Thanks. Not sure when someone will get to look at it, but at least all we need to do know is compare the 4 files, spot the differences, and hopefully spot what poi is doing wrong with all those controls (I suspect writing out a record that's too long or something like that)
I'm looking into the problem. 1. It's even worse in trunk. POI produces invalid xls from a workbook with the only checkbox control, i.e any number of checkboxes results in a invalid xls document. (In 3.0.1 it works fine, re-save of a workbook with 1 checkbox produces correct xls). My investigation revealed that the problem is in org.apache.poi.hssf.record.SubRecord. It was a patch by Daniel Noll with support for getting OLE objects from HSSFWorkbook ( bug 43222 ). After I commented out the following three lines the situation has improved. + case EmbeddedObjectRefSubRecord.sid: + r = new EmbeddedObjectRefSubRecord( in ); + break; Daniel, would you please look into it? We are going to release soon and if it is not fixed, I will have to revert your patch. 2. I confirmed that MSODRAWINGGROUP record gets corrupted if the number of controls is greater than 58. Since it works with smaller values most likely we have a continue record issue. I keep researching. Regards, Yegor
I think I know what's responsible for this. Just trying to figure out the correct behaviour.
Okay, I did some sanity fixes to the way that record class reads and writes, but it still doesn't open in Excel. In fact if I go into SubRecord and comment out those lines, that doesn't make it work either. So there may have been a problem in my code, but even when my code is omitted it still doesn't work.
Created attachment 21169 [details] Fix for EmbeddedObjectRefSubRecord Patch contains the following fixes made to EmbeddedObjectRefSubRecord: 1. serialize() wasn't writing the record's sid and length (I forgot that it was the subclass' responsibility... blame the API ;-)) 2. getRecordSize() was off by 2 as it wasn't taking into account the length of the stream ID offset field. 3. Storing the amount of padding instead of figuring it out from the offsets. Makes no difference but is somewhat more sane to look at. Still a hack because it should probably write all the stuff and then set the offset field once it knows how far in it is. In any case after this that file still doesn't work for me. But commenting out the EmbeddedObjectRefSubRecord check in SubRecord doesn't make it work either.
Daniel, 1. After I applied the fix for EmbeddedObjectRefSubRecord the test for 1 checkbox passes. That is if I create a xls with 1 checkbox ans re-save it by POI the xls is still valid. Patch applied, thanks. 2. The problem may also be related to EscherMetafileBlip. An attempt to view the binary structure of attached saved.xls using BiffViewer results in ArrayIndexOutOfBoundsException: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.poi.ddf.EscherMetafileBlip.fillFields(EscherMetafileBlip.java:87) at org.apache.poi.ddf.EscherBSERecord.fillFields(EscherBSERecord.java:94) at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56) at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56) at org.apache.poi.hssf.record.AbstractEscherHolderRecord.convertToEscherRecords(AbstractEscherHolderRecord.java:104) at org.apache.poi.hssf.record.AbstractEscherHolderRecord.fillFields(AbstractEscherHolderRecord.java:93) at org.apache.poi.hssf.record.Record.<init>(Record.java:53) at org.apache.poi.hssf.record.AbstractEscherHolderRecord.<init>(AbstractEscherHolderRecord.java:66) at org.apache.poi.hssf.record.DrawingGroupRecord.<init>(DrawingGroupRecord.java:42) at org.apache.poi.hssf.dev.BiffViewer.createRecord(BiffViewer.java:282) Although BiffViewer is happy with the original has58controls.xls. I can dump and see its structure I look at the code and it seems I see a bug: public int serialize( int offset, byte[] data, EscherSerializationListener listener ) { listener.beforeRecordSerialize(offset, getRecordId(), this); int pos = offset; LittleEndian.putShort( data, pos, getOptions() ); pos += 2; LittleEndian.putShort( data, pos, getRecordId() ); pos += 2; !bug LittleEndian.putInt( data, getRecordSize() - HEADER_SIZE ); pos += 4; it writes the record size at a wrong position. Should be LittleEndian.putInt( data, pos, getRecordSize() - HEADER_SIZE ); pos += 4; Please confirm. Yegor
Oops. I can't apply the patch because it causes org.apache.poi.hssf.usermodel.TestOLE2Embeding to fail. Definitely more work is needed. Yegor
Now that's interesting, it didn't affect our own OLE2 unit test at all. I don't suppose the one in POI was somehow reliant on the buggy behaviour? The one I sent through should have become the one which was used but perhaps it was adjusted somehow.
No, it wasn't adjusted. Current version is http://svn.apache.org/viewvc/poi/trunk/src/testcases/org/apache/poi/hssf/usermodel/TestOLE2Embeding.java?revision=573878&view=markup and the exception is java.io.FileNotFoundException: no such entry: "MBD00000000" at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:247) at org.apache.poi.hssf.usermodel.HSSFObjectData.getDirectory(HSSFObjectData.java:76) at org.apache.poi.hssf.usermodel.TestOLE2Embeding.testEmbeddedObjects(TestOLE2Embeding.java:52)
Weird, that looks exactly the same as the test I have which still works.
Are we checking against the same file? Please check http://svn.apache.org/viewvc/poi/trunk/src/testcases/org/apache/poi/hssf/data/ole2-embedding.xls?view=log
Diff says they're the same. My own unit test is (the filename is different): public void testEmbeddedObjects2003() throws Exception { File file = getDataFile("office/ole2-embedding-2003.xls"); HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(file)); //noinspection unchecked List<HSSFObjectData> objects = workbook.getAllEmbeddedObjects(); assertEquals("Wrong number of objects", 2, objects.size()); assertEquals("Wrong name for first object", "MBD06CAB431", objects.get(0).getDirectory().getName()); assertEquals("Wrong name for second object", "MBD06CAC85A", objects.get(1).getDirectory().getName()); } The thing is, my patch didn't actually change the reading code anyway, and the unit test doesn't test the writing code (it probably should though...)
I found what's wrong. It wasn't trivial and took two brain-teasing days to figure it out :). In org.apache.poi.hssf.record.RecordFactory we have a place where continuation for drawing records is handled: else if (record.getSid() == ContinueRecord.sid && ((lastRecord instanceof ObjRecord) || (lastRecord instanceof TextObjectRecord))) { // Drawing records have a very strange continue behaviour. //There can actually be OBJ records mixed between the continues. lastDrawingRecord.processContinueRecord( ((ContinueRecord)record).getData() ); The problem is that this case is not handled when writing out the records. That is if we have an OBJ record mixed with continues we read them properly. BUT when we serialize the drawing record this structure is lost and it results in corrupted xls. It is reproducible with any xls file having a continue record after OBJ. I reproduced it with 35565.xls used by org.apache.poi.hssf.usermodel.TestBugs.test35565(). After re-save it becomes unreadable. The fix is coming soon. Yegor
Finally fixed. Yegor
*** Bug 39512 has been marked as a duplicate of this bug. ***