Bug 43877 - Cant open the saved XLS has 58 controls
Summary: Cant open the saved XLS has 58 controls
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.0-FINAL
Hardware: PC Windows 2000
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 39512 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-11-15 16:51 UTC by Tadashi ISHIKAWA
Modified: 2007-11-23 04:57 UTC (History)
2 users (show)



Attachments
xls has 58 controls (90.50 KB, application/vnd.ms-excel)
2007-11-15 16:53 UTC, Tadashi ISHIKAWA
Details
saved xls (90.50 KB, application/vnd.ms-excel)
2007-11-15 16:54 UTC, Tadashi ISHIKAWA
Details
4 related files of the sequence (182.74 KB, application/x-zip-compressed)
2007-11-19 16:50 UTC, Tadashi ISHIKAWA
Details
Fix for EmbeddedObjectRefSubRecord (2.62 KB, patch)
2007-11-20 17:59 UTC, Trejkaz (pen name)
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tadashi ISHIKAWA 2007-11-15 16:51:34 UTC
1. Create a XLS with Excel 2000 and add 58 checkbox controls 
2. Load the file with HSSFWorkbook(), and save with HSSFWorkbook#write()
3. Open the saved file with Excel, appears an alert "Uable to read file" 

This problem is seems same as:
http://issues.apache.org/bugzilla/show_bug.cgi?id=39512
Comment 1 Tadashi ISHIKAWA 2007-11-15 16:53:24 UTC
Created attachment 21134 [details]
xls has 58 controls
Comment 2 Tadashi ISHIKAWA 2007-11-15 16:54:26 UTC
Created attachment 21135 [details]
saved xls
Comment 3 Nick Burch 2007-11-16 07:28:11 UTC
Can you confirm if 58 controls is the point where the problem kicks in?

(i.e. does a sheet with 57 controls save and re-open just fine, or is it some
smaller number where that's the case?)
Comment 4 Tadashi ISHIKAWA 2007-11-18 16:56:24 UTC
Yes, I confirmed that no problem in some case about 30 to 57 controls.

But another confirmer in a japanese communitiy said:
Less than 59 controls: fine
59 controls: problem
(however no evidence)
Comment 5 Nick Burch 2007-11-19 04:17:55 UTC
I think to be able to debug this, we'll want 4 very closely related files:
* file generated in excel, with 57 controls
* same file, saved by poi, which excel can read in again
* re-open excel file, add in 58 controls, do save as
* that 58 controller file, saved by poi, which excel can't read

They ought to be created something like that, so that pretty much the only
differences will be the number of controls. That will help us narrow down the
differences
Comment 6 Tadashi ISHIKAWA 2007-11-19 16:50:25 UTC
Created attachment 21156 [details]
4 related files of the sequence

Well, I've created 4 related files of the sequence that you said:

1. file generated in excel, with 57 controls
2. same file, saved by poi, which excel can read in again
3. re-open excel file, add in 58 controls, do save as
4. that 58 controller file, saved by poi, which excel can't read

See new an attatched zip file.
Comment 7 Nick Burch 2007-11-20 04:27:10 UTC
Thanks. Not sure when someone will get to look at it, but at least all we need
to do know is compare the 4 files, spot the differences, and hopefully spot what
poi is doing wrong with all those controls (I suspect writing out a record
that's too long or something like that)
Comment 8 Yegor Kozlov 2007-11-20 10:19:58 UTC
I'm looking into the problem.

1. It's even worse in trunk. POI produces invalid xls from a workbook with the
only checkbox control, i.e any number of checkboxes results in a invalid xls
document. (In 3.0.1 it works fine, re-save of a workbook with 1 checkbox
produces correct xls). My investigation revealed that the problem is in
org.apache.poi.hssf.record.SubRecord.
It was a patch by Daniel Noll with support for getting OLE objects from
HSSFWorkbook ( bug 43222 ). 

After I commented out the following three lines the situation has improved. 

+            case EmbeddedObjectRefSubRecord.sid:
+                r = new EmbeddedObjectRefSubRecord( in );
+                break;

Daniel, would you please look into it? We are going to release soon and if it is
not fixed, I will have to revert your patch.

2. I confirmed that MSODRAWINGGROUP record gets corrupted if the number of
controls is greater than 58.
Since it works with smaller values most likely we have a continue
record issue. I keep researching.

Regards,
Yegor
Comment 9 Trejkaz (pen name) 2007-11-20 17:01:46 UTC
I think I know what's responsible for this.  Just trying to figure out the
correct behaviour.
Comment 10 Trejkaz (pen name) 2007-11-20 17:35:05 UTC
Okay, I did some sanity fixes to the way that record class reads and writes, but
it still doesn't open in Excel.

In fact if I go into SubRecord and comment out those lines, that doesn't make it
work either.  So there may have been a problem in my code, but even when my code
is omitted it still doesn't work.
Comment 11 Trejkaz (pen name) 2007-11-20 17:59:23 UTC
Created attachment 21169 [details]
Fix for EmbeddedObjectRefSubRecord

Patch contains the following fixes made to EmbeddedObjectRefSubRecord:

1. serialize() wasn't writing the record's sid and length (I forgot that it
   was the subclass' responsibility... blame the API ;-))

2. getRecordSize() was off by 2 as it wasn't taking into account the length of
   the stream ID offset field.

3. Storing the amount of padding instead of figuring it out from the offsets.
   Makes no difference but is somewhat more sane to look at.  Still a hack
   because it should probably write all the stuff and then set the offset field

   once it knows how far in it is.

In any case after this that file still doesn't work for me.  But commenting out
the EmbeddedObjectRefSubRecord check in SubRecord doesn't make it work either.
Comment 12 Yegor Kozlov 2007-11-21 02:44:10 UTC
Daniel,

1. After I applied the fix for EmbeddedObjectRefSubRecord the test for 1
checkbox passes. That is if I create 
a xls with 1 checkbox ans re-save it by POI the xls is still valid.
Patch applied, thanks.

2. The problem may also be related to EscherMetafileBlip.
An attempt to view the binary structure of attached saved.xls using BiffViewer
results in ArrayIndexOutOfBoundsException:

java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at
org.apache.poi.ddf.EscherMetafileBlip.fillFields(EscherMetafileBlip.java:87)
        at org.apache.poi.ddf.EscherBSERecord.fillFields(EscherBSERecord.java:94)
        at
org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56)
        at
org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56)
        at
org.apache.poi.hssf.record.AbstractEscherHolderRecord.convertToEscherRecords(AbstractEscherHolderRecord.java:104)
        at
org.apache.poi.hssf.record.AbstractEscherHolderRecord.fillFields(AbstractEscherHolderRecord.java:93)
        at org.apache.poi.hssf.record.Record.<init>(Record.java:53)
        at
org.apache.poi.hssf.record.AbstractEscherHolderRecord.<init>(AbstractEscherHolderRecord.java:66)
        at
org.apache.poi.hssf.record.DrawingGroupRecord.<init>(DrawingGroupRecord.java:42)
        at org.apache.poi.hssf.dev.BiffViewer.createRecord(BiffViewer.java:282)
        

Although BiffViewer is happy with the original has58controls.xls. I can dump and
see its structure


I look at the code and it seems I see a bug:

    public int serialize( int offset, byte[] data, EscherSerializationListener
listener )
    {
        listener.beforeRecordSerialize(offset, getRecordId(), this);

        int pos = offset;
        LittleEndian.putShort( data, pos, getOptions() ); pos += 2;
        LittleEndian.putShort( data, pos, getRecordId() ); pos += 2;
!bug    LittleEndian.putInt( data, getRecordSize() - HEADER_SIZE ); pos += 4;

it writes the record size at a wrong position. Should be
       LittleEndian.putInt( data, pos, getRecordSize() - HEADER_SIZE ); pos += 4;

Please confirm.


Yegor
Comment 13 Yegor Kozlov 2007-11-21 02:48:53 UTC
Oops. I can't apply the patch because it causes 
org.apache.poi.hssf.usermodel.TestOLE2Embeding to fail. Definitely more work is
needed.

Yegor 
Comment 14 Trejkaz (pen name) 2007-11-21 05:55:02 UTC
Now that's interesting, it didn't affect our own OLE2 unit test at all.  I don't suppose the one in POI was 
somehow reliant on the buggy behaviour?  The one I sent through should have become the one which was 
used but perhaps it was adjusted somehow.
Comment 15 Yegor Kozlov 2007-11-21 06:04:03 UTC
No, it wasn't adjusted.
Current version is 
http://svn.apache.org/viewvc/poi/trunk/src/testcases/org/apache/poi/hssf/usermodel/TestOLE2Embeding.java?revision=573878&view=markup
and the exception is
java.io.FileNotFoundException: no such entry: "MBD00000000"
	at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:247)
	at
org.apache.poi.hssf.usermodel.HSSFObjectData.getDirectory(HSSFObjectData.java:76)
	at
org.apache.poi.hssf.usermodel.TestOLE2Embeding.testEmbeddedObjects(TestOLE2Embeding.java:52)
Comment 16 Trejkaz (pen name) 2007-11-21 06:43:29 UTC
Weird, that looks exactly the same as the test I have which still works.

Comment 17 Yegor Kozlov 2007-11-21 07:36:18 UTC
Are we checking against the same file? 

Please check
http://svn.apache.org/viewvc/poi/trunk/src/testcases/org/apache/poi/hssf/data/ole2-embedding.xls?view=log
Comment 18 Trejkaz (pen name) 2007-11-21 13:40:26 UTC
Diff says they're the same.  My own unit test is (the filename is different):

    public void testEmbeddedObjects2003() throws Exception {
        File file = getDataFile("office/ole2-embedding-2003.xls");
        HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(file));
        //noinspection unchecked
        List<HSSFObjectData> objects = workbook.getAllEmbeddedObjects();
        assertEquals("Wrong number of objects", 2, objects.size());
        assertEquals("Wrong name for first object", "MBD06CAB431",
                     objects.get(0).getDirectory().getName());
        assertEquals("Wrong name for second object", "MBD06CAC85A",
                     objects.get(1).getDirectory().getName());
    }

The thing is, my patch didn't actually change the reading code anyway, and the
unit test doesn't test the writing code (it probably should though...)
Comment 19 Yegor Kozlov 2007-11-22 08:34:22 UTC
I found what's wrong. It wasn't trivial and took two brain-teasing days to
figure it out :).

In org.apache.poi.hssf.record.RecordFactory we have a place where continuation
for drawing records is handled:


                        else if (record.getSid() == ContinueRecord.sid &&
                                 ((lastRecord instanceof ObjRecord) ||
(lastRecord instanceof TextObjectRecord))) {
                          // Drawing records have a very strange continue behaviour.
                          //There can actually be OBJ records mixed between the
continues.
                          lastDrawingRecord.processContinueRecord(
((ContinueRecord)record).getData() );

The problem is that this case is not handled when writing out the records. That
is if we have an OBJ record mixed with continues we read them properly.
BUT when we serialize the drawing record this structure  is lost and it results
in corrupted xls. It is reproducible with any xls file having a
continue record after OBJ. I reproduced it with 35565.xls used by
org.apache.poi.hssf.usermodel.TestBugs.test35565(). After re-save it becomes
unreadable.

The fix is coming soon.

Yegor
Comment 20 Yegor Kozlov 2007-11-23 04:56:30 UTC
Finally fixed. 

Yegor
Comment 21 Yegor Kozlov 2007-11-23 04:57:03 UTC
*** Bug 39512 has been marked as a duplicate of this bug. ***