Bug 42564

Summary: org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance
Product: POI Reporter: Marcus Breier <m.breier>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: dave.webster, Peter.krum, sara_uppalapati
Priority: P2    
Version: 3.0-FINAL   
Target Milestone: ---   
Hardware: Macintosh   
OS: Mac OS X 10.4   
Attachments: File with problem on opening
empty Excel spreadsheet which shows Exception when opening

Description Marcus Breier 2007-06-01 07:17:37 UTC
Hi there ...

recently I got the Exception printed below while reading an Excel file that has
been exported by MS Access to "Excel 97-2002 & 5.0/95 Workbook" format. 

DEBUG: type = 5; currentLength = 8; recordOffset = 8

org.apache.poi.hssf.record.RecordFormatException: Unable to construct record
instance
	at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:191)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:115)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:205)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:153)
	... 50 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
	at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:179)
	... 54 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at
org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:132)
	at org.apache.poi.hssf.record.RecordInputStream.readInt(RecordInputStream.java:155)
	at org.apache.poi.hssf.record.BOFRecord.fillFields(BOFRecord.java:121)
	at org.apache.poi.hssf.record.Record.<init>(Record.java:56)
	at org.apache.poi.hssf.record.BOFRecord.<init>(BOFRecord.java:99)
	... 59 more

When the exported Excel file has been reopened and saved with Excel, everything
is ok. 

DEBUG: type = 5; currentLength = 16; recordOffset = 8; sid = 2057;
ContinueRecord.sid = 60
DEBUG: type = 16; currentLength = 16; recordOffset = 8; sid = 2057;
ContinueRecord.sid = 60


To get a clue why these exceptions occur I put some debug code into
BOFRecord.fillFields(RecordInputStream in) to produce the DEBUG output stated above:

...
protected void fillFields(RecordInputStream in)
{

    field_1_version  = in.readShort();
    field_2_type     = in.readShort();
    field_3_build    = in.readShort();
    field_4_year     = in.readShort();

    System.out.println("DEBUG: type = "+getType()+"; "+
    						  "currentLength = "+in.getLength()+"; "+
    						  "recordOffset = "+in.getRecordOffset()+"; "+
    						  "sid = "+in.getSid()+"; "+
    						  "ContinueRecord.sid = "+ContinueRecord.sid);

   	field_5_history  = in.readInt();
   	field_6_rversion = in.readInt();
}
...

Since I have not a clue what's up with this SID stuff, so what can I do to fix
this problem?

Regards
Marcus
Comment 1 mshoji 2007-06-15 08:11:31 UTC
Are there some 'named range' definitions?
If so, try removing the all name definitions and that workbook might be read. 
related bug maybe #41546. 
Comment 2 Nick Burch 2008-01-09 05:43:13 UTC
Could you please upload a problem file? That way we'll have something to test
against
Comment 3 Wai Wong 2008-01-28 02:28:44 UTC
Created attachment 21435 [details]
File with problem on opening

I am new to this list, but I have exactly the same problem.  I checked and
found Name Range in the file, but I have no way to prevent Access in generating
this.  Moreover, the Name Range is working fine if I open and save the file.
Comment 4 Nick Burch 2008-02-10 13:54:39 UTC
I've just tested svn trunk with the supplied problem file. I'm able to open the
file with HSSFWorkbook, serialise it to bytes, and open that again, all without
issue. (see src/testcases/org/apache/poi/hssf/usermodel/TestBugs.java#test42564)

So, I believe this bug is fixed in 3.0.2, but it certainly appears to be fixed
in svn trunk. If you are still having problems, please re-open the bug, and
attach a file that doesn't work with svn trunk.
Comment 5 David Webster 2008-02-11 00:13:43 UTC
Created attachment 21503 [details]
empty Excel spreadsheet which shows Exception when opening
Comment 6 David Webster 2008-02-11 00:30:23 UTC
Hello All, 

The problem:
I've just attached an 'empty' spreadsheet which throws an Exception when opening
(doesntwork.xls).

The stack trace is slightly different to the one originally posted here, but it
looks to be the same cause.

The Exception came from using versions: 3.0.2-FINAL-20080204 and the latest SVN
trunk build from Feb. 7th, 2008 (appologies, I am having SVN firewall probs.
here at work and can't get a later version at the moment).

Expected behaviour:
Well, that it opens and process correctly.

To replicate:
The test program I used to open the file was XLS2CSVmra.java found under:
/src/scratchpad/examples/src/org/apache/poi/hssf/eventusermodel/examples

I compiled and ran it under JDK 1.4.2.

Notes:
The Excel file was originally created by a client.  I removed : all data, all
named ranges, hidden worksheets, extra worksheets and macros, to leave the empty
spreadsheet attached.

Appologies if this bug has been resolved in a POI version post Feb. 7th; I won't
have subversion access to test the trunk until mid-week.

Many thanks as always! 

Dave

Stack trace:
[:/cygdrive/c/work/ExcelParser/42564/513] > javac XLS2CSVmra.java 
[:/cygdrive/c/work/ExcelParser/42564/514] > java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
[:/cygdrive/c/work/ExcelParser/42564/515] > java XLS2CSVmra ../doesntwork.xls 
org.apache.poi.hssf.record.RecordFormatException: Unable to construct record
instance
        at
org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:199)
        at
org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:172)
        at
org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:98)
        at
org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:63)
        at XLS2CSVmra.process(XLS2CSVmra.java:111)
        at XLS2CSVmra.main(XLS2CSVmra.java:324)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
        at
org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:187)
        ... 5 more
Caused by: org.apache.poi.hssf.record.RecordFormatException: Unknown grbit '16'
        at
org.apache.poi.hssf.record.formula.ArrayPtg.readTokenValues(ArrayPtg.java:99)
        at
org.apache.poi.hssf.record.formula.Ptg.createParsedExpressionTokens(Ptg.java:110)
        at org.apache.poi.hssf.record.NameRecord.fillFields(NameRecord.java:778)
        at org.apache.poi.hssf.record.Record.<init>(Record.java:55)
        at org.apache.poi.hssf.record.NameRecord.<init>(NameRecord.java:139)
        ... 10 more
Exception in thread "main"  


Comment 7 Nick Burch 2008-02-11 09:56:38 UTC
OK, I can replicate the problem with the new file, and I've added a (disabled)
test to TestBugs

Microsoft are supposed to be releasing some new documentation on the file format
in a week or so, so I'll wait for that then see if it explains this new Grbit
with value 16
Comment 8 David Webster 2008-02-25 05:53:15 UTC
*** Bug 44483 has been marked as a duplicate of this bug. ***
Comment 9 Nick Burch 2008-03-03 09:49:33 UTC
(In reply to comment #7)
> Microsoft are supposed to be releasing some new documentation on the file format
> in a week or so, so I'll wait for that then see if it explains this new Grbit
> with value 16

I've read through the docs, and the files are apparently only allowed a grbit value of x01 or 0x02, anything else isn't allowed

Oddly, we're hitting the issue on the first token value, so there's something pretty messed up from the start (it's not even like we get confused half way through)

I'm not sure how to proceed, as the record has data in that the docs say it shouldn't have. One option might be to just issue a warning and skip the rest of the ptg, but I'm not sure what else that might ripple on and break
Comment 10 David Webster 2008-03-17 01:18:51 UTC
Hi Nick, 

Thanks for looking into it.  Yep, awhile back I tried what you suggested (skipping record with warning) ... the ripple was that one of the record(s) skipped contained array info. re: worksheet names/indexes, so things were a bit snafu-ed when trying to index formulas on other worksheets as POI had the completely array size.

The spreadsheet in question comes from a client who has used many differing versions of Excel to open/add/save info (i.e. the spreadsheet gets circulated to different depts. who annotate it).  I think extra 'junk' gets saved and transmitted on when this happens - in particular, the dubious record.

My solution has been to open the spreadsheet and save it in OpenOffice.  That seems to work, touch wood.

Will investigate technically again when I have more time.

Cheers
Comment 11 Nick Burch 2008-03-17 08:08:19 UTC
For now, I'll mark this WONTFIX, as the file does seem to have become corrupt, and no longer matches the docs. Since you've tried skipping the record and that doesn't help, I can't see what else we can do for your broken file

You can probably script openoffice to do the open/save for you, so that's what I'd suggest as a workaround
Comment 12 Nick Burch 2008-03-27 05:37:12 UTC
*** Bug 42256 has been marked as a duplicate of this bug. ***
Comment 13 Peter J. Krum 2008-05-02 07:05:34 UTC
What is the Grbit responsible for and is there anyway to fix it with out opening it in open office? That seems to cause other issues for me.  Can i open the file in a hex editor and fix it?  
Comment 14 Josh Micich 2008-05-02 16:25:35 UTC
(In reply to comment #13)
> What is the Grbit responsible for and is there anyway to fix it with out
> opening it in open office? That seems to cause other issues for me.  Can i open
> the file in a hex editor and fix it?  
> 

For constant values, the grbit field specifies the type.  16=>Error code.

I tripped over grbit==16 while fixing bug 44792. Somehow I found value 10h described on pages 64-65 of Excel97-2007BinaryFileFormat(xls)Specification.pdf (note - ptgArray is described on pages 304-305 but the grbit values there are deficient).

A simple solution to this bug might be for ArrayPtg.readTokenValues(RecordInputStream) to call ConstantValueParser.parse(RecordInputStream, int).

Comment 15 Josh Micich 2008-05-05 19:09:39 UTC
Fixed in svn r653668.

There were more problems past the grbit=16 issue.  Quite a few fixes required in ArrayPtg.  Some new junits for ArrayPtg added.  Nick's failing junit TestBugs.test42564Alt() enabled.
Comment 16 Peter J. Krum 2008-05-06 08:01:33 UTC
Thanks guys!  This is much appreciated!

Do you know when the next build will be?

Comment 17 Nick Burch 2008-05-06 08:07:16 UTC
(In reply to comment #16)
> Do you know when the next build will be?
 
Nightly builds are available from
http://encore.torchbox.com/poi-svn-build/
Comment 18 Peter J. Krum 2008-05-06 08:14:17 UTC
Thanks!
Comment 19 Josh Micich 2008-05-26 11:09:04 UTC
A small bug was introduced in r653668.  Excel internally stores array elements column by column.  The only symptom of this bug was that HSSFCell.getCellFormula() would render 2D arrays elements out of order.

Fix was applied in svn r660256 and should be available in version 3.1-final


Comment 20 dkulinich 2014-08-26 14:53:04 UTC
same shit with DeltaRecord on the some (still not clear)conditions from generated Excels.Overriding of
    public DeltaRecord(RecordInputStream in) {
        field_1_max_change = in.readDouble();
        in.readDouble();
    }
helps but brokes regular/normal files
so also needs some condition like
} else if (in.available() == 8 && length == 16) {
Comment 21 Nick Burch 2014-08-26 14:59:42 UTC
(In reply to dkulinich from comment #20)
> same shit with DeltaRecord on the some (still not clear)conditions from
> generated Excels.

Can you please share a small file that shows this version of the problem? Also, given that you're trigger it on a different record, you'd probably be best off opening a brand new bug and attaching the problematic file there
Comment 22 dkulinich 2014-08-26 16:46:34 UTC
give me email where to send file - its shouldn't be seeen in SE
also after this fix 
there is another problem:

(hssf) RecordFormatException: Duplicate PageSettingsBlock record (sid=0x83)

means 2 HCenterRecord, expected 1


[HCENTER]
    .hcenter        = false
[/HCENTER]
[HCENTER]
    .hcenter        = false
[/HCENTER]

Ofcourse I can rewrite PageSettingsBlock in order to fix this stupidly generated excel model parsing. 
My question is why Excel shell is tolerable to minor errors and POI is not?
Comment 23 dkulinich 2014-08-26 17:00:47 UTC
so, my another suggestion is apply one more small fix
to PageSettingsBlock

change 
private void checkNotPresent(Record rec) {...}
to 
private boolean checkNotPresent(Record rec) {...}
and ignore duplicates like HCenterRecord (all that uses for printing only and not critical)