Hi there ... recently I got the Exception printed below while reading an Excel file that has been exported by MS Access to "Excel 97-2002 & 5.0/95 Workbook" format. DEBUG: type = 5; currentLength = 8; recordOffset = 8 org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:191) at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:115) at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:205) at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:153) ... 50 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:179) ... 54 more Caused by: java.lang.ArrayIndexOutOfBoundsException at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:132) at org.apache.poi.hssf.record.RecordInputStream.readInt(RecordInputStream.java:155) at org.apache.poi.hssf.record.BOFRecord.fillFields(BOFRecord.java:121) at org.apache.poi.hssf.record.Record.<init>(Record.java:56) at org.apache.poi.hssf.record.BOFRecord.<init>(BOFRecord.java:99) ... 59 more When the exported Excel file has been reopened and saved with Excel, everything is ok. DEBUG: type = 5; currentLength = 16; recordOffset = 8; sid = 2057; ContinueRecord.sid = 60 DEBUG: type = 16; currentLength = 16; recordOffset = 8; sid = 2057; ContinueRecord.sid = 60 To get a clue why these exceptions occur I put some debug code into BOFRecord.fillFields(RecordInputStream in) to produce the DEBUG output stated above: ... protected void fillFields(RecordInputStream in) { field_1_version = in.readShort(); field_2_type = in.readShort(); field_3_build = in.readShort(); field_4_year = in.readShort(); System.out.println("DEBUG: type = "+getType()+"; "+ "currentLength = "+in.getLength()+"; "+ "recordOffset = "+in.getRecordOffset()+"; "+ "sid = "+in.getSid()+"; "+ "ContinueRecord.sid = "+ContinueRecord.sid); field_5_history = in.readInt(); field_6_rversion = in.readInt(); } ... Since I have not a clue what's up with this SID stuff, so what can I do to fix this problem? Regards Marcus
Are there some 'named range' definitions? If so, try removing the all name definitions and that workbook might be read. related bug maybe #41546.
Could you please upload a problem file? That way we'll have something to test against
Created attachment 21435 [details] File with problem on opening I am new to this list, but I have exactly the same problem. I checked and found Name Range in the file, but I have no way to prevent Access in generating this. Moreover, the Name Range is working fine if I open and save the file.
I've just tested svn trunk with the supplied problem file. I'm able to open the file with HSSFWorkbook, serialise it to bytes, and open that again, all without issue. (see src/testcases/org/apache/poi/hssf/usermodel/TestBugs.java#test42564) So, I believe this bug is fixed in 3.0.2, but it certainly appears to be fixed in svn trunk. If you are still having problems, please re-open the bug, and attach a file that doesn't work with svn trunk.
Created attachment 21503 [details] empty Excel spreadsheet which shows Exception when opening
Hello All, The problem: I've just attached an 'empty' spreadsheet which throws an Exception when opening (doesntwork.xls). The stack trace is slightly different to the one originally posted here, but it looks to be the same cause. The Exception came from using versions: 3.0.2-FINAL-20080204 and the latest SVN trunk build from Feb. 7th, 2008 (appologies, I am having SVN firewall probs. here at work and can't get a later version at the moment). Expected behaviour: Well, that it opens and process correctly. To replicate: The test program I used to open the file was XLS2CSVmra.java found under: /src/scratchpad/examples/src/org/apache/poi/hssf/eventusermodel/examples I compiled and ran it under JDK 1.4.2. Notes: The Excel file was originally created by a client. I removed : all data, all named ranges, hidden worksheets, extra worksheets and macros, to leave the empty spreadsheet attached. Appologies if this bug has been resolved in a POI version post Feb. 7th; I won't have subversion access to test the trunk until mid-week. Many thanks as always! Dave Stack trace: [:/cygdrive/c/work/ExcelParser/42564/513] > javac XLS2CSVmra.java [:/cygdrive/c/work/ExcelParser/42564/514] > java -version java version "1.4.2_03" Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02) Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode) [:/cygdrive/c/work/ExcelParser/42564/515] > java XLS2CSVmra ../doesntwork.xls org.apache.poi.hssf.record.RecordFormatException: Unable to construct record instance at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:199) at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:172) at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:98) at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:63) at XLS2CSVmra.process(XLS2CSVmra.java:111) at XLS2CSVmra.main(XLS2CSVmra.java:324) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:274) at org.apache.poi.hssf.record.RecordFactory.createRecord(RecordFactory.java:187) ... 5 more Caused by: org.apache.poi.hssf.record.RecordFormatException: Unknown grbit '16' at org.apache.poi.hssf.record.formula.ArrayPtg.readTokenValues(ArrayPtg.java:99) at org.apache.poi.hssf.record.formula.Ptg.createParsedExpressionTokens(Ptg.java:110) at org.apache.poi.hssf.record.NameRecord.fillFields(NameRecord.java:778) at org.apache.poi.hssf.record.Record.<init>(Record.java:55) at org.apache.poi.hssf.record.NameRecord.<init>(NameRecord.java:139) ... 10 more Exception in thread "main"
OK, I can replicate the problem with the new file, and I've added a (disabled) test to TestBugs Microsoft are supposed to be releasing some new documentation on the file format in a week or so, so I'll wait for that then see if it explains this new Grbit with value 16
*** Bug 44483 has been marked as a duplicate of this bug. ***
(In reply to comment #7) > Microsoft are supposed to be releasing some new documentation on the file format > in a week or so, so I'll wait for that then see if it explains this new Grbit > with value 16 I've read through the docs, and the files are apparently only allowed a grbit value of x01 or 0x02, anything else isn't allowed Oddly, we're hitting the issue on the first token value, so there's something pretty messed up from the start (it's not even like we get confused half way through) I'm not sure how to proceed, as the record has data in that the docs say it shouldn't have. One option might be to just issue a warning and skip the rest of the ptg, but I'm not sure what else that might ripple on and break
Hi Nick, Thanks for looking into it. Yep, awhile back I tried what you suggested (skipping record with warning) ... the ripple was that one of the record(s) skipped contained array info. re: worksheet names/indexes, so things were a bit snafu-ed when trying to index formulas on other worksheets as POI had the completely array size. The spreadsheet in question comes from a client who has used many differing versions of Excel to open/add/save info (i.e. the spreadsheet gets circulated to different depts. who annotate it). I think extra 'junk' gets saved and transmitted on when this happens - in particular, the dubious record. My solution has been to open the spreadsheet and save it in OpenOffice. That seems to work, touch wood. Will investigate technically again when I have more time. Cheers
For now, I'll mark this WONTFIX, as the file does seem to have become corrupt, and no longer matches the docs. Since you've tried skipping the record and that doesn't help, I can't see what else we can do for your broken file You can probably script openoffice to do the open/save for you, so that's what I'd suggest as a workaround
*** Bug 42256 has been marked as a duplicate of this bug. ***
What is the Grbit responsible for and is there anyway to fix it with out opening it in open office? That seems to cause other issues for me. Can i open the file in a hex editor and fix it?
(In reply to comment #13) > What is the Grbit responsible for and is there anyway to fix it with out > opening it in open office? That seems to cause other issues for me. Can i open > the file in a hex editor and fix it? > For constant values, the grbit field specifies the type. 16=>Error code. I tripped over grbit==16 while fixing bug 44792. Somehow I found value 10h described on pages 64-65 of Excel97-2007BinaryFileFormat(xls)Specification.pdf (note - ptgArray is described on pages 304-305 but the grbit values there are deficient). A simple solution to this bug might be for ArrayPtg.readTokenValues(RecordInputStream) to call ConstantValueParser.parse(RecordInputStream, int).
Fixed in svn r653668. There were more problems past the grbit=16 issue. Quite a few fixes required in ArrayPtg. Some new junits for ArrayPtg added. Nick's failing junit TestBugs.test42564Alt() enabled.
Thanks guys! This is much appreciated! Do you know when the next build will be?
(In reply to comment #16) > Do you know when the next build will be? Nightly builds are available from http://encore.torchbox.com/poi-svn-build/
Thanks!
A small bug was introduced in r653668. Excel internally stores array elements column by column. The only symptom of this bug was that HSSFCell.getCellFormula() would render 2D arrays elements out of order. Fix was applied in svn r660256 and should be available in version 3.1-final
same shit with DeltaRecord on the some (still not clear)conditions from generated Excels.Overriding of public DeltaRecord(RecordInputStream in) { field_1_max_change = in.readDouble(); in.readDouble(); } helps but brokes regular/normal files so also needs some condition like } else if (in.available() == 8 && length == 16) {
(In reply to dkulinich from comment #20) > same shit with DeltaRecord on the some (still not clear)conditions from > generated Excels. Can you please share a small file that shows this version of the problem? Also, given that you're trigger it on a different record, you'd probably be best off opening a brand new bug and attaching the problematic file there
give me email where to send file - its shouldn't be seeen in SE also after this fix there is another problem: (hssf) RecordFormatException: Duplicate PageSettingsBlock record (sid=0x83) means 2 HCenterRecord, expected 1 [HCENTER] .hcenter = false [/HCENTER] [HCENTER] .hcenter = false [/HCENTER] Ofcourse I can rewrite PageSettingsBlock in order to fix this stupidly generated excel model parsing. My question is why Excel shell is tolerable to minor errors and POI is not?
so, my another suggestion is apply one more small fix to PageSettingsBlock change private void checkNotPresent(Record rec) {...} to private boolean checkNotPresent(Record rec) {...} and ignore duplicates like HCenterRecord (all that uses for printing only and not critical)