I get the following exception when parsing a spreadsheet with ExcelExtractor:

org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x850 left 3060 bytes remaining still to be read.
    at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:124)
    at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:402)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:277)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:202)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:184)

BiffViewer gives me the following error when dealing with the spreadsheet in question:

[java] Offset=0x000359F4(219636) recno=9202 sid=0x08C9 size=0x0018(24)
[java] [UNKNOWNRECORD] (0x8C9)
[java] rawData=[C9, 08, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 01, 00, 00, 00, 00, 00]
[java] [/UNKNOWNRECORD]
[java]
[java] org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (6) bytes
[java]     at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:185)
[java]     at org.apache.poi.hssf.record.RecordInputStream.readFully(RecordInputStream.java:250)
[java]     at org.apache.poi.hssf.record.RecordInputStream.readFully(RecordInputStream.java:246)
[java]     at org.apache.poi.hssf.record.chart.ChartEndObjectRecord.<init>(ChartEndObjectRecord.java:44)
[java]     at org.apache.poi.hssf.dev.BiffViewer.createRecord(BiffViewer.java:248)
[java]     at org.apache.poi.hssf.dev.BiffViewer.createRecords(BiffViewer.java:84)
[java]     at org.apache.poi.hssf.dev.BiffViewer.main(BiffViewer.java:398)
[java]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[java]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[java]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
These are problems with the recently added chart records. I am not sure why you get a different error from BiffViewer, since it seems to interpret the same set of records as RecordFactory. Both errors look legitimate, which suggests a third problem: why BiffViewer behaves differently.

As far as ChartEndObjectRecord is concerned, it looks like some apps don't write the 'unused' field. It would be interesting to know whether Excel re-adds it. I can't see any obvious problem with ChartFRTInfoRecord, so we'll probably need some sample data. Could you either upload the spreadsheet in question, or give the hex dump of these two offending records?

You can get the dump by changing the constructors of ChartFRTInfoRecord and ChartEndObjectRecord:

    // change the parameter declaration from 'in' to 'inOrig'
    // add this code at the top of the method
    byte[] data = inOrig.readRemainder();
    LittleEndianInput in = new LittleEndianByteArrayInputStream(data);

    // add this code at the bottom
    if (in.available() > 0) {
        System.err.println("leftover data reading " + getClass().getName());
        System.err.println(HexDump.toHex(data, 16));
    }
    // also change param type to CFRTID constructor (like svn r777660)
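For anyone who wants to try the "read everything, parse, then dump what is left" idea outside of POI first, here is a minimal sketch using only java.nio. It is not POI code: the class name, the helper names and the layout of three little-endian 16-bit fields are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a record constructor; not part of POI.
class LeftoverDumpSketch {

    /** Formats data[from..to) as space-separated two-digit upper-case hex. */
    static String toHex(byte[] data, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (i > from) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Parses an ASSUMED layout of three little-endian 16-bit fields and
     * returns how many payload bytes were left unread. Throws
     * BufferUnderflowException if the payload is shorter than 6 bytes.
     */
    static int parseAndCountLeftover(byte[] payload) {
        ByteBuffer in = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        in.getShort(); // first assumed field
        in.getShort(); // second assumed field
        in.getShort(); // third assumed field
        int leftover = in.remaining();
        if (leftover > 0) {
            // Dump only the bytes the parser did not consume.
            System.err.println("leftover data: "
                    + toHex(payload, in.position(), payload.length));
        }
        return leftover;
    }

    public static void main(String[] args) {
        // Made-up payload: six parsed bytes followed by two extra ones.
        byte[] payload = {0x55, 0x08, 0, 0, 0, 0, 1, 2};
        System.out.println("unread bytes: " + parseAndCountLeftover(payload));
    }
}
```

Copying the payload into a byte array before parsing is what makes the dump possible afterwards, which is the same reason the patch above reads the remainder up front.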
Created attachment 23906: a .xls that causes the stated exception. It contains a worksheet with some numbers and a chart to graph them. It was created with Microsoft Excel 2004 for Mac Version 11.5.5 (090512).
Still reproducible with current POI 3.10beta1 using the following code:

    POITextExtractor extractor = ExtractorFactory.createExtractor(POIDataSamples.getSpreadSheetInstance().getFile("47247.xls"));
If someone wants to spend some time on this bug, a good start would be to use a debugger to identify the problematic record and the few preceding it (especially if there are continue records). Then check the file format documentation and make sure that all the options, optional parts, reserved parts and variable-length parts are handled for the problem record and the few before it. Also check with the file format validator whether the record structure is correct, and see whether loading the file in Excel and doing a save-as trims the record.
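As a rough aid for the first step, here is a small sketch that walks a raw BIFF record stream and lists each record's offset, sid and size, marking continue records (sid 0x003C). It relies on each record starting with a 2-byte sid and a 2-byte size, both little-endian, which matches the BiffViewer output above. The class and method names are made up, and the sketch assumes the workbook stream has already been extracted from the OLE2 container into a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of POI.
class RecordWalkSketch {

    static final int CONTINUE_SID = 0x003C;

    /** Returns one description line per record header found in the stream. */
    static List<String> listRecords(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<String> out = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int offset = buf.position();
            int sid = buf.getShort() & 0xFFFF;  // record type
            int size = buf.getShort() & 0xFFFF; // payload length in bytes
            if (size > buf.remaining()) {
                // Declared size runs past the end of the data: stop here.
                out.add(String.format(Locale.ROOT,
                        "offset=%d sid=0x%04X size=%d TRUNCATED", offset, sid, size));
                break;
            }
            buf.position(buf.position() + size); // skip the payload
            String note = (sid == CONTINUE_SID) ? " (continue)" : "";
            out.add(String.format(Locale.ROOT,
                    "offset=%d sid=0x%04X size=%d%s", offset, sid, size, note));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up stream: a 2-byte record 0x0850 followed by an empty continue record.
        byte[] stream = {0x50, 0x08, 0x02, 0x00, 1, 2, 0x3C, 0x00, 0x00, 0x00};
        for (String line : listRecords(stream)) {
            System.out.println(line);
        }
    }
}
```

Printing the last few entries before the failing offset should show which records precede the problem record and whether any of them is a continue record.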
0x850 is a ChartFRTInfoRecord. While opening the file, the record reads 20 bytes from the stream, but POI expects 3080 bytes to be read, so 3060 bytes remain. Reading and saving the file in LibreOffice produces a file that POI reads fine; however, the resulting file looks completely different in BiffViewer.
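To make the arithmetic explicit (3080 expected minus 20 consumed leaves 3060), here is a tiny sketch of a leftover check at a record boundary. It is only a model of the kind of check that produces the message in the original report, not POI's actual implementation, and the class and method names are invented.

```java
import java.util.Locale;

// Hypothetical model of a leftover-data check; not POI's implementation.
class LeftoverCheckSketch {

    /**
     * Returns an error message if fewer bytes were consumed than the record
     * declared, or null if the record was read completely.
     */
    static String check(int sid, int declaredSize, int bytesConsumed) {
        int remaining = declaredSize - bytesConsumed;
        if (remaining < 0) {
            throw new IllegalArgumentException("consumed more bytes than declared");
        }
        if (remaining == 0) {
            return null;
        }
        return String.format(Locale.ROOT,
                "Initialisation of record 0x%X left %d bytes remaining still to be read.",
                sid, remaining);
    }

    public static void main(String[] args) {
        // The numbers from this report: 3080 expected, 20 read.
        System.out.println(check(0x850, 3080, 20));
    }
}
```

With the numbers from this report the sketch yields the same wording as the exception in the first comment, which at least confirms that the 20 and 3080 figures are consistent with the reported 3060.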
Still reproducible with current poi-3.11-beta3-20141111 and attached file.