Bug 47247 - Initialisation of record 0x850 left 3060 bytes remaining still to be read.
Summary: Initialisation of record 0x850 left 3060 bytes remaining still to be read.
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-dev
Hardware: Macintosh other
Importance: P2 normal with 4 votes
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-22 11:06 UTC by Jonathan Holloway
Modified: 2014-12-03 14:00 UTC
CC List: 1 user



Attachments
A .xls that causes the exception (60.50 KB, application/octet-stream)
2009-06-29 20:38 UTC, David Agnew

Description Jonathan Holloway 2009-05-22 11:06:54 UTC
I get the following exception when parsing this spreadsheet with ExcelExtractor:

org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x850 left 3060 bytes remaining still to be read.
	at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:124)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:402)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:277)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:202)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:184)

BiffViewer reports a different error on the same spreadsheet:

     [java] Offset=0x000359F4(219636) recno=9202 sid=0x08C9 size=0x0018(24)
     [java] [UNKNOWNRECORD] (0x8C9)
     [java]   rawData=[C9, 08, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 01, 00, 00, 00, 00, 00]
     [java] [/UNKNOWNRECORD]
     [java] 
     [java] org.apache.poi.hssf.record.RecordFormatException: Not enough data (0) to read requested (6) bytes
     [java]     at org.apache.poi.hssf.record.RecordInputStream.checkRecordPosition(RecordInputStream.java:185)
     [java]     at org.apache.poi.hssf.record.RecordInputStream.readFully(RecordInputStream.java:250)
     [java]     at org.apache.poi.hssf.record.RecordInputStream.readFully(RecordInputStream.java:246)
     [java]     at org.apache.poi.hssf.record.chart.ChartEndObjectRecord.<init>(ChartEndObjectRecord.java:44)
     [java]     at org.apache.poi.hssf.dev.BiffViewer.createRecord(BiffViewer.java:248)
     [java]     at org.apache.poi.hssf.dev.BiffViewer.createRecords(BiffViewer.java:84)
     [java]     at org.apache.poi.hssf.dev.BiffViewer.main(BiffViewer.java:398)
     [java]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     [java]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     [java]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
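In the BiffViewer dump above, the 24-byte body of the unknown 0x08C9 record begins with `C9 08`, which read little-endian is 0x08C9, the record's own sid. The stdlib-only sketch below decodes those leading fields, which can help confirm record boundaries when stepping through the stream. It assumes (from the BIFF8 "future record type" header layout, not from this report) that such a body starts with a 2-byte `rt` followed by a 2-byte `grbitFrt` flags field; the class and method names are hypothetical and this is not POI code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: peeks at the assumed FRT header fields of a BIFF8 record body. */
final class FrtHeaderPeek {
    private FrtHeaderPeek() {}

    /** Returns {rt, grbitFrt} as unsigned 16-bit values (assumed layout). */
    static int[] peek(byte[] body) {
        // BIFF8 stores multi-byte integers little-endian
        ByteBuffer buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        int rt = buf.getShort() & 0xFFFF;       // assumed: repeats the record's sid
        int grbitFrt = buf.getShort() & 0xFFFF; // assumed: FRT flags
        return new int[] {rt, grbitFrt};
    }
}
```

Feeding it the rawData bytes from the dump yields `rt == 0x08C9`, matching the `sid=0x08C9` on the BiffViewer line.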
Comment 1 Josh Micich 2009-05-22 12:07:48 UTC
These are problems with the recently added chart records.
I am not sure why BiffViewer gives you a different error, since it should
interpret the same set of records as RecordFactory.  Both errors look
legitimate, which suggests a third problem: why BiffViewer behaves
differently.


As far as ChartEndObjectRecord is concerned, it looks like some apps don't
write the 'unused' field.  It would be interesting to know whether Excel
re-adds it.
I can't see any obvious problem with ChartFRTInfoRecord, so we'll probably need
some sample data.
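One way to tolerate writers that omit the trailing 'unused' field is to read it only when the record still has bytes left. A minimal stdlib-only sketch, assuming the body is three 2-byte fields followed by the 6-byte reserved block that the BiffViewer trace shows being requested ("requested (6) bytes"); the class and field names are hypothetical, and this is not POI's actual ChartEndObjectRecord:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical tolerant parser for a ChartEndObject-style record body. */
final class TolerantEndObject {
    final int rt;
    final int grbitFrt;
    final int iObjectKind;
    final byte[] unused = new byte[6]; // stays zero-filled if the writer omitted it

    TolerantEndObject(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        rt = buf.getShort() & 0xFFFF;
        grbitFrt = buf.getShort() & 0xFFFF;
        iObjectKind = buf.getShort() & 0xFFFF;
        // Read the trailing reserved bytes only if they are actually present,
        // instead of failing with "Not enough data (0) to read requested (6) bytes".
        if (buf.remaining() >= unused.length) {
            buf.get(unused);
        }
    }
}
```

Whether Excel re-adds the field on save would decide if writing it back is also needed; this sketch only covers reading.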

Could you either upload the spreadsheet in question, or give the hex dump of
these two offending records?

You can get the dump by changing the constructors of ChartFRTInfoRecord and
ChartEndObjectRecord as follows:

// change the parameter declaration from 'in' to 'inOrig'
// add this code at the top of the method
byte[] data = inOrig.readRemainder();
LittleEndianInput in = new LittleEndianByteArrayInputStream(data);

// add this code at the bottom 
if (in.available() > 0) {
    System.err.println("leftover data reading " + getClass().getName());
    System.err.println(HexDump.toHex(data, 16));
}

// also change the parameter type of the CFRTID constructor (like svn r777660)
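The same "buffer the body, parse, then dump what is left" idea can be tried without POI. A stdlib-only sketch, assuming the record body is already in a byte array and the parser reports how many bytes it consumed; `leftoverHex` is a hypothetical name, and its 16-bytes-per-line layout only loosely mirrors the `HexDump.toHex(data, 16)` call above:

```java
/** Hypothetical diagnostic: formats the bytes a parser left unread, 16 per line. */
final class LeftoverDump {
    private LeftoverDump() {}

    static String leftoverHex(byte[] data, int consumed) {
        StringBuilder sb = new StringBuilder();
        for (int i = consumed; i < data.length; i++) {
            int col = (i - consumed) % 16;
            if (col == 0 && i > consumed) {
                sb.append('\n');        // start a new 16-byte line
            } else if (col != 0) {
                sb.append(' ');         // separate bytes within a line
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString(); // empty string means the parser consumed everything
    }
}
```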
Comment 2 David Agnew 2009-06-29 20:38:06 UTC
Created attachment 23906 [details]
A .xls that causes the exception

A .xls that causes the stated exception. It contains a worksheet with some numbers and a chart to graph them. It was created with Microsoft Excel 2004 for Mac Version 11.5.5 (090512).
Comment 3 David Agnew 2009-06-29 20:40:30 UTC
I have attached a .xls that causes the stated exception. It contains a worksheet with some numbers and a chart to graph them. It was created with Microsoft Excel 2004 for Mac Version 11.5.5 (090512).
Comment 4 Dominik Stadler 2013-08-05 20:05:19 UTC
Still reproducible with the current POI 3.10beta1, using the following code:

       POITextExtractor extractor = ExtractorFactory.createExtractor(POIDataSamples.getSpreadSheetInstance().getFile("47247.xls"));
Comment 5 Nick Burch 2013-08-06 11:44:39 UTC
If someone wants to spend time on this bug, start by using a debugger to identify the problematic record and the few preceding it (especially if there are continue records). Then check the file format documentation and make sure that all the options, optional parts, reserved parts and variable-length parts are handled by the problem record and the few before it.

Also, use the file format validator to check whether the record structure is correct, and see whether loading the file in Excel and doing a Save As trims the record.
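To locate the problematic record and the few before it, one option is to walk the raw record stream and list each record's offset, sid and size, similar to the BiffViewer output quoted in the description. A minimal stdlib-only sketch, assuming the input is the already-extracted BIFF8 Workbook stream (not the whole OLE2 .xls file), where each record is a 2-byte little-endian sid, a 2-byte little-endian size, then the body; it flags CONTINUE records (sid 0x003C). The class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walker over a raw BIFF8 record stream. */
final class RecordWalker {
    static final int CONTINUE_SID = 0x003C;

    private RecordWalker() {}

    static List<String> list(byte[] stream) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int recno = 0; // 0-based record counter
        while (buf.remaining() >= 4) {
            int offset = buf.position();
            int sid = buf.getShort() & 0xFFFF;
            int size = buf.getShort() & 0xFFFF;
            if (size > buf.remaining()) {
                out.add(String.format("Offset=0x%08X truncated record sid=0x%04X", offset, sid));
                break;
            }
            buf.position(buf.position() + size); // skip the record body
            out.add(String.format("Offset=0x%08X recno=%d sid=0x%04X size=%d%s",
                    offset, recno++, sid, size, sid == CONTINUE_SID ? " (CONTINUE)" : ""));
        }
        return out;
    }
}
```

Printing the last few entries before the failing 0x0850 record shows whether CONTINUE records follow it, which affects how many bytes the record spans.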
Comment 6 Dominik Stadler 2013-10-27 09:15:55 UTC
0x850 is a ChartFRTInfoRecord. When the file is opened, its constructor reads 20 bytes from the stream, but POI expects the record to be 3080 bytes long, so 3060 bytes remain unread.
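The arithmetic behind both exception messages in this report is simple bookkeeping: the reader knows each record's declared length, refuses reads past it, and after the record's constructor returns reports whatever was not consumed. A minimal stdlib-only sketch of that bookkeeping (not POI's RecordInputStream; the class and method names are hypothetical, and the message wording is copied from the traces above):

```java
/** Hypothetical per-record bookkeeping: declared length vs bytes consumed. */
final class RecordBudget {
    private final int sid;
    private final int declaredSize;
    private int consumed;

    RecordBudget(int sid, int declaredSize) {
        this.sid = sid;
        this.declaredSize = declaredSize;
    }

    /** Called for every read; refuses to read past the end of the record. */
    void consume(int n) {
        if (consumed + n > declaredSize) {
            throw new IllegalStateException("Not enough data (" + remaining()
                    + ") to read requested (" + n + ") bytes");
        }
        consumed += n;
    }

    int remaining() {
        return declaredSize - consumed;
    }

    /** Called once the record's parser has finished. */
    void checkFullyRead() {
        if (remaining() > 0) {
            throw new IllegalStateException(String.format(
                    "Initialisation of record 0x%X left %d bytes remaining still to be read.",
                    sid, remaining()));
        }
    }
}
```

With a declared size of 3080 and 20 bytes consumed, `remaining()` is 3080 - 20 = 3060, matching the reported message.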

Opening and re-saving the file in LibreOffice makes it readable, but the resulting file looks completely different in BiffViewer.
Comment 7 Stefan Pfafferott 2014-12-03 14:00:40 UTC
Still reproducible with the current poi-3.11-beta3-20141111 and the attached file.