Bug 45582

Summary: org.apache.poi.hssf.record.RecordFormatException: Error reading bytes
Product: POI
Reporter: James Perry <james_perry>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: Sample .xls file

Description James Perry 2008-08-06 09:38:35 UTC
Created attachment 22397
Sample .xls file

I'm using the latest release of POI: poi-3.1-FINAL-20080629.jar
I have attached a sample .xls file, source, and exception.

Here is the sample code:
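import java.io.FileInputStream;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class theApp {
    public static void main(String[] args) {
        try {
            // Constructing the workbook is enough to trigger the exception.
            POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("test-data.xls"));
            HSSFWorkbook wb = new HSSFWorkbook(fs);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}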


When attempting to open a .xls file I receive the following exception:

org.apache.poi.hssf.record.RecordFormatException: Error reading bytes
	at org.apache.poi.hssf.record.RecordInputStream.nextRecord(RecordInputStream.java:115)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:123)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:246)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:169)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:151)
	at theApp.main(theApp.java:18)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
	at org.apache.poi.util.LittleEndian.readFromStream(LittleEndian.java:482)
	at org.apache.poi.util.LittleEndian.readShort(LittleEndian.java:414)
	at org.apache.poi.hssf.record.RecordInputStream.nextRecord(RecordInputStream.java:113)
	... 10 more

Comment 1 James Perry 2008-08-06 09:45:30 UTC
I can open this file in Excel without issue or apparent conversion.  Once saved from Excel, this issue goes away.

Comment 2 Josh Micich 2008-08-07 13:44:33 UTC
Fixed in svn r683706.

The example file has one extra byte of data beyond the EOFRecord.  BTW - what application produced this file?

POI always attempted to read the next record sid without first checking stream.available().  This was wrong, but it seemed to work because another bug in LittleEndian caused readShort() to return 0 when there were zero bytes available.  All example spreadsheets up until now have had exactly zero bytes of data after the EOFRecord.  RecordInputStream was interpreting nextSid==0 as end of stream.  This was also a little bit wrong, since 0x0000 *is* a valid Record sid (from a previous Excel version).

RecordInputStream was changed to check the number of bytes left in the stream before reading the next sid.  The 'end of stream' condition is now represented by nextSid==-1 (a safer number).  LittleEndian was modified to properly throw BufferUnderrunException even for zero bytes read.  LittleEndian was also changed to avoid creating temporary byte arrays just to read bytes, shorts, ints and longs.

A JUnit test case was added using the sample file provided.
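
For readers who want to see the shape of the change, here is a minimal sketch of the idea described above. It is not the code committed in r683706. The class name, method names and the NO_MORE_RECORDS constant are hypothetical, IllegalStateException stands in for BufferUnderrunException, and the 4-byte threshold assumes a record header of a 2-byte sid plus a 2-byte length.

```java
import java.io.ByteArrayInputStream;

public class SidReaderSketch {
    // Hypothetical sentinel for "no more records". 0 is avoided because
    // 0x0000 is itself a valid record sid.
    static final int NO_MORE_RECORDS = -1;

    // Assumed record header size: 2-byte sid followed by 2-byte length.
    static final int HEADER_SIZE = 4;

    // Reads an unsigned little-endian 16-bit value. Any short read,
    // including zero bytes available, is reported instead of returning 0.
    static int readUShortLE(ByteArrayInputStream in) {
        int lo = in.read();
        int hi = in.read();
        if (lo < 0 || hi < 0) {
            throw new IllegalStateException("buffer underrun");
        }
        return (hi << 8) | lo;
    }

    // Checks how many bytes are left before reading the next sid.
    // ByteArrayInputStream is used here because its available() is exact.
    static int readNextSid(ByteArrayInputStream in) {
        if (in.available() < HEADER_SIZE) {
            return NO_MORE_RECORDS; // trailing scrap bytes are ignored
        }
        return readUShortLE(in);
    }

    public static void main(String[] args) {
        // An EOF record header (sid 0x000A, length 0) plus one stray byte,
        // mimicking the layout reported for the sample file.
        byte[] data = {0x0A, 0x00, 0x00, 0x00, 0x7F};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        System.out.println("sid    = " + readNextSid(in));
        System.out.println("length = " + readUShortLE(in));
        System.out.println("next   = " + readNextSid(in));
    }
}
```

With the single stray byte left over, the second readNextSid call returns the sentinel rather than attempting a 2-byte read that would underrun.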

Comment 3 James Perry 2008-08-07 13:52:41 UTC
(In reply to comment #2)
> Fixed in svn r683706.
> The example file has one extra byte of data beyond the EOFRecord.  BTW - what
> application produced this file?
> POI always attempted to read the next record sid, without first checking for
> stream.available().  This was wrong, seemed to work because another bug in
> LittleEndian caused readShort() to return 0 when there were zero bytes
> available.  All example spreadsheets up until now have had exactly zero bytes
> data after the EOFRecord.  RecordInputStream was interpreting nextSid==0 as end
> of stream.  This was also a little bit wrong, since 0x0000 *is* a valid Record
> sid (from a previous Excel version).
> RecordInputStream was changed to check the number of bytes left in the stream
> before reading the next sid.  'End of stream' condition is now represented by
> nextSid==-1 (a safer number). LittleEndian was modified to properly throw
> BufferUnderrunException even for zero bytes read.  LittleEndian was also
> changed to avoid creating temporary byte arrays just to read bytes, shorts,
> ints and longs.
> A junit test case was added using the sample file provided.

This file was created by Business Objects XI Update 2.  

Can you tell me (roughly) when this resolution will be available in a FINAL build?

Comment 4 Josh Micich 2008-08-07 13:56:48 UTC
(In reply to comment #3)
> Can you tell me (roughly) when this resolution will be available in a FINAL
> build?

I'm not sure of the exact timing for the next release, but it might be in about a month.