Bug 47270 - Bad Section Length - Section.java
Summary: Bad Section Length - Section.java
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF
Version: 3.5-dev
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-05-26 09:51 UTC by Jonathan Holloway
Modified: 2015-05-18 21:20 UTC
0 users


Description Jonathan Holloway 2009-05-26 09:51:26 UTC
I've seen a rather odd issue with a spreadsheet that results in the section length being miscalculated. The following line in Section.java:

// Read the section length.
size = (int) LittleEndian.getUInt(src, o1);

returns a negative number that causes an OutOfMemoryError. It appears to be a valid Excel document (it opens fine in OpenOffice). My fix for the time being is to throw the following immediately afterwards:

if (size < 0) {
    throw new UnsupportedEncodingException("Tried to allocate a section of size " + size);
}

The document appears to parse fine after that. Please let me know if you need any more info. I might well be able to clean up the data in the original document, but saving it in OpenOffice might actually correct the issue.
Comment 1 Nick Burch 2009-05-26 09:55:15 UTC
I don't think getUInt should ever return a negative number; the U in the method name means unsigned.

Any chance you could post the problem document / do some sniffing about?
Comment 2 Jonathan Holloway 2009-05-26 10:34:57 UTC
I'm sure this is bad data in the spreadsheet. In org.apache.poi.util.LittleEndian, a byte[] data (4096 bytes) is passed in with:

offset = 316
b0 = 0
b1 = 0
b2 = 0
b3 = 228

so (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0) returns -469762048
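
For anyone following along, the arithmetic above can be reproduced in isolation. This is only a sketch: the class and method names (SignedUnsignedDemo, readAsInt, readAsUnsignedLong) are made up for illustration and are not POI code, and the four bytes are the values listed above rather than the original document.

```java
public class SignedUnsignedDemo {
    // Combine four little-endian bytes into a signed 32-bit int.
    // Any value of 2^31 or more wraps around to a negative number.
    static int readAsInt(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;
        int b1 = data[offset + 1] & 0xFF;
        int b2 = data[offset + 2] & 0xFF;
        int b3 = data[offset + 3] & 0xFF;
        return (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0);
    }

    // Same four bytes, widened to long and masked so the full
    // unsigned 32-bit range stays positive.
    static long readAsUnsignedLong(byte[] data, int offset) {
        return readAsInt(data, offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // b0 = 0, b1 = 0, b2 = 0, b3 = 228, as reported above.
        byte[] data = {0, 0, 0, (byte) 228};
        System.out.println(readAsInt(data, 0));          // -469762048
        System.out.println(readAsUnsignedLong(data, 0)); // 3825205248
    }
}
```

So the same bytes read as an unsigned value are roughly 3.8 GB, which is implausible for a section inside a 4096-byte buffer.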

Saving in OpenOffice "fixes" the document - so I can't cleanse the doc to send it to you, and unfortunately I can't send it for client confidentiality reasons.
Comment 3 Dominik Stadler 2015-05-18 21:20:59 UTC
I think the negative value results from interpreting an unsigned int as a signed int: an unsigned int has a larger positive range than a Java int can hold, so very large unsigned values become negative when cast to int. To handle large unsigned ints correctly, you need to use the long datatype.
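
A rough sketch of what that could look like, keeping the length in a long and checking it before anything is allocated. The names here (SectionLengthCheck, readUInt32, readSectionLength) are hypothetical and this is not the actual Section.java code. The rule that the length must fit in the bytes remaining after the offset is also an assumption about how the section is laid out.

```java
public class SectionLengthCheck {
    // Read an unsigned 32-bit little-endian value into a long,
    // so values above Integer.MAX_VALUE stay positive.
    static long readUInt32(byte[] src, int offset) {
        return (src[offset] & 0xFFL)
                | ((src[offset + 1] & 0xFFL) << 8)
                | ((src[offset + 2] & 0xFFL) << 16)
                | ((src[offset + 3] & 0xFFL) << 24);
    }

    // Return the section length as an int only after checking that it
    // fits in the bytes available from the offset (assumed rule).
    static int readSectionLength(byte[] src, int offset) {
        long size = readUInt32(src, offset);
        long available = (long) src.length - offset;
        if (size > available) {
            throw new IllegalStateException("Section length " + size
                    + " exceeds the " + available + " bytes available");
        }
        return (int) size; // safe: size <= available <= Integer.MAX_VALUE
    }

    public static void main(String[] args) {
        byte[] ok = {8, 0, 0, 0, 1, 2, 3, 4};
        System.out.println(readSectionLength(ok, 0)); // 8

        byte[] bad = {0, 0, 0, (byte) 228};
        try {
            readSectionLength(bad, 0);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a check like this, a corrupt length fails fast with a descriptive message instead of surfacing later as a negative size or an OutOfMemoryError.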

However, I assume something is not quite right with the document here, as I don't think the section size in your document really holds such a large value. Can you confirm?

Without a sample document we are not able to investigate further, so I am setting this to LATER for now. Please reopen this with a sample document and the steps to reproduce the problem, ideally as a self-contained unit test.