47270 – Bad Section Length - Section.java

Summary: Bad Section Length - Section.java

Status:	RESOLVED LATER

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HSSF
Version:	3.5-dev
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-05-26 09:51 UTC by Jonathan Holloway
Modified:	2015-05-18 21:20 UTC
CC List:	0 users

Description Jonathan Holloway 2009-05-26 09:51:26 UTC

I've seen a rather odd issue with a spreadsheet that causes the section length to be miscalculated. The following:

/*
 * Read the section length.
 */
size = (int) LittleEndian.getUInt(src, o1);

returns a negative number that causes an OutOfMemory error. It appears to be a valid Excel document (it opens fine in OpenOffice). My fix for the time being is to throw the following immediately afterwards:

if (size < 0) {
    throw new UnsupportedEncodingException("Tried to allocate a section of size " + size);
}

The document appears to parse fine after that. Please let me know if you need any more info. I might well be able to clean up the data in the original document, but saving it in OpenOffice might actually correct the issue.

Comment 1 Nick Burch 2009-05-26 09:55:15 UTC

I don't think getUInt should ever return a negative number; the U in the method name means unsigned.

Any chance you could post the problem document, or do some sniffing about?

Comment 2 Jonathan Holloway 2009-05-26 10:34:57 UTC

I'm sure this is bad data in the spreadsheet. In org.apache.poi.util.LittleEndian, a byte[] data (4096 bytes) is passed in with:

offset = 316
b0 = 0
b1 = 0
b2 = 0
b3 = 228

so (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0) returns -469762048
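
For illustration, the arithmetic above can be reproduced in isolation. This is a minimal sketch, not POI's actual code: the class and method names (UnsignedIntDemo, asSignedInt, asUnsignedLong) are hypothetical, and only the four byte values are taken from this report. It shows that the same bytes give a negative value in int arithmetic, a large positive value in long arithmetic, and the negative value again once that long is cast back to int.

```java
final class UnsignedIntDemo {

    // Hypothetical helper: combines four little-endian byte values using
    // 32-bit int arithmetic, as in the expression quoted in this comment.
    static int asSignedInt(int b0, int b1, int b2, int b3) {
        return (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0);
    }

    // Hypothetical helper: same combination, but the top byte is widened
    // to long before shifting so the result cannot overflow.
    static long asUnsignedLong(int b0, int b1, int b2, int b3) {
        return ((long) b3 << 24) + (b2 << 16) + (b1 << 8) + b0;
    }

    public static void main(String[] args) {
        // Byte values reported at offset 316.
        int b0 = 0, b1 = 0, b2 = 0, b3 = 228;

        System.out.println(asSignedInt(b0, b1, b2, b3));          // -469762048
        System.out.println(asUnsignedLong(b0, b1, b2, b3));       // 3825205248
        // Narrowing the long back to int discards the high bits again.
        System.out.println((int) asUnsignedLong(b0, b1, b2, b3)); // -469762048
    }
}
```

Either way, the bytes encode a length of roughly 3.8 GB, which cannot be right for a section inside a 4096-byte buffer.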

Saving in OpenOffice "fixes" the document - so I can't cleanse the doc to send it to you, and unfortunately I can't send it for client confidentiality reasons.

Comment 3 Dominik Stadler 2015-05-18 21:20:59 UTC

I think the negative value results from interpreting an unsigned int as a signed int: an unsigned int has a larger positive range than int can hold, so very large unsigned values become negative when cast to int. To handle large unsigned ints correctly, you need to use the long datatype.
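
As a rough sketch of what that could look like, the length can be kept in a long and checked against the bytes actually available before it is narrowed to int. Everything here is an assumption for illustration: SectionLengthCheck, readUInt32LE and checkedSectionSize are hypothetical names, not part of POI, the code relies on Integer.toUnsignedLong (Java 8 or later), and it assumes a section may not extend past the end of the buffer it was read from.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SectionLengthCheck {

    // Hypothetical helper: reads a little-endian unsigned 32-bit value
    // and returns it as a non-negative long.
    static long readUInt32LE(byte[] src, int offset) {
        int raw = ByteBuffer.wrap(src)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(offset);
        return Integer.toUnsignedLong(raw);
    }

    // Hypothetical helper: rejects lengths that cannot fit in the
    // remaining bytes, then narrows to int (safe after the check,
    // because the remaining byte count is itself at most an int).
    static int checkedSectionSize(byte[] src, int offset) {
        long size = readUInt32LE(src, offset);
        long remaining = (long) src.length - offset;
        if (size > remaining) {
            throw new IllegalArgumentException(
                    "Section length " + size + " exceeds the "
                    + remaining + " bytes available at offset " + offset);
        }
        return (int) size;
    }

    public static void main(String[] args) {
        // Recreate the reported situation: 4096 bytes, offset 316,
        // bytes 0, 0, 0, 228.
        byte[] data = new byte[4096];
        data[319] = (byte) 228;
        System.out.println(readUInt32LE(data, 316)); // 3825205248
        try {
            checkedSectionSize(data, 316);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind, a corrupt length is reported with its real unsigned value instead of surfacing later as a negative size.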

However, I assume something is not quite right with the document here, as I don't think the section size in your document really holds such a large value, does it?

For now, without a sample document we are not able to investigate, so I am setting this to LATER. Please reopen this with a sample document and the steps to reproduce the problem, ideally as a self-contained unit test.