I've seen a rather odd issue with a spreadsheet that results in the section length being miscalculated. The following:

    /*
     * Read the section length.
     */
    size = (int) LittleEndian.getUInt(src, o1);

returns a negative number, which causes an OutOfMemory error. It appears to be a valid Excel document (it opens fine in OpenOffice). My fix for the time being is to add the following check immediately after:

    if (size < 0) {
        throw new UnsupportedEncodingException("Tried to allocate a section of size " + size);
    }

The document appears to parse fine after that. Please let me know if you need any more info. I might well be able to clean up the data in the original document, but saving in OpenOffice might actually correct the issue.
I don't think getUInt should ever return a negative number: the U in the method name means unsigned. Any chance you could post the problem document or do some sniffing about?
I'm sure this is bad data in the spreadsheet. In org.apache.poi.util.LittleEndian, a byte[] data (4096 bytes) is passed in with:

    offset = 316
    b0 = 0
    b1 = 0
    b2 = 0
    b3 = 228

so

    (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0)

returns -469762048. Saving in OpenOffice "fixes" the document, so I can't cleanse the doc to send it to you, and unfortunately I can't send the original for client confidentiality reasons.
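For reference, the arithmetic above can be reproduced in isolation. The sketch below is not POI code: the class and method names are made up for illustration. It combines the same four byte values once with int arithmetic and once with long arithmetic, to show where the sign flips.

```java
// Illustrative only: UnsignedIntDemo and its methods are hypothetical names,
// not part of org.apache.poi.util.LittleEndian.
final class UnsignedIntDemo {

    // Combines four little-endian byte values (each 0..255) using int arithmetic.
    // When b3 >= 128 the top bit of the result is set, so the int is negative.
    static int asSignedInt(int b0, int b1, int b2, int b3) {
        return (b3 << 24) + (b2 << 16) + (b1 << 8) + (b0 << 0);
    }

    // Same combination, but b3 is widened to long before shifting,
    // so the full unsigned 32-bit range 0..4294967295 is preserved.
    static long asUnsignedLong(int b0, int b1, int b2, int b3) {
        return ((long) b3 << 24) + (b2 << 16) + (b1 << 8) + b0;
    }

    public static void main(String[] args) {
        // Values reported for offset 316 in the problem document.
        System.out.println(asSignedInt(0, 0, 0, 228));    // prints -469762048
        System.out.println(asUnsignedLong(0, 0, 0, 228)); // prints 3825205248
    }
}
```

Either way the stored value is about 3.8 GB, far larger than the 4096-byte buffer it was read from, which is consistent with the data itself being bad.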
I think the negative value results from interpreting an unsigned int as a signed int: an unsigned int has a larger positive range than int can hold, so very large unsigned values become negative when cast to int. To handle large unsigned ints correctly, you need to use the long datatype. However, I assume something is not quite right with the document, as I doubt the section size in your document really holds such a large value. Without a sample document we are not able to investigate, so I am setting this to LATER for now. Please reopen this with a sample document and the steps to reproduce the problem, ideally as a self-sufficient unit test.
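One possible shape for a more defensive read is sketched below. It keeps the length as a long and rejects values larger than the bytes remaining in the buffer. The helper readSectionLength is hypothetical and not part of POI, and the assumption that a section length can never exceed the bytes available from its offset is mine, not something confirmed in this report.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: SectionLengthCheck.readSectionLength is a hypothetical helper.
final class SectionLengthCheck {

    // Reads a 32-bit little-endian unsigned value at the given offset and
    // returns it as a long, so it is never negative.
    // Assumption: a valid length cannot exceed the bytes left in src from offset.
    static long readSectionLength(byte[] src, int offset) {
        int raw = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        long size = raw & 0xFFFFFFFFL; // reinterpret the 32 bits as unsigned
        long available = (long) src.length - offset;
        if (size > available) {
            throw new IllegalArgumentException(
                "Section length " + size + " exceeds the " + available
                + " bytes available at offset " + offset);
        }
        return size;
    }
}
```

With the values from this report (a 4096-byte array, offset 316, bytes 0, 0, 0, 228), this check would fail fast with a descriptive message instead of attempting a very large allocation.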