Bug 54682

Summary: UnhandledDataStructure - OutOfMemoryError
Product: POI Reporter: Phil Persad <philip.persad>
Component: HWPFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: philip.persad
Priority: P2    
Version: 3.9-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Phil Persad 2013-03-13 00:44:43 UTC
In the constructor for org.apache.poi.hwpf.model.UnhandledDataStructure, a byte array is allocated using a length value prior to the code which validates that the parameters passed to the constructor are sane.  The current check is:

if (offset + length > buf.length)
    {
      throw new IndexOutOfBoundsException("buffer length is " + buf.length +
                                          "but code is trying to read " + length + " from offset " + offset);
    }

This should be done prior to creating the buffer.  In one case a malformed word document was attempting to allocate ~1.8g of data when the total files size was 90k.

Also, the check should be:

if (((long) offset) + length > buf.length)

In a corrupt file the parameters could potentially be large enough to overflow an integer.
Comment 1 Nick Burch 2013-03-18 00:23:59 UTC
Do you have a sample file that shows off the problem you could share?
Comment 2 Phil Persad 2013-03-18 17:36:59 UTC
Unfortunately, I'm working under some rather strong confidentiality constraints and cannot provide you with the document which causes the error.

However, the current structure of the code is:
<allocate buffer>
<sanity check length>
<perform copy>

The structure:
<sanity check length>
<allocate buffer>
<perform copy>

Is clearly safer.  The fact that there is a sanity check in the existing code acknowledges that unsafe behaviour is possible, in that case it makes a lot of sense to perform buffer allocation afterwards.

It's also worth noting that an OutOfMemoryError is a catastrophic failure.  The worst case for most exceptions thrown by the poi library is a failure to parse a given document.  However, an OutOfMemoryError will generally take down the entire application.
Comment 3 Damiano Albani 2013-05-29 07:46:58 UTC
Would you consider using the patch provided by RedHat regarding this issue?
It's available in their own bug report: https://bugzilla.redhat.com/show_bug.cgi?id=799078
Comment 4 Nick Burch 2013-05-29 17:26:54 UTC
Fixed in r1487555. (Slightly different patch to the redhat one, but the same idea)
Comment 5 Phil Persad 2013-05-29 19:23:56 UTC
The code:

if (offset < 0 || length < 0)

Fails to account for the very real potential for integer overflow.  The check for offset + length < 0 is necessary.  I avoided that by casting to long, however I think the RedHat solution using binary inclusive OR is more elegant.
Comment 6 Nick Burch 2013-05-29 22:01:34 UTC
I made some additional tweaks in r1487558, which I think should cover the int overflowing into negative case. If you think there's something still missing, please let me know, and if possible include a failing unit test :)
Comment 7 Phil Persad 2013-05-29 22:27:00 UTC
r1487558 looks good.