Bug 63569 - toByteArray(InputStream stream) in IOUtils may fail if setByteArrayMaxOverride() is used
Summary: toByteArray(InputStream stream) in IOUtils may fail if setByteArrayMaxOverrid...
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 4.0.0-FINAL
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 64001
  Show dependency tree
 
Reported: 2019-07-16 20:59 UTC by JS Lair
Modified: 2019-12-14 21:42 UTC (History)
0 users



Attachments
Stacktrace with default limits (4.71 KB, text/plain)
2019-12-14 07:49 UTC, Dominik Stadler
Details
Stacktrace with higher limit defined (12.64 KB, text/plain)
2019-12-14 07:50 UTC, Dominik Stadler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description JS Lair 2019-07-16 20:59:23 UTC
This call indeed call toByteArray(InputStream stream, Integer.MAX_LENGTH, Integer.MAX_LENGTH) and run in an exception considering this is a too long ByteArray. It's a wrong asumption considering that this value is only to say "unknown length", and that the function manage this unknown length...

The patch is to replace:

            checkLength(length, maxLength);

by :

        if ((length!=Integer.MAX_VALUE) || (maxLength!=Integer.MAX_VALUE))
            checkLength(length, maxLength);
Comment 1 Dominik Stadler 2019-11-17 12:36:34 UTC
This seems to only happen if IOUtils.setByteArrayMaxOverride() is used as otherwise checkLenght() only compares length and maxLength anyway which are both MAX_VALUE in this case.
Comment 2 Dominik Stadler 2019-12-13 04:37:38 UTC
The following allows to reproduce this:

        IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        try {
            ByteArrayInputStream stream = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
            IOUtils.toByteArray(stream);
        } finally {
            IOUtils.setByteArrayMaxOverride(-1);
        }
Comment 3 Dominik Stadler 2019-12-13 05:00:04 UTC
Unfortunately the proposed fix is not everything that is needed, it would then allow oversized allocations despite the global size limit being set. 

I could not see a quick way to fix this by a few localized changes, might need a bit larger rework of the allocation-limiting functionality to make this work as intended.

If you want to help, please provide more unit-tests which verify things the way you would expect them, obviously more detailed coverage of this is lacking as well.
Comment 4 Dominik Stadler 2019-12-14 07:48:36 UTC
Information from a related discussion on the mailing list:

--------
I am using Tika to do content extraction on Visio (vsd) files,

and I am running into an ‘Unexpected RuntimeException’.

The stack trace for this is in the attached stack-trace-withOUT-setByteArrayMaxOverride.txt file.

 

When I tried the suggested work around of calling IOUtils.setByteArrayMaxOverride()
on the same file, I got the ‘Unexpected RuntimeException’ from a different part of the code.

It appears to me that when IOUtils.setByteArrayMaxOverride() is called with anything less than
Integer.MAX_VALUE, that calls to toByteArray() will fail in checkLength()

because the length input will be greater than BYTE_ARRAY_MAX_OVERRIDE.

 

Here is a snippet of the code I am using:

    private void extract(InputStream is, Path outputDir, ContentHandler h, Metadata m , AutoDetectParser extractParser) throws SAXException, TikaException, IOException {

        Map retVal = new HashMap();

        ParseContext c = new ParseContext();

 

        c.set(Parser.class, extractParser);

        EmbeddedDocumentExtractor ex = new MY_EmbeddedDocumentExtractor(outputDir, c);

        c.set(EmbeddedDocumentExtractor.class, ex);

 

        // Override the POI maximum length for all record types

        // IOUtils.setByteArrayMaxOverride(100 * 1024 * 1024);

        // IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);

        extractParser.parse(is, h, m, c);

 

        // Reset/disable the override

        // IOUtils.setByteArrayMaxOverride(-1);

    }

 

As you can see from the commented out IOUtils.setByteArrayMaxOverride() calls,

I tried this with both 100 MB, and 30 MB.

A second stack trace for the secondary error (with IOUtils.setByteArrayMaxOverride() being called)
is attached in stack-trace-with-setByteArrayMaxOverride.txt.

 

In each stack trace I have snipped out the calls to my code.
----------
Comment 5 Dominik Stadler 2019-12-14 07:49:33 UTC
Created attachment 36917 [details]
Stacktrace with default limits
Comment 6 Dominik Stadler 2019-12-14 07:50:05 UTC
Created attachment 36918 [details]
Stacktrace with higher limit defined
Comment 7 Dominik Stadler 2019-12-14 21:42:51 UTC
Fixed via r1871506, now it should be possible to override the max allocation globally with setByteArrayMaxOverride().