Bug 63569

Summary: toByteArray(InputStream stream) in IOUtils may fail if setByteArrayMaxOverride() is used
Product: POI Reporter: JS Lair <jean-severin.lair>
Component: POI OverallAssignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 4.0.0-FINAL   
Target Milestone: ---   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 64001    
Attachments: Stacktrace with default limits
Stacktrace with higher limit defined

Description JS Lair 2019-07-16 20:59:23 UTC
This call indeed call toByteArray(InputStream stream, Integer.MAX_LENGTH, Integer.MAX_LENGTH) and run in an exception considering this is a too long ByteArray. It's a wrong asumption considering that this value is only to say "unknown length", and that the function manage this unknown length...

The patch is to replace:

            checkLength(length, maxLength);

by :

        if ((length!=Integer.MAX_VALUE) || (maxLength!=Integer.MAX_VALUE))
            checkLength(length, maxLength);
Comment 1 Dominik Stadler 2019-11-17 12:36:34 UTC
This seems to only happen if IOUtils.setByteArrayMaxOverride() is used as otherwise checkLenght() only compares length and maxLength anyway which are both MAX_VALUE in this case.
Comment 2 Dominik Stadler 2019-12-13 04:37:38 UTC
The following allows to reproduce this:

        IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        try {
            ByteArrayInputStream stream = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        } finally {
Comment 3 Dominik Stadler 2019-12-13 05:00:04 UTC
Unfortunately the proposed fix is not everything that is needed, it would then allow oversized allocations despite the global size limit being set. 

I could not see a quick way to fix this by a few localized changes, might need a bit larger rework of the allocation-limiting functionality to make this work as intended.

If you want to help, please provide more unit-tests which verify things the way you would expect them, obviously more detailed coverage of this is lacking as well.
Comment 4 Dominik Stadler 2019-12-14 07:48:36 UTC
Information from a related discussion on the mailing list:

I am using Tika to do content extraction on Visio (vsd) files,

and I am running into an ‘Unexpected RuntimeException’.

The stack trace for this is in the attached stack-trace-withOUT-setByteArrayMaxOverride.txt file.


When I tried the suggested work around of calling IOUtils.setByteArrayMaxOverride()
on the same file, I got the ‘Unexpected RuntimeException’ from a different part of the code.

It appears to me that when IOUtils.setByteArrayMaxOverride() is called with anything less than
Integer.MAX_VALUE, that calls to toByteArray() will fail in checkLength()

because the length input will be greater than BYTE_ARRAY_MAX_OVERRIDE.


Here is a snippet of the code I am using:

    private void extract(InputStream is, Path outputDir, ContentHandler h, Metadata m , AutoDetectParser extractParser) throws SAXException, TikaException, IOException {

        Map retVal = new HashMap();

        ParseContext c = new ParseContext();


        c.set(Parser.class, extractParser);

        EmbeddedDocumentExtractor ex = new MY_EmbeddedDocumentExtractor(outputDir, c);

        c.set(EmbeddedDocumentExtractor.class, ex);


        // Override the POI maximum length for all record types

        // IOUtils.setByteArrayMaxOverride(100 * 1024 * 1024);

        // IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);

        extractParser.parse(is, h, m, c);


        // Reset/disable the override

        // IOUtils.setByteArrayMaxOverride(-1);



As you can see from the commented out IOUtils.setByteArrayMaxOverride() calls,

I tried this with both 100 MB, and 30 MB.

A second stack trace for the secondary error (with IOUtils.setByteArrayMaxOverride() being called)
is attached in stack-trace-with-setByteArrayMaxOverride.txt.


In each stack trace I have snipped out the calls to my code.
Comment 5 Dominik Stadler 2019-12-14 07:49:33 UTC
Created attachment 36917 [details]
Stacktrace with default limits
Comment 6 Dominik Stadler 2019-12-14 07:50:05 UTC
Created attachment 36918 [details]
Stacktrace with higher limit defined
Comment 7 Dominik Stadler 2019-12-14 21:42:51 UTC
Fixed via r1871506, now it should be possible to override the max allocation globally with setByteArrayMaxOverride().