This call does indeed invoke toByteArray(InputStream stream, Integer.MAX_VALUE, Integer.MAX_VALUE) and then throws an exception because it treats the byte array as too long. That is a wrong assumption: here the value only means "unknown length", and the function already handles an unknown length. The proposed patch is to replace: checkLength(length, maxLength); with: if ((length!=Integer.MAX_VALUE) || (maxLength!=Integer.MAX_VALUE)) checkLength(length, maxLength);
This seems to happen only if IOUtils.setByteArrayMaxOverride() is used. Otherwise checkLength() just compares length against maxLength, and both are Integer.MAX_VALUE in this case, so the check passes.
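To illustrate the point above, here is a minimal, self-contained model of the check as it is described in this report. It is not the POI source: the class name CheckLengthModel, the maxOverride field, and the use of IllegalStateException are assumptions made for the sketch. It only shows why MAX_VALUE vs. MAX_VALUE passes without an override and fails once an override is set.

```java
// Simplified model of the behavior described in this report; NOT the actual POI code.
class CheckLengthModel {
    // Hypothetical stand-in for the global override; -1 means "no override set".
    static int maxOverride = -1;

    // With an override set, length is compared to the override;
    // without one, length is compared to the caller-supplied maxLength.
    static void checkLength(long length, int maxLength) {
        int limit = (maxOverride > 0) ? maxOverride : maxLength;
        if (length > limit) {
            throw new IllegalStateException(
                "length " + length + " exceeds limit " + limit);
        }
    }

    public static void main(String[] args) {
        // No override: MAX_VALUE is compared to MAX_VALUE, so the check passes.
        checkLength(Integer.MAX_VALUE, Integer.MAX_VALUE);
        System.out.println("no override: ok");

        // Override set: the "unknown length" marker is compared to 30 MB and fails.
        maxOverride = 30 * 1024 * 1024;
        try {
            checkLength(Integer.MAX_VALUE, Integer.MAX_VALUE);
            System.out.println("override: ok");
        } catch (IllegalStateException e) {
            System.out.println("override: " + e.getMessage());
        } finally {
            maxOverride = -1; // reset, mirroring setByteArrayMaxOverride(-1)
        }
    }
}
```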
The following reproduces the problem:

    IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
    try {
        ByteArrayInputStream stream = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        IOUtils.toByteArray(stream);
    } finally {
        IOUtils.setByteArrayMaxOverride(-1);
    }
Unfortunately the proposed fix is not sufficient on its own: it would allow oversized allocations even though a global size limit has been set. I could not see a quick way to fix this with a few localized changes; it may need a somewhat larger rework of the allocation-limiting functionality to work as intended. If you want to help, please provide more unit tests that verify the behavior you expect. More detailed test coverage of this area is clearly lacking as well.
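One possible direction, sketched under assumptions and not taken from POI: treat Integer.MAX_VALUE as "unknown length", reject only lengths that are actually declared and too large, and enforce the limit on the number of bytes really read. The class BoundedRead and the method readAtMost are hypothetical names used for illustration; whether this matches the intended semantics of the override is an open question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the fix that was committed to POI.
class BoundedRead {
    // length == Integer.MAX_VALUE is treated as "unknown length".
    static byte[] readAtMost(InputStream in, int length, int limit) throws IOException {
        // A declared length can be rejected up front.
        if (length != Integer.MAX_VALUE && length > limit) {
            throw new IOException("declared length " + length + " exceeds limit " + limit);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        long total = 0;
        while (total < length) {
            int want = (int) Math.min(buf.length, length - total);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
            // For an unknown length, the limit is enforced on what was actually read.
            if (total > limit) {
                throw new IOException("read " + total + " bytes, exceeds limit " + limit);
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        byte[] data = readAtMost(stream, Integer.MAX_VALUE, 30 * 1024 * 1024);
        System.out.println(data.length);
    }
}
```

With this shape, the small "abc" stream from the reproducer is read without error, while a stream that really exceeds the limit still fails, so the global limit keeps its protective effect.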
Information from a related discussion on the mailing list:

--------

I am using Tika to do content extraction on Visio (vsd) files, and I am running into an ‘Unexpected RuntimeException’. The stack trace for this is in the attached stack-trace-withOUT-setByteArrayMaxOverride.txt file.

When I tried the suggested workaround of calling IOUtils.setByteArrayMaxOverride() on the same file, I got the ‘Unexpected RuntimeException’ from a different part of the code. It appears to me that when IOUtils.setByteArrayMaxOverride() is called with anything less than Integer.MAX_VALUE, calls to toByteArray() will fail in checkLength() because the length input will be greater than BYTE_ARRAY_MAX_OVERRIDE.

Here is a snippet of the code I am using:

    private void extract(InputStream is, Path outputDir, ContentHandler h, Metadata m,
            AutoDetectParser extractParser) throws SAXException, TikaException, IOException {
        Map retVal = new HashMap();
        ParseContext c = new ParseContext();
        c.set(Parser.class, extractParser);
        EmbeddedDocumentExtractor ex = new MY_EmbeddedDocumentExtractor(outputDir, c);
        c.set(EmbeddedDocumentExtractor.class, ex);

        // Override the POI maximum length for all record types
        // IOUtils.setByteArrayMaxOverride(100 * 1024 * 1024);
        // IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        extractParser.parse(is, h, m, c);

        // Reset/disable the override
        // IOUtils.setByteArrayMaxOverride(-1);
    }

As you can see from the commented-out IOUtils.setByteArrayMaxOverride() calls, I tried this with both 100 MB and 30 MB. A second stack trace for the secondary error (with IOUtils.setByteArrayMaxOverride() being called) is attached in stack-trace-with-setByteArrayMaxOverride.txt. In each stack trace I have snipped out the calls to my code.

----------
Created attachment 36917: Stacktrace with default limits
Created attachment 36918: Stacktrace with higher limit defined
Fixed via r1871506. It should now be possible to override the maximum allocation globally with setByteArrayMaxOverride().