Summary: | toByteArray(InputStream stream) in IOUtils may fail if setByteArrayMaxOverride() is used | ||
---|---|---|---|
Product: | POI | Reporter: | JS Lair <jean-severin.lair> |
Component: | POI Overall | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | 4.0.0-FINAL | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | |||
Bug Blocks: | 64001 | ||
Attachments: |
Stacktrace with default limits
Stacktrace with higher limit defined |
Description
JS Lair
2019-07-16 20:59:23 UTC
This seems to happen only if IOUtils.setByteArrayMaxOverride() is used; otherwise checkLength() only compares length and maxLength, which are both MAX_VALUE in this case.

The following reproduces the problem:

    IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
    try {
        ByteArrayInputStream stream = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        IOUtils.toByteArray(stream);
    } finally {
        IOUtils.setByteArrayMaxOverride(-1);
    }

Unfortunately, the proposed fix is not sufficient on its own: it would then allow oversized allocations even though a global size limit is set. I could not see a quick way to fix this with a few localized changes; it might need a somewhat larger rework of the allocation-limiting functionality to work as intended.

If you want to help, please provide more unit tests that verify the behavior you expect; more detailed coverage of this area is clearly lacking as well.

Information from a related discussion on the mailing list:

--------
I am using Tika to do content extraction on Visio (vsd) files, and I am running into an 'Unexpected RuntimeException'. The stack trace for this is in the attached stack-trace-withOUT-setByteArrayMaxOverride.txt file.

When I tried the suggested workaround of calling IOUtils.setByteArrayMaxOverride() on the same file, I got the 'Unexpected RuntimeException' from a different part of the code. It appears to me that when IOUtils.setByteArrayMaxOverride() is called with anything less than Integer.MAX_VALUE, calls to toByteArray() will fail in checkLength() because the length input will be greater than BYTE_ARRAY_MAX_OVERRIDE.
Here is a snippet of the code I am using:

    private void extract(InputStream is, Path outputDir, ContentHandler h, Metadata m,
            AutoDetectParser extractParser) throws SAXException, TikaException, IOException {
        Map retVal = new HashMap();
        ParseContext c = new ParseContext();
        c.set(Parser.class, extractParser);
        EmbeddedDocumentExtractor ex = new MY_EmbeddedDocumentExtractor(outputDir, c);
        c.set(EmbeddedDocumentExtractor.class, ex);
        // Override the POI maximum length for all record types
        // IOUtils.setByteArrayMaxOverride(100 * 1024 * 1024);
        // IOUtils.setByteArrayMaxOverride(30 * 1024 * 1024);
        extractParser.parse(is, h, m, c);
        // Reset/disable the override
        // IOUtils.setByteArrayMaxOverride(-1);
    }

As you can see from the commented-out IOUtils.setByteArrayMaxOverride() calls, I tried this with both 100 MB and 30 MB. A second stack trace for the secondary error (with IOUtils.setByteArrayMaxOverride() being called) is attached in stack-trace-with-setByteArrayMaxOverride.txt. In each stack trace I have snipped out the calls to my code.
----------

Created attachment 36917 [details]
Stacktrace with default limits
Created attachment 36918 [details]
Stacktrace with higher limit defined
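
For anyone who wants to see the failure mode without POI on the classpath, the check described in the report can be modelled roughly as below. This is only a sketch and not the POI source: ByteArrayLimitSketch, maxOverride and failsWithOverride are hypothetical names, and the logic assumes what the report describes, namely that once an override is set the requested length is compared only against the override, while a caller that does not know the stream size passes Integer.MAX_VALUE as the length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the length check described in this report.
// It is NOT the POI implementation; names and messages are illustrative only.
final class ByteArrayLimitSketch {

    // -1 means "no override", mirroring the reset call shown in the report.
    static int maxOverride = -1;

    // Assumed behaviour: with an override set, only the override is consulted;
    // otherwise the requested length is compared against the caller's maxLength.
    static void checkLength(long length, int maxLength) {
        if (maxOverride > 0) {
            if (length > maxOverride) {
                throw new IllegalStateException(
                        "Tried to allocate " + length + " bytes, limit is " + maxOverride);
            }
        } else if (length > maxLength) {
            throw new IllegalStateException(
                    "Tried to allocate " + length + " bytes, limit is " + maxLength);
        }
    }

    // Models a caller that does not know the stream size and therefore passes
    // Integer.MAX_VALUE as both the requested length and the maximum length.
    static byte[] toByteArray(InputStream in) throws IOException {
        checkLength(Integer.MAX_VALUE, Integer.MAX_VALUE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    // Returns true if reading a three-byte stream is rejected under the given override.
    static boolean failsWithOverride(int override) {
        maxOverride = override;
        try {
            toByteArray(new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (IllegalStateException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            maxOverride = -1; // always reset, as in the reproduction above
        }
    }

    public static void main(String[] args) {
        System.out.println("no override:    rejected = " + failsWithOverride(-1));
        System.out.println("30 MB override: rejected = " + failsWithOverride(30 * 1024 * 1024));
    }
}
```

In this model the three-byte stream is accepted when no override is set and rejected with a 30 MB override, because the sentinel length Integer.MAX_VALUE exceeds any smaller override regardless of how much data the stream actually holds.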