Summary: | org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type
---|---|---|---
Product: | POI | Reporter: | redmanmale <redmanmale+apache>
Component: | POI Overall | Assignee: | POI Developers List <dev>
Status: | RESOLVED FIXED | |
Severity: | normal | CC: | t.heidenthal
Priority: | P2 | |
Version: | 4.1.2-FINAL | |
Target Milestone: | --- | |
Hardware: | PC | |
OS: | All | |
Description
redmanmale
2021-10-19 14:03:19 UTC
Is there a reason you can't use IOUtils.setByteArrayMaxOverride()? The max is there to protect users from malicious files. We would be reluctant to change the default, and you would be stuck waiting for a release anyway.

I've already used setByteArrayMaxOverride and it fixed the problem for me. But there's a comment in that method saying to open an issue if you're using it:

> and please open up issues on POI's bugzilla to bump values for specific records.

That's it.
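For reference, a minimal sketch of the workaround discussed above (the file path and override value are placeholders; this assumes the POI jars are on the classpath):

```java
import java.io.File;

import org.apache.poi.util.IOUtils;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class MaxOverrideExample {
    public static void main(String[] args) throws Exception {
        // Raise the allocation cap above the record length reported in the
        // error message (4276190 in this bug). Caveat: the override replaces
        // the per-record-type limits, so only apply it to files you trust.
        IOUtils.setByteArrayMaxOverride(5_000_000);

        try (XSSFWorkbook wb = new XSSFWorkbook(new File("/path/to/sample.xlsx"))) {
            System.out.println("Sheets: " + wb.getNumberOfSheets());
        }
    }
}
```

Because the setting is a static/global one, it affects every file parsed in the JVM afterwards, which is why the developers recommend reporting the record type here instead of relying on it permanently.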
It would seem odd to have a font config that is so big. In my opinion it is not worth changing the POI default in this case; someone else might have a different opinion.

4 MB of font metadata feels very high and likely broken, but if it is actually holding the full font with lots of hinting/design data then that might be reasonable. If it is holding multiple fonts, 4 MB seems quite likely. Anyone have 15 minutes to find the relevant link to the MS docs on this kind of font record, to check whether it is just metadata or can contain full embedded fonts?

I've processed 30k documents and found ~60 docx with the huge record length (~37000000).

I've added r1894438.

PS: this issue does affect docx files, it affects ppt files.

I meant 'this issue does not affect docx files, it affects ppt files.'

@redmanmale could you attach a sample ppt file that exhibits this issue, so we can add a regression test for it?

> I meant 'this issue does not affect docx files, it affects ppt files.'

Sorry, I'll open another issue for this thing.

> could you attach a sample ppt file that exhibits this issue

I'll take a look if there's any that I could upload (without sensitive or private data).

I came across the following when using ooxml 5.2.0 (didn't have the problem in 4.1.0):

    BUG_REPORT: Uncaught org.apache.poi.util.RecordFormatException
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:535)
    [Fri Feb 18 11:53:07 CST 2022] org.apache.poi.util.RecordFormatException:
    Tried to allocate an array of length 197,775,541, but the maximum length
    for this record type is 100,000,000. If the file is not corrupt, please
    open an issue on bugzilla to request increasing the maximum allowable size
    for this record type. As a temporary workaround, consider setting a higher
    override value with IOUtils.setByteArrayMaxOverride()

This was when loading an xlsx file. My sample file is 38144 KB, which is larger than your system will allow to be uploaded.
If you can provide another way for me to get it to you, I am happy to share.

I have used setByteArrayMaxOverride as a temporary workaround as suggested, and it only resulted in the following stack trace:

    java.io.IOException: MaxLength (100000000) reached - stream seems to be invalid.
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:195)
        at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:72)
        at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98)
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:312)

Could you try a higher setByteArrayMaxOverride value than 197,775,541 (anything up to max-int, 2,147,483,647)? You might also find org.apache.poi.openxml4j.opc.ZipPackage.setUseTempFilePackageParts(true) useful. Unfortunately, once you start setting setByteArrayMaxOverride you will find yourself having to try different values, because the default for setByteArrayMaxOverride is -1 (no limit), but there are specific limits for certain data types that are ignored once you start setting setByteArrayMaxOverride.

I tried both MAX_INT and ZipPackage.setUseTempFilePackageParts(true) with the same result:

    java.io.IOException: MaxLength (100000000) reached - stream seems to be invalid.

Could you put the file on Google Drive or some similar mechanism where it can be accessed publicly?

PJ, I have sent you a shared Google Doc. In case that doesn't work, you can use this link: https://docs.google.com/spreadsheets/d/1PF8gDoG_9CXorbaSGBkv012AydWwMXGV/edit?usp=sharing&ouid=101803365984806497993&rtpof=true&sd=true

Other than having to assign a large Xmx, I have no trouble reading that file. Maybe Google Sheets has corrected it.
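A minimal sketch combining the two suggestions above, applying both global settings before opening the package (the path is a placeholder; assumes POI 5.x on the classpath):

```java
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.openxml4j.opc.ZipPackage;
import org.apache.poi.util.IOUtils;

public class CombinedWorkaround {
    public static void main(String[] args) throws Exception {
        // Allow byte-array allocations up to max-int. This also disables the
        // per-record-type limits, so only do this for trusted files.
        IOUtils.setByteArrayMaxOverride(Integer.MAX_VALUE);

        // Back large package parts with temp files instead of heap memory.
        ZipPackage.setUseTempFilePackageParts(true);

        try (OPCPackage pkg = OPCPackage.open("/path/to/large.xlsx", PackageAccess.READ)) {
            System.out.println("Parts: " + pkg.getParts().size());
        }
    }
}
```

As the thread goes on to show, these two settings were not sufficient for the stream-based open path in 5.2.0; the temp-file threshold setting below was the one that helped there.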
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackageAccess;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    public class LargeMain {
        private static final String FILE = "/Users/pj.fanning/Downloads/VeryLargeDataFile.xlsx";

        public static void main(String[] args) {
            try (Workbook wb = new XSSFWorkbook(OPCPackage.open(FILE, PackageAccess.READ))) {
                Sheet sheet = wb.getSheetAt(0);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Thanks, PJ. That form works for me as well. However, due to some other bug with POI locking files, we had to use a different form. (I need to verify whether the locking bug is still a problem; I will do that soon.) The form we use that produces the error is:

    FileInputStream in = new FileInputStream(filename);
    OPCPackage container = OPCPackage.open(in);

The exception occurs in the open() call.

I reproduced the issue with that approach to creating the OPCPackage. This setting helps:

    org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(100_000_000);

It causes temp files to be used for large zip entries.

IOUtils.toByteArray does seem wrong: it does not take into account the IOUtils.setByteArrayMaxOverride(int) setting. Changed with r1898229. If any committer has a problem with my change, please get in touch.

The fix for IOUtils.setByteArrayMaxOverride and IOUtils.toByteArray will be in v5.2.1.

I don't understand the current version of IOUtils.toByteArray. It looks like the length parameter is now supposed to be Integer.MIN_VALUE for reading until EOF. I've noticed that reading from a socket with IOUtils.toByteArray(is, getMaxTimestampResponseSize()) isn't working anymore. So if MIN_VALUE is the new MAX_VALUE, that's ok, and we need to adapt the javadocs. This only hits people using the "internal" API who don't know about it.

Thanks Andi for looking at this.
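Putting the stream-based form and the temp-file threshold together, a sketch of the working configuration described above (the path is a placeholder; assumes POI 5.x on the classpath):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource;

public class StreamOpenWorkaround {
    public static void main(String[] args) throws Exception {
        // Zip entries larger than ~100 MB are spilled to temp files instead
        // of being buffered fully in memory, avoiding the MaxLength check
        // on the in-memory byte array.
        ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(100_000_000);

        try (InputStream in = new FileInputStream("/path/to/VeryLargeDataFile.xlsx");
             OPCPackage pkg = OPCPackage.open(in)) {
            System.out.println("Opened: " + pkg.getParts().size() + " parts");
        }
    }
}
```

Note that opening from an InputStream forces POI to read the whole archive up front (it cannot seek), which is why this path is more memory-sensitive than OPCPackage.open(File, PackageAccess).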
I got everything else working, but TSPTimeStampService is still causing issues. I had to disable the TestPOIXMLDocument testOSGIClassLoading test. No matter what I do to TSPTimeStampService, this test still fails. Its use of IOUtils.toByteArray seems so simple, but something is happening that I don't understand.

An IOUtils length of min-int is just a way of disabling some of the checks, but there are many competing use cases; it is hard to get one method to handle them all.

Just as an FYI, I verified that the locking issue that we previously had does not occur any longer, so we can use the first form you provided. Thank you for all your help in resolving this issue so quickly. I really appreciate it.

What is the resolution for this? I have a 50 MB file, and while reading it I got the same issue:

    As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:599)