Bug 65639

Summary: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type
Product: POI Reporter: redmanmale <redmanmale+apache>
Component: POI Overall Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: t.heidenthal
Priority: P2    
Version: 4.1.2-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   

Description redmanmale 2021-10-19 14:03:19 UTC
I tried to parse a ppt document and got this error:

<business logic>

Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1000 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1302/2069825217@6c8f6293 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1308/936605483@4abd8f87 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1327/708554163@431de604 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 4024 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1345/56643443@5fd52ce1 : org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:190)
at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:118)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:270)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:251)
at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:150)
at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:163)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:83)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
... 11 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1308/936605483@4abd8f87 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1327/708554163@431de604 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 4024 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1345/56643443@5fd52ce1 : org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:190)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:143)
at org.apache.poi.hslf.record.Document.<init>(Document.java:133)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
... 20 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1327/708554163@431de604 : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 4024 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1345/56643443@5fd52ce1 : org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:190)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:143)
at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
... 23 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 4024 on class org.apache.poi.hslf.record.RecordTypes$$Lambda$1345/56643443@5fd52ce1 : org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:190)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:143)
at org.apache.poi.hslf.record.FontCollection.<init>(FontCollection.java:53)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
... 26 more
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 4276190, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
at org.apache.poi.hslf.record.FontEmbeddedData.<init>(FontEmbeddedData.java:70)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:181)
... 29 more

Maybe we should bump the default max size for this record type.

I would have attached the file, but it is larger than 1 MB (around 5 MB). If you need it, I can upload it and paste a link.
Comment 1 PJ Fanning 2021-10-19 14:52:10 UTC
Is there a reason you can't use IOUtils.setByteArrayMaxOverride() ?

The max is there to protect users from malicious files. We would be reluctant to change the default, and you would be stuck waiting for a release anyway.
Comment 2 redmanmale 2021-10-20 09:03:43 UTC
I've already used setByteArrayMaxOverride and it fixed the problem for me.

But there's a comment in this method to open an issue if you're using it.
>>and please open up issues on POI's bugzilla to bump values for specific records.

That's it.
Comment 3 PJ Fanning 2021-10-20 09:08:32 UTC
It would seem odd to have a font config that is so big. It's my opinion that it is not worth changing the POI default in this case. Someone else might have a different opinion.
Comment 4 Nick Burch 2021-10-20 09:52:00 UTC
4 MB of font metadata feels very high and likely broken, but if it is actually holding the full font with lots of hinting/design then that might be reasonable. If it is holding multiple fonts, 4 MB seems quite likely.

Anyone have 15 minutes to find the relevant link to the MS docs on this kind of font record, to check if it is just metadata or can contain full embedded fonts?
Comment 5 redmanmale 2021-10-21 09:53:28 UTC
I've processed 30k documents and found about 60 docx files with a huge record length (~37000000).
Comment 6 PJ Fanning 2021-10-21 10:09:48 UTC
I've added r1894438

PS this issue does affect docx files, it affects ppt files.
Comment 7 PJ Fanning 2021-10-21 10:27:58 UTC
I meant 'this issue does not affect docx files, it affects ppt files.'
Comment 8 PJ Fanning 2021-10-21 10:55:09 UTC
@redmanmale could you attach a sample ppt file that exhibits this issue - so we can add a regression test for it?
Comment 9 redmanmale 2021-10-22 16:07:39 UTC
>I meant 'this issue does not affect docx files, it affects ppt files.'
Sorry, I'll open another issue for this thing.

>could you attach a sample ppt file that exhibits this issue
I'll take a look if there's any that I could upload (without sensitive or private data).
Comment 10 Todd Heidenthal 2022-02-18 19:15:33 UTC
I came across the following when using ooxml 5.2.0 (didn't have the problem in 4.1.0)

BUG_REPORT: Uncaught org.apache.poi.util.RecordFormatException at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:535) [Fri Feb 18 11:53:07 CST 2022]
org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 197,775,541, but the maximum length for this record type is 100,000,000.
If the file is not corrupt, please open an issue on bugzilla to request 
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()

This was when loading an xlsx file. My sample file is 38144 KB, which is larger than your system will allow to be uploaded. If you can provide another way for me to get it to you, I am happy to share.

I have used setByteArrayMaxOverride as a temporary workaround as suggested and it only resulted in the following stack trace.

java.io.IOException: MaxLength (100000000) reached - stream seems to be invalid.
	at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:195)
	at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:72)
	at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98)
	at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:312)
Comment 11 PJ Fanning 2022-02-18 19:30:34 UTC
Could you try a setByteArrayMaxOverride value higher than 197,775,541, anything up to max-int (2,147,483,647)?

You might also find org.apache.poi.openxml4j.opc.ZipPackage.setUseTempFilePackageParts(true) useful.

Unfortunately, once you start calling setByteArrayMaxOverride you may find yourself having to try different values. The default for setByteArrayMaxOverride is -1 (no limit), but there are specific limits for certain data types, and those are ignored once you set an override.
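
As an illustration only (this is not POI's actual code), the interaction described above could be sketched as follows. The class name LimitSketch, its methods, and the sample limits are hypothetical; the sketch only assumes what is stated above, namely that -1 means no override is set and that any other value replaces the per-record limits.

```java
public class LimitSketch {
    // Hypothetical global override; -1 means "not set", mirroring the default described above.
    private static int maxOverride = -1;

    public static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Returns the limit that applies: the per-record limit unless an override is set.
    public static int effectiveLimit(int recordSpecificLimit) {
        return maxOverride == -1 ? recordSpecificLimit : maxOverride;
    }

    // Allocates a buffer only if the requested length fits within the effective limit.
    public static byte[] safelyAllocate(long length, int recordSpecificLimit) {
        int limit = effectiveLimit(recordSpecificLimit);
        if (length < 0 || length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length " + length
                    + ", but " + limit + " is the maximum for this record type");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        // Without an override, a record type with a 100,000,000 limit keeps it.
        System.out.println(effectiveLimit(100_000_000));
        // Once an override is set, it replaces every per-record limit,
        // including ones that were previously larger than the override.
        setMaxOverride(5_000_000);
        System.out.println(effectiveLimit(100_000_000));
        System.out.println(safelyAllocate(4_276_190L, 1_000_000).length);
    }
}
```

In this sketch, an override chosen to fix one small limit also lowers any larger per-record limit, which is why a single override value may need several attempts.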
Comment 12 Todd Heidenthal 2022-02-18 20:36:57 UTC
I tried both MAX_INT and ZipPackage.setUseTempFilePackageParts(true) with the same result

java.io.IOException: MaxLength (100000000) reached - stream seems to be invalid.
Comment 13 PJ Fanning 2022-02-18 20:47:14 UTC
Could you put the file on google drive or some similar mechanism where it can be accessed publicly?
Comment 14 Todd Heidenthal 2022-02-18 21:45:19 UTC
PJ, I have sent you a shared Google Doc.  In case that doesn't work, you can use this link

https://docs.google.com/spreadsheets/d/1PF8gDoG_9CXorbaSGBkv012AydWwMXGV/edit?usp=sharing&ouid=101803365984806497993&rtpof=true&sd=true
Comment 15 PJ Fanning 2022-02-18 22:26:02 UTC
Other than having to assign a large Xmx, I have no trouble reading that file. Maybe Google Sheets has corrected it.

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;


public class LargeMain {
    private static final String FILE = "/Users/pj.fanning/Downloads/VeryLargeDataFile.xlsx";

    public static void main(String[] args) {
        try (Workbook wb = new XSSFWorkbook(OPCPackage.open(FILE, PackageAccess.READ))) {
            Sheet sheet = wb.getSheetAt(0);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
Comment 16 Todd Heidenthal 2022-02-19 18:59:56 UTC
Thanks, PJ.  That form works for me as well.  However, due to some other bug with POI locking files, we had to use a different form.  (I need to verify whether the locking bug is still a problem - I will do that soon.)

The form we use that produces the error is

FileInputStream in = new FileInputStream(filename);
OPCPackage container = OPCPackage.open(in);

The exception occurs in the open() call
Comment 17 PJ Fanning 2022-02-19 19:16:00 UTC
I reproduced the issue with that approach to creating OPCPackage.

This setting helps:

org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(100_000_000);

It causes temp files to be used for large zip entries.
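
The general idea of switching to temp files above a size threshold can be sketched with plain JDK classes. This is a minimal illustration under assumptions, not the ZipInputStreamZipEntrySource implementation: the class SpillSketch, the Entry holder, and the read method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillSketch {
    /** Holds an entry either in memory or in a temp file, never both. */
    public static final class Entry {
        public final byte[] inMemory; // non-null when the entry stayed within the threshold
        public final Path tempFile;   // non-null when the entry was spilled to disk

        Entry(byte[] inMemory, Path tempFile) {
            this.inMemory = inMemory;
            this.tempFile = tempFile;
        }

        public long size() {
            return inMemory != null ? inMemory.length : tempFile.toFile().length();
        }
    }

    /** Buffers the stream in memory and spills to a temp file once the threshold is exceeded. */
    public static Entry read(InputStream in, int thresholdBytes) {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                memory.write(chunk, 0, n);
                if (memory.size() > thresholdBytes) {
                    return spill(memory, in, chunk);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Entry(memory.toByteArray(), null);
    }

    // Writes the bytes buffered so far to a temp file, then streams the remainder to it.
    private static Entry spill(ByteArrayOutputStream memory, InputStream in, byte[] chunk)
            throws IOException {
        Path tmp = Files.createTempFile("entry-sketch", ".tmp");
        tmp.toFile().deleteOnExit();
        try (OutputStream out = Files.newOutputStream(tmp)) {
            memory.writeTo(out);
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new Entry(null, tmp);
    }

    public static void main(String[] args) {
        Entry small = read(new ByteArrayInputStream(new byte[10]), 100);
        Entry large = read(new ByteArrayInputStream(new byte[20_000]), 100);
        System.out.println("small kept in memory: " + (small.inMemory != null));
        System.out.println("large spilled to disk: " + (large.tempFile != null));
    }
}
```

With this approach the heap only ever holds roughly the threshold plus one chunk per entry, so a very large entry no longer needs a single huge byte array.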
Comment 18 PJ Fanning 2022-02-19 19:37:40 UTC
IOUtils.toByteArray does seem wrong - it does not take into account the IOUtils.setByteArrayMaxOverride(int) setting.

Changed with r1898229

If any committer has a problem with my change, please get in touch.
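
To illustrate the kind of behaviour at issue, here is a minimal sketch of a bounded stream read that consults a global override before falling back to the caller's limit. It is not the code changed in r1898229: the class BoundedReadSketch and its methods are hypothetical, and it throws an unchecked exception where the trace above shows an IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BoundedReadSketch {
    // Hypothetical global override; -1 means "not set".
    private static int maxOverride = -1;

    public static void setMaxOverride(int value) {
        maxOverride = value;
    }

    /** Reads the whole stream, failing once more bytes arrive than the effective limit allows. */
    public static byte[] toByteArray(InputStream in, int defaultMaxLength) {
        // The override, when set, takes precedence over the caller's default limit.
        int limit = maxOverride == -1 ? defaultMaxLength : maxOverride;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                if ((long) out.size() + n > limit) {
                    throw new IllegalStateException(
                            "MaxLength (" + limit + ") reached - stream seems to be invalid.");
                }
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 200 bytes exceed a default limit of 100, but fit once the override is raised.
        setMaxOverride(1_000);
        System.out.println(toByteArray(new ByteArrayInputStream(new byte[200]), 100).length);
    }
}
```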
Comment 19 PJ Fanning 2022-02-21 17:23:40 UTC
Fix for IOUtils.setByteArrayMaxOverride and IOUtils.toByteArray will be in v5.2.1
Comment 20 Andreas Beeker 2022-02-21 21:59:54 UTC
I don't understand the current version of IOUtils.toByteArray.
It looks like the length parameter is now supposed to be Integer.MIN_VALUE for reading until EOF.

I've noticed that reading from a socket with IOUtils.toByteArray(is,  getMaxTimestampResponseSize()) isn't working anymore.

So if MIN_VALUE is the new MAX_VALUE, that's OK, and we need to adapt the javadocs. This only hits people who use the "internal" API and don't know about the change.
Comment 21 PJ Fanning 2022-02-21 22:29:46 UTC
Thanks Andi for looking at this. I got everything else working but TSPTimeStampService is still causing issues. I had to disable TestPOIXMLDocument testOSGIClassLoading. No matter what I do to TSPTimeStampService, this test still fails. Its use of IOUtils.toByteArray seems so simple but something is happening that I don't understand.

An IOUtils length of min-int is just a way of disabling some of the checks, but there are many competing use cases and it is hard to get one method to handle them all.
Comment 22 Todd Heidenthal 2022-02-22 15:05:13 UTC
Just as an FYI, I verified that the locking issue that we previously had does not occur any longer, so we can use the first form you provided.

Thank you for all your help in resolving this issue so quickly.  I really appreciate it
Comment 23 Ulaganathan 2022-08-23 20:51:14 UTC
What is the resolution for this? I have a 50 MB file, and while reading it I got the same issue:
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
	at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:599)