Bug 56447

Summary: IllegalArgumentException when initializing NPOIFSFileSystem object
Product: POI
Reporter: mskan
Component: POIFS
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.10-FINAL   
Target Milestone: ---   
Hardware: Macintosh   
OS: All   

Description mskan 2014-04-23 07:53:25 UTC
I am using POIFS to extract data from OLE2 files generated by an X-ray scanner. Opening small files works fine with both POIFSFileSystem and NPOIFSFileSystem. However, when I try to open large files (say, a few GB or larger), POIFSFileSystem fails with an OutOfMemoryError, and NPOIFSFileSystem throws the following exception:

java.lang.IllegalArgumentException
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:275)

Could this be a bug in NPOIFSFileSystem?
Comment 1 Nick Burch 2014-04-23 08:38:10 UTC
Any chance you could attach a debugger, and check the values in NPOIFS of maxSize (line 274) and _header.getBATCount()?
Comment 2 mskan 2014-04-23 10:00:55 UTC
I have now created a minimal example that throws the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:273)
	at TestApp.main(TestApp.java:16)

The values in NPOIFS are:

_header.getBATCount() = 493
maxSize = 2067795968

The size of the file that I am opening is 3,365,793,792 bytes.

Comment 3 Nick Burch 2014-04-23 11:19:06 UTC
The NPOIFSFileSystem constructor that takes an InputStream buffers the whole file into memory, so your heap needs to be at least the size of the file, plus a bit extra. There is nothing we can do to help there; you just have to increase your heap.

Alternatively, NPOIFSFileSystem has a constructor that takes a File. It has a much lower memory footprint, because a File allows random access (an InputStream does not).

If you increase your heap to something like 15% bigger than the file, does it work with an InputStream?

If you switch to a File, does that fix it without a bigger heap?
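
To illustrate why random access keeps the memory footprint low, here is a minimal sketch that reads a single block from a file at a long offset. It is not POI code: it uses only java.nio, and the class name, method names and the 512-byte block size are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: this is NOT how POI is implemented.
class RandomAccessSketch {
    static final int BLOCK_SIZE = 512; // assumed block size for this sketch

    // Reads one block at a long offset; only BLOCK_SIZE bytes are held on the heap.
    static ByteBuffer readBlock(FileChannel channel, long blockIndex) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(BLOCK_SIZE);
        long start = blockIndex * BLOCK_SIZE; // long arithmetic, so no int overflow
        while (block.hasRemaining()) {
            int n = channel.read(block, start + block.position());
            if (n < 0) {
                break; // end of file reached before the block was full
            }
        }
        block.flip();
        return block;
    }

    // Writes a small four-block temp file and returns the first byte of its fourth block.
    static byte demo() {
        try {
            Path tmp = Files.createTempFile("blocks", ".bin");
            try {
                byte[] data = new byte[BLOCK_SIZE * 4];
                data[BLOCK_SIZE * 3] = 42; // marker at the start of the fourth block
                Files.write(tmp, data);
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    return readBlock(ch, 3).get(0);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```

An InputStream offers no equivalent of the positioned read, which is why a stream has to be buffered in full before individual blocks can be looked up.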
Comment 4 mskan 2014-04-23 12:30:05 UTC
I realize that the numbers that I gave you before were for a file of size 2,066,718,720 bytes. Increasing the heap size (using -Xmx3G) helps when opening the file as an InputStream, and it also works with the standard heap size when I use File instead of InputStream. However, when I try opening the file that is 3,365,793,792 bytes, I get this exception when passing a File object to NPOIFSFileSystem:

Exception in thread "main" java.lang.IllegalArgumentException
	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:275)
	at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:57)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:426)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:402)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:377)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:201)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:162)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:143)

Could this be because of integer overflow? If I use the InputStream and increase the heap size to 4GB, then maxSize is equal to -926937088 (and _header.getBATCount() = 803), and I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:273)
	at ReadTXRM.main(ReadTXRM.java:19)
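
The reported numbers are consistent with an int overflow. The sketch below is not taken from the POI source. It assumes 4096-byte blocks, 1024 entries per BAT block and one extra header block, constants inferred only because they reproduce both maxSize values quoted in this report. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative arithmetic only; the constants are inferred from the values in this report.
class MaxSizeOverflowSketch {
    static final int BLOCK_SIZE = 4096;      // assumed bytes per block
    static final int ENTRIES_PER_BAT = 1024; // assumed: BLOCK_SIZE / 4-byte entries

    // int arithmetic: silently wraps once the result exceeds Integer.MAX_VALUE
    static int maxSizeAsInt(int batCount) {
        return (batCount * ENTRIES_PER_BAT + 1) * BLOCK_SIZE;
    }

    // long arithmetic: the same formula without the wrap-around
    static long maxSizeAsLong(int batCount) {
        return ((long) batCount * ENTRIES_PER_BAT + 1) * BLOCK_SIZE;
    }

    // ByteBuffer.allocate rejects a negative capacity with IllegalArgumentException
    static boolean allocateRejects(int capacity) {
        try {
            ByteBuffer.allocate(capacity);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(maxSizeAsInt(493));   // prints 2067795968
        System.out.println(maxSizeAsInt(803));   // prints -926937088
        System.out.println(maxSizeAsLong(803));  // prints 3368030208
        System.out.println(allocateRejects(maxSizeAsInt(803))); // prints true
    }
}
```

With 493 BAT blocks the result still fits in an int, which matches the 2,066,718,720-byte file working once the heap is large enough. With 803 BAT blocks the true value exceeds Integer.MAX_VALUE, so the int wraps to a negative number and the allocation is rejected.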
Comment 5 Nick Burch 2014-04-23 12:37:16 UTC
Looks like we might have one or more ints that need to be longs.

Are you able to generate a file that is larger than 2 GB but whose data is mostly zeros, so that it compresses down to something small? It could then be used in unit tests.
Comment 6 mskan 2014-04-23 12:55:57 UTC
Unfortunately I don't think that I can create such a file, but I can give you the file that causes the problem (3.37 GB uncompressed and 2.16 GB compressed with bzip2).
Comment 7 Nick Burch 2014-04-24 16:15:59 UTC
Can you try now?

For an InputStream, you should now get a helpful IllegalArgumentException on a file larger than 2 GB, because ByteBuffer has a 2 GB limit.

For a File, we ought to be able to go bigger. I've switched a couple of ints to longs; can you see if that helps?
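
As a rough illustration of the kind of check described here, the sketch below validates a long size before allocating. It is not the actual POI change: the class name, method name and message text are hypothetical. The only fact it relies on is that ByteBuffer.allocate takes an int capacity.

```java
import java.nio.ByteBuffer;

// Hypothetical guard; not the code that was committed to POI.
class SizeGuardSketch {
    // Fails early with a descriptive message instead of passing a wrapped int to allocate().
    static ByteBuffer allocateChecked(long size) {
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Unable to buffer " + size + " bytes in memory: ByteBuffer capacity is limited to "
                + Integer.MAX_VALUE + " bytes. Consider opening from a File instead.");
        }
        return ByteBuffer.allocate((int) size);
    }

    public static void main(String[] args) {
        System.out.println(allocateChecked(1024).capacity()); // prints 1024
        try {
            allocateChecked(3368030208L); // the maxSize computed for the 3.37 GB file
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```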
Comment 8 mskan 2014-04-25 05:41:59 UTC
Sure, I can try it. Where do I find the new version? Can you send me a jar file?
Comment 9 mskan 2014-04-25 06:03:12 UTC
I found the nightly build (poi-3.11-beta1-20140424). It works! Now I can read large data files using File. Thanks for all your help, I really appreciate it!