Investigating TIKA-2058, we discovered that HeapByteBuffers are cached unnecessarily in buffersToClean when the data source is not writable, even though they need no special unmapping. A single instance of FileBackedDataSource consumed 5.7GB of heap, triggering an OOM. More details at https://issues.apache.org/jira/browse/TIKA-2058. A patch will be attached.
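The fix amounts to guarding the cache so that only direct (mapped) buffers are tracked for later unmapping; heap buffers are left to the garbage collector. A minimal sketch of the idea, with illustrative class and method names rather than POI's actual code:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferCleanupSketch {
    static final List<ByteBuffer> buffersToClean = new ArrayList<>();

    // Heap buffers are reclaimed by the GC; only direct (mapped) buffers
    // need explicit unmapping, so only those are worth tracking.
    static boolean needsCleanup(ByteBuffer buf) {
        return buf.isDirect();
    }

    static void trackForCleanup(ByteBuffer buf) {
        if (needsCleanup(buf)) {
            buffersToClean.add(buf);
        }
    }

    public static void main(String[] args) {
        trackForCleanup(ByteBuffer.allocate(1024));       // heap buffer: skipped
        trackForCleanup(ByteBuffer.allocateDirect(1024)); // direct buffer: tracked
        System.out.println(buffersToClean.size()); // prints 1
    }
}
```

Without such a guard, every heap buffer handed to the data source stays referenced in the cleanup list for the lifetime of the FileBackedDataSource, which matches the 5.7GB retention seen in the heap dump.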
Applied in r1760816. Thank you!
Is POI supposed to support writing to files larger than 2GB? If not, I can propose a new patch to reduce the number of mmappings when the file is writable.
Yes, it should be able to, although we have at least one bug entry stating that some versions of zip implementations cause issues when opening the zipped-XML-based file formats. Please create a separate bug and attach the patch there so we can discuss it after the 3.15 release.
Luis Filip Nassif: Hi, did you make any progress on the patch to reduce the number of mmappings? We get an OOM in FileBackedDataSource. We create an NPOIFSFileSystem like this:

    result = new NPOIFSFileSystem(file, false);

then read entries from the file to compute a hash over all content, and at the end append an additional DocumentEntry. But we get an OutOfMemoryError while reading the data (the data is read piece-by-piece into a 1024-byte buffer and is not kept around):

    Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) [rt.jar:1.8.0_111]
        at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:94) [poi-3.16-beta1.jar:3.16-beta1]
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:484) [poi-3.16-beta1.jar:3.16-beta1]
        at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169) [poi-3.16-beta1.jar:3.16-beta1]
        ... 85 more
    Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method) [rt.jar:1.8.0_111]
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937) [rt.jar:1.8.0_111]
        ... 88 more
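For context, the reading pattern described above (piece-by-piece into a 1024-byte buffer, hashing as you go, keeping nothing around) might look like this minimal sketch; plain java.io/java.security classes stand in here for the POI document streams, so the names are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChunkedHashSketch {
    // Read a stream piece-by-piece into a small reusable buffer and feed
    // each chunk into the digest, so only 1024 bytes are held at a time.
    static byte[] hashStream(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] digest = hashStream(new ByteArrayInputStream(new byte[5000]));
        System.out.println(digest.length); // 32 bytes for SHA-256
    }
}
```

The point of the report is that even this constant-memory pattern fails, because the OOM comes from the native mmap call inside FileBackedDataSource, not from the caller's heap usage.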
Hi Marcus, no, I have not tried to write the patch yet, because of the need to handle files larger than 2GB. Are you using a 64-bit JVM? Have you tried increasing the ulimit system setting?
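The 2GB constraint comes from FileChannel.map, which caps each mapping at Integer.MAX_VALUE bytes. So a patch that reduces the mapping count by mapping larger windows instead of one region per block would still need several windows for files over 2GB. A rough sketch of that arithmetic, with a tiny temp file standing in for a real document (the windowing scheme here is an assumption, not POI's current implementation):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowedMapSketch {
    // FileChannel.map is limited to Integer.MAX_VALUE bytes per mapping,
    // so a file larger than 2GB must be covered by multiple windows.
    static int windowsNeeded(long fileSize, long windowSize) {
        return (int) ((fileSize + windowSize - 1) / windowSize);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("map-sketch", ".bin");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(4096)); // grow the file a little
            // One mapping covering the whole (small) file, instead of
            // one mapping per block as FileBackedDataSource.read does.
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(window.capacity()); // prints 4096
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Mapping a handful of large windows rather than one region per 4KB block would also keep the process well under the kernel's mapping-count limit, which is the usual cause of "OutOfMemoryError: Map failed" on Linux even when heap and address space are plentiful.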