Bug 60140 - OOM caused by Memory Leak in FileBackedDataSource
Summary: OOM caused by Memory Leak in FileBackedDataSource
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.15-dev
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-15 00:13 UTC by Luis Filipe Nassif
Modified: 2017-01-30 03:09 UTC
CC List: 1 user



Description Luis Filipe Nassif 2016-09-15 00:13:46 UTC
While investigating TIKA-2058, we discovered that HeapByteBuffers are cached unnecessarily in buffersToClean when the data source is not writable, even though they need no special unmapping.

A single instance of FileBackedDataSource consumed 5.7GB of heap, triggering OOM.

More details on https://issues.apache.org/jira/browse/TIKA-2058

Patch will be attached.
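
For illustration, here is a minimal sketch of the idea behind the fix. This is a simplified stand-in, not the committed patch; the class name is made up and the fields only approximate FileBackedDataSource. The point is that only buffers obtained from FileChannel.map() need explicit unmapping on close, so heap buffers created on the read-only path should not be cached.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch, not the actual FileBackedDataSource code.
class LeakFreeDataSourceSketch {
    private final FileChannel channel;
    private final boolean writable;
    private final List<ByteBuffer> buffersToClean = new ArrayList<>();

    LeakFreeDataSourceSketch(FileChannel channel, boolean writable) {
        this.channel = channel;
        this.writable = writable;
    }

    ByteBuffer read(int length, long position) throws IOException {
        ByteBuffer dst;
        if (writable) {
            // Writable sources hand out memory-mapped views that must be
            // unmapped explicitly when the source is closed.
            dst = channel.map(FileChannel.MapMode.READ_WRITE, position, length);
            buffersToClean.add(dst);
        } else {
            // Read-only sources copy into an ordinary heap buffer; the GC
            // reclaims it, so caching it in buffersToClean only pins it in
            // memory (the leak reported here). A real implementation would
            // loop until the buffer is completely filled.
            dst = ByteBuffer.allocate(length);
            channel.read(dst, position);
            dst.flip();
        }
        return dst;
    }
}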
Comment 1 Tim Allison 2016-09-15 00:20:26 UTC
r1760816

Thank you!
Comment 2 Luis Filipe Nassif 2016-09-15 00:35:06 UTC
Is POI supposed to support/write files larger than 2GB? If not, I can propose a new patch to reduce the number of mmappings when the file is writable.
Comment 3 Dominik Stadler 2016-09-15 07:13:45 UTC
Yes, it should be able to, although we have at least one bug entry stating that some versions of zip implementations cause issues when opening the zipped XML-based file formats.

Please create a separate bug and attach the patch there so we can discuss it after the 3.15 release.
Comment 4 Marcus Lundblad 2017-01-26 15:12:05 UTC
Luis Filipe Nassif:

Hi, did you make any progress on the patch to reduce the number of mmappings?
We are getting OOM exceptions in FileBackedDataSource.

We create an NPOIFSFileSystem like this:

result = new NPOIFSFileSystem(file, false);

We then read the entries from the file to compute a hash over all of the content and, at the end, append an additional DocumentEntry.

But we get an OutOfMemoryError when reading the data (the data is read piece by piece into a 1024-byte buffer and is not kept around). A rough sketch of this workflow follows the stack trace below.

Caused by: java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) [rt.jar:1.8.0_111]
	at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:94) [poi-3.16-beta1.jar:3.16-beta1]
	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:484) [poi-3.16-beta1.jar:3.16-beta1]
	at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169) [poi-3.16-beta1.jar:3.16-beta1]
	... 85 more
Caused by: java.lang.OutOfMemoryError: Map failed
	at sun.nio.ch.FileChannelImpl.map0(Native Method) [rt.jar:1.8.0_111]
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937) [rt.jar:1.8.0_111]
	... 88 more
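
For reference, here is a rough, self-contained sketch of the workflow described above. The hash algorithm and the "ContentHash" entry name are placeholders, not our actual application code.

import java.io.ByteArrayInputStream;
import java.io.File;
import java.security.MessageDigest;
import java.util.Iterator;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DocumentEntry;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

// Sketch of the hash-then-append workflow; placeholder names, not production code.
public class HashAndAppendSketch {
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        NPOIFSFileSystem fs = new NPOIFSFileSystem(file, false); // read/write
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            DirectoryEntry root = fs.getRoot();

            // Read every document entry piece by piece into a small buffer.
            for (Iterator<Entry> it = root.getEntries(); it.hasNext(); ) {
                Entry entry = it.next();
                if (entry instanceof DocumentEntry) {
                    try (DocumentInputStream in =
                             new DocumentInputStream((DocumentEntry) entry)) {
                        byte[] buf = new byte[1024];
                        int n;
                        while ((n = in.read(buf)) > 0) {
                            digest.update(buf, 0, n);
                        }
                    }
                }
            }

            // Append an additional DocumentEntry holding the computed hash.
            root.createDocument("ContentHash",
                    new ByteArrayInputStream(digest.digest()));
            fs.writeFilesystem();
        } finally {
            fs.close();
        }
    }
}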
Comment 5 Luis Filipe Nassif 2017-01-30 03:09:40 UTC
Hi Marcus,

No, I have not tried to write the patch, because of the need to handle files larger than 2GB.

Are you using an x64 JVM? Have you tried increasing the ulimit system setting?