Bug 56625

Summary: Constructor for HSSFWorkbook leads to memory leak
Product: POI
Reporter: Nope Random <arbitdude007>
Component: HSSF
Assignee: POI Developers List <dev>
Status: RESOLVED DUPLICATE
Severity: critical
CC: arbitdude007
Priority: P2
Keywords: APIBug
Version: 3.8-FINAL
Target Milestone: ---
Hardware: PC
OS: Linux

Description Nope Random 2014-06-15 13:19:01 UTC
The constructor for org.apache.poi.hssf.usermodel.HSSFWorkbook

public HSSFWorkbook(DirectoryNode directory, boolean preserveNodes) throws IOException 

creates a DocumentInputStream instance but never closes it.

As a result, even after the input stream object is garbage collected, the file descriptor that the OS allocated to the JVM stays open. Once the user's ulimit on open files has been reached, this eventually causes a "Too many open files" error.
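
To illustrate the pattern, here is a minimal sketch using only java.io. It is not POI's code: TrackedStream and HeaderReader are hypothetical names made up for this example, and the in-memory stream merely stands in for one that holds an OS resource. The first method reads and returns without closing, as the constructor above is described as doing; the second closes the stream in a finally block.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a stream backed by an OS resource.
// It only records whether close() was ever called.
class TrackedStream extends FilterInputStream {
    private boolean closed = false;

    TrackedStream(InputStream in) {
        super(in);
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

// Hypothetical reader, not part of POI.
class HeaderReader {
    // Leaky pattern: reads from the stream and returns without closing it.
    static int readFirstByteLeaky(InputStream in) throws IOException {
        return in.read();
    }

    // Fixed pattern: the finally block closes the stream on every exit path.
    static int readFirstByteClosed(InputStream in) throws IOException {
        try {
            return in.read();
        } finally {
            in.close();
        }
    }
}
```

After readFirstByteLeaky the stream's isClosed() is still false, while after readFirstByteClosed it is true.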

Environment details:

JDK (javac) ->
javac 1.6.0_31

JRE ->
java version "1.6.0_31"
OpenJDK Runtime Environment (IcedTea6 1.13.3) (6b31-1.13.3-1ubuntu1~0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)


Steps to reproduce:

1) Run lsof -p <jvm process id> before reading the xls file.

2) Obtain a workbook with the code below:

Workbook workbook = WorkbookFactory.create(file);

3) Finish reading the xls file: perform a few operations on the workbook, then leave it to be garbage collected.

4) Call System.gc() to try to collect the input stream. This can be monitored using JProfiler.

5) Run lsof -p <jvm process id> again. The file is still listed in the lsof output :(
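
The lsof checks can also be approximated from inside the JVM. The sketch below is an assumption-laden helper, not part of POI: FdCheck and openFdCount are hypothetical names, and it relies on the Linux-specific /proc/self/fd directory (it returns -1 where that directory is not available). Listing the directory briefly uses a descriptor itself, so only compare counts taken the same way.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper for watching the JVM's open file descriptors on Linux.
class FdCheck {
    // Counts the entries in /proc/self/fd; returns -1 if the directory cannot be listed.
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("fdcheck", ".bin");
        tmp.deleteOnExit();

        int before = openFdCount();
        FileInputStream in = new FileInputStream(tmp);
        int whileOpen = openFdCount();
        in.close();
        int afterClose = openFdCount();

        // Print the three counts; a count that stays elevated after the
        // stream is no longer used points at a descriptor that was never closed.
        System.out.println("before=" + before
                + " whileOpen=" + whileOpen
                + " afterClose=" + afterClose);
    }
}
```

To check the behavior reported here, the same counts could be taken around the WorkbookFactory.create(file) call from step 2 instead of around the FileInputStream.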



Possible fixes:

Emulate the behavior of "java.io.FileInputStream" by overriding the finalize() method in "org.apache.poi.poifs.filesystem.DocumentInputStream".
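
A rough sketch of what such a finalize() safety net could look like is below. SelfClosingStream is a hypothetical wrapper invented for this example; it is not DocumentInputStream and says nothing about how POI is actually structured. Note that finalize() is deprecated in current Java releases and the garbage collector gives no guarantee about when (or whether) it runs, so this is only a backstop; closing the stream explicitly remains the reliable fix.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper that closes its underlying stream when finalized,
// in case the caller forgot to close it explicitly.
class SelfClosingStream extends FilterInputStream {
    private boolean closed = false;

    SelfClosingStream(InputStream in) {
        super(in);
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Idempotent close: the second and later calls do nothing.
    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        super.close();
    }

    // Safety net only: runs if and when the garbage collector finalizes the object.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }
}
```

Making close() idempotent matters here because the finalizer may run after the caller has already closed the stream.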
Comment 1 Nick Burch 2014-06-15 15:26:40 UTC

*** This bug has been marked as a duplicate of bug 56537 ***