Created attachment 29278 [details] patch to fix leaks of memory and temporary files The current SXSSF implementation leaks memory by calling File.deleteOnExit(). It is recommended that the deleteOnExit method should never be called in a persistent application; see discussion here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513817 Additionally, temporary files are not cleaned up until the program exits, as noted in this POI bug: https://issues.apache.org/bugzilla/show_bug.cgi?id=53493 This patch is submitted as an alternative to the fix proposed in the aforementioned bug. I'm reporting a separate bug in order to encompass the additional concern about memory leaks. Here's how the patch works: * all calls to deleteOnExit are eliminated. * the SXSSFWorksheet class walks through all sheets after the output is written, deleting each associated temporary file immediately. * close and delete operations are moved into finally blocks, ensuring cleanup even when an exception causes premature failure. These changes are vitally important for our high-throughput, persistent server that will pound out million row Excel spreadsheets all day long. (We've patched the POI library internally but it would be great to factor these changes into the distribution.) I would volunteer to create unit tests for this improvement but there isn't an existing unit testing framework for the SXSSF component, so far as I can tell. I'm open to contributing on that front in the future, if there's interest. Thanks!
I think this is a duplicate of bug 53493. All that was said there applies here too. This patch suffers from the same shortcomings as the patch proposed in that bug. I vote for a straightforward solution SXSSFWorkbook.dispose() instead of an implicit destroy-on-write which is not in accordance with the contract for the function SXSSFWorkbook.write(). The File.deleteOnExit() should be removed if it indeed leaks in the scenario it is used in SXSSF (File.deleteOnExit() is needed only if the garbage collector couldn't collect the sheets by the time the program exits. Are you perhaps keeping references to your workbooks or sheets that prevent garbage collection and subsequent temp file deletion?).
Thank you for the quick response. I reported this as a new issue because it encompasses an additional problem, the memory leak via deleteOnExit. I also included the part about temporary file cleanup (bug 53493) because the problems are entangled; it's convenient to address both symptoms at the same time. The generally accepted wisdom about finalize methods is not to use them, ever. It isn't a reliable mechanism for doing anything; one should never assume that finalizers will run, and the object graph tends to be seen in an inconsistent state within these methods, often causing weird and unpredictable problems. The patch for 53493 that calls finalize() explicitly is a direct violation of the finalize contract; it must only run once, if ever. See Effective Java or this stackoveflow thread: http://stackoverflow.com/questions/158174/why-would-you-ever-implement-finalize Solving the problem with a new dispose method seems reasonable; I'll submit a revised patch today.
Another note: file handles will be leaked if there's an exception during the workbook write process. I'm fixing that problem in the patch as well.
Created attachment 29285 [details] patch to fix bug Revised as specified by Alex Geller. Changes since last patch: * Renamed cleanup methods as dispose * Moved temporary file cleanup to a separate public dispose method * Added a unit test
Thanks for the patch, I put it in my TODO list to review and apply. Yegor (In reply to comment #4) > Created attachment 29285 [details] > patch to fix bug > > Revised as specified by Alex Geller. Changes since last patch: > * Renamed cleanup methods as dispose > * Moved temporary file cleanup to a separate public dispose method > * Added a unit test
Patch applied in r1384784 with some tweaks. SheetDataWriter.dispose must close the corresponding Writer object before deleting the temp file. Without this change your unit test fails on Windows. I also tweaked SXSSF unit tests to cleanup temp files: every new instance of SXSSFWorkbook in the tests must be accompanied by workbook.dispose(). We no longer use the trick with File.deleteOnExit and must free the resources explicitly. Regards, Yegor