Bug 57200 - SXSSF saving fails sometimes as TempFile creation fails
Summary: SXSSF saving fails sometimes as TempFile creation fails
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 3.13-dev
Hardware: PC All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 57947 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-11-11 17:06 UTC by Philip Welch
Modified: 2016-11-01 20:13 UTC (History)
1 user (show)



Attachments
Removes usage of auto-deleted temp sub-directory (2.82 KB, patch)
2015-03-01 13:38 UTC, Yaniv Kunda
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Philip Welch 2014-11-11 17:06:34 UTC
SXSSF creates a temporary file when its saved using the class org.apache.poi.util.TempFile. When a tempfile is first created, the directory name is saved within this class, in the static singleton DefaultTempFileCreationStrategy object. This directory is only created on the first save and its never checked if this directory still exists, as you can see below:

        public File createTempFile(String prefix, String suffix) throws IOException {
            // Identify and create our temp dir, if needed
            if (dir == null)
            {
                dir = new File(System.getProperty("java.io.tmpdir"), "poifiles");
                dir.mkdir();
                if (System.getProperty("poi.keep.tmp.files") == null)
                    dir.deleteOnExit();
            }

The directory is automatically deleted when the virtual machine exits. (I believe the directory is also created automatically when you load an xlsx using the event api).

So, if you have two applications both loading and saving excels and you then close one this directory is automatically deleted. Now both applications used exactly the same temp directory. Therefore if you try to save your file in the second application which is still open, an IOException is instead raised as it tries to create the tmp file in the no-longer existing directory. The user cannot save their xlsx any longer. So basically you cannot have two applications open on a computer at the same time which load/save using POI (or at least you can, but you have to close them both at the same time as one left open will no longer function).

I hope my explanation is clear. Basically you need to add a check to this function that the directory still exists and mkdir if not...
Comment 1 Nick Burch 2014-11-11 17:27:06 UTC
the temp file strategy is pluggable, could you not provide your own to handle the case of multiple JVMs starting/stopping and tidying up after each other?
Comment 2 Philip Welch 2014-11-11 17:36:25 UTC
Thanks - I'll take a look at how you plug different temp file strategies in (do you have a handy link?) as a fix for my own application. 

However the default behaviour of POI will break whenever two applications are open using it so presumably this needs to be fixed within POI itself going forward. Particularly as this could even be two completely different applications which you wouldn't ever expect to interact.

The default behaviour should be a unique tmp directory per virtual machine using POI to ensure no unexpected interactions occur between POIs in different virtual machines.
Comment 3 Yaniv Kunda 2015-03-01 13:38:40 UTC
Created attachment 32537 [details]
Removes usage of auto-deleted temp sub-directory

This removes the behavior of using a specific directory that gets deleted on VM exit - resolving the problem described in the issue of running more than one VM using POI on the same machine.
Using the default temp dir is much simpler and less error-prone.

This patch does NOT solve the memory leak caused by deleteOnExit(), which should be addressed in a wider scope, as it transfers responsibility to the classes using TempFile and further on to POI users.
Comment 4 htung 2015-08-11 06:17:20 UTC
Porting workaround from #57947 here for reference:

mkdir /tmp/poifiles
touch /tmp/poifiles/.dontdeleteonexitonnonuniquedirectories

Worked for me until the fix.
Comment 5 Oliver.Moeller 2015-10-08 11:42:40 UTC
Issue:
We've got two long-running parallel processes creating several Excel files every few minutes via SXSSF.
After one of the processes terminates the other one cannot write Excel files anymore due to the deletion of folder "C:\Windows\Temp\poifiles\".

Work-around:
As a work-around we call the processes with different "java.io.tmpdir" system property values to use separate temporary folders.
We do not use "poi.keep.tmp.files=true" as we want the temporary files to be deleted.

However, this is no long-term solution for us as our program is normally installed by the customer who can run as many parallel processes as he wants. We can't configure different temp folders for all of these.

Possible solutions:
a. Use temp folder "java.io.tmpdir" instead of its sub-folder "poifiles".
b. Use a unique temp folder name for each process instead of "poifiles".
c. Never delete sub-folder "poifiles".
d. Delete folder "poifiles" only if the user explicitely sets a corresponding system property, e.g. "poi.delete.tmp.dirs" (setting this property must also delete the contained temporary files even if "poi.keep.tmp.files" is specified).
Comment 6 Dominik Stadler 2015-10-08 13:30:29 UTC
Another workaround that we found is to put a dummy-file into the temporary poifiles-directory, then the deletion of the directory will not happen and multiple processes can work with the same tempdir.
Comment 7 Dominik Stadler 2016-03-12 15:38:38 UTC
*** Bug 57947 has been marked as a duplicate of this bug. ***
Comment 8 Dominik Stadler 2016-03-12 17:04:49 UTC
As of r1734719 we do not delete the main temp-directory any more to avoid problems when multiple applications are using tempfiles in POI.

This problem became more pressing lately because now not only SXSSF uses TempFile, but also core ZipPackage, see bug 57947 for a related bug report.
Comment 9 Yaniv Kunda 2016-05-05 13:58:58 UTC
This does not solve the problem of a long-running process (e.g. server) that is trying to create the poifiles temp directory only once during its lifetime.

In the event the directory gets deleted by an external process (temp file cleaner) while the server process is running, the next temp file creation will fail.

Possible solutions:
1) Lock the poifiles directory after it is created (this can probably be done easily by keeping an open file in the directory for the lifetime of the JVM)
2) Ensure that poifiles is created each time before creating a temp file
3) Stop using the poifiles dir - segregating temp files serves no purpose other than perhaps ease of debugging.

I'll be happy to provide a patch - just reply if one of the solutions I suggested is acceptable.
Comment 10 Audun Røe 2016-08-30 10:41:44 UTC
Just ran into this with our java server app on Linux/RHEL, featuring Excel report generation that's used once in a blue moon.. External to the Java process, cron.daily runs tmpwatch to clean up /tmp, taking /tmp/poifiles with it. 

For a workaround, we'll probably use a pluggable strategy as suggested in the comments. I've tested on a direct fork of DefaultTempFileCreationStrategy from POI source, with the only modification being the addition of || !dir.exists() in createTempFile. This seems to work OK so far, and is essentially Yaniv's suggestion #2 I guess. #1 seems like a less safe alternative to me, with all file systems not necessarily behaving the same. And deletions then making /tmp/poifiles essentially invisible on Linux systems until the java process terminates and releases its file descriptors.
Comment 11 Javen O'Neal 2016-09-10 06:14:30 UTC
There are a couple suggested strategies in the TempFileCreationStrategy javadocs
https://poi.apache.org/apidocs/org/apache/poi/util/TempFileCreationStrategy.html

Feel free to subclass TempFileCreationStrategy to something that implements the desired behavior -- store poifiles unsegregated from other system temporary files which are cleaned up by a cron job (I like that solution for a long-running server use case), or have Java do the cleanup either with a timer, a fixed-length container, or something else.

Then it's as simple as TempFile.setTempFileCreationStrategy(YourTempFileCreationStrategy).

If you need a more flexible solution, you need not restrict yourself to TempFile.

We use TempFile and DefaultTempFileCreationStrategy for our unit tests, which is the main purpose for them existing--we never bothered to mark them as @Internal or package-private, but it is unlikely that we would internalize them at this point.

I do not see a huge benefit implementing classes that are not specific to handling Microsoft Office files in POI. I would rather see temporary file creation and deletion moved to a general purpose I/O library so that these classes could appeal to a wider audience. Any ideas if such a library exists beyond java.io and java.nio?
Comment 12 Yaniv Kunda 2016-09-25 07:29:10 UTC
It looks like r1746932 (which fixed this issue for failing tests) should solve the root cause by trying to create the temp dir on each temp file creation.
Comment 13 Dominik Stadler 2016-11-01 20:13:33 UTC
Yes, it seems solution 2) "Ensure that poifiles is created each time before creating a temp file" was implemented via r1746932, thus this should be fixed now.

Other ways can be added as custom strategy as described by Javen.