Documentation of Version 5.0.0 (2021-01-20) claims: "new experimental DeferredSXSSFWorkbook which avoids temp files by lazily generating rows (see DeferredGeneration in poi-examples)" However, writing a DeferredSXSSFWorkbook still causes temporary files to be written to disk, as it still uses org.apache.poi.xssf.streaming.SXSSFWorkbook.write(OutputStream) which does: //Save the template File tmplFile = TempFile.createTempFile("poi-sxssf-template", ".xlsx"); Is this a bug or intended?
It was not the intention to 100% avoid temp files but to create fewer temp files. Also note that the code is still regarded as experimental. If you don't want any temp files, then maybe you can submit a patch. I suspect that any change would need to be configurable because the change could lead to more memory consumption - which other users may not want.
We are facing the same issue. In some environments, e.g. for application server scenarios, any usage of the file system might be prohibited. In these cases, POI / SXSSF can't be used. A tempfile-free variant would really be nice and an important improvement to the project. BTW why is the current deferred implementation marked as "experimental"? Are there any lacking features or known bugs? Or is it just not tested well enough to consider it stable? Thanks!
the deferred sxssf code is not widely used and is not well tested - it is not core code and was submitted with minimal test coverage
One alternative is https://github.com/dhatim/fastexcel which may support excel generation without the use of temp files.
Yes, fastexcel completely avoids temporary files. Unfortunately, fastexcel also lacks important features (which POI offers). Is there a roadmap / release target for having this feature production-ready? BTW, what was the reason implementing deferred workbooks?
POI has no roadmap - it is a mature project with a small volunteer base - essentially, if you want a feature, you write it yourself - other developers might take an interest and review your ideas and/or code
I've added r1893896 - try it out if you like - I'm in 2 minds as to whether to leave this code in for the release - I'm not sure it is a good approach and I don't have much time to do much more with it. Up until now, users have been happy enough to live with the fact the SXSSF code relies on using temp files. ZipInputStreamZipEntrySource has a limitation in that it cannot handle zip entries that are more than 4Gb uncompressed - unless you enable a feature ZipInputStreamZipEntrySource#setThresholdBytesForTempFiles (which is off by default) and as you can infer this relies on temp files if enabled.
Thanks, I'll have a look!
Thanks for the added feature. I ran into another problem trying to implement the DeferredSXSSFWorkbook. Currently my workflow looks like this: 1. Create DeferredSXSSFWorkbook. 2. Create Sheet. 3. Create Row. 4. Create Cell. But the file turns out empty. This is because createRow() of SXSSFSheet tries to flush the row to the writer here: (SXSSFSheet.java Line 1895) if (_writer != null) _writer.writeRow(rowIndex, row); _rows.remove(firstRowNum); But the writer for the DeferredSXSSFWorkbook only gets created during the writing process in writeAvoidingTempFiles() so _writer is null and the row gets removed even if it was not written yet. Is there a way to create a DeferredSXSSFWorkbookWorkbook following the steps above (without using a RowGenerator)? Could you please share an example or JUnitTest if there is one?
Can I flip the problem - can you provide a reproducible test case? There are test cases for the existing code - all our source code is publicly accessible so I'll let you look for it yourself. As I said above, the DeferredSXSSFWorkbook was contributed (ie not written by the Apache POI team) and the test coverage is not great but there is some.
Can I ask why you are using DeferredSXSSFWorkbook and not the more widely used SXSSFWorkbook? SXSSFWorkbook works in a way more similar to how you seem to want the code to work - ie create a workbook, then a sheet, then a row and then a cell.
Created attachment 38075 [details] JUnitTest for DeferredSXSSFWorkbook usage I can not use the SXSSFWSheets, since they create a SheetDataWriter in their constructor and the SheetDataWriter always produces a temp file (see SheetDataWriter.java line 72), which is a showstopper in my case. The DeferredSXSSFWorkbook on the other hand uses a StreamingSheetWriter, which does not produce temp files. So i would either need a SXSSFSheet that uses a StreamingSheetWriter or a DeferredSXSSFWorkbook, which lets me flush rows before the writing process. I have searched through your examples and tests but all of them use the RowGenerator, which i cannot use. I have added a JUnitTest to illustrate my problem.
DeferredSXSSFWorkbook is for row generators and row generators alone. Apache POI is a volunteer project and if you can find a volunteer to write you a new version of SXSSFWorkbook/DeferredSXSSFWorkbook to suit your needs, then best of luck - POI team will try to help them if they run into issues.
https://github.com/dhatim/fastexcel is still work trying - it may be able to generate a xlsx without temp files.
@PJ Fanning: Thanks for your support! As mentioned above, fastexcel unfortunately lacks a couple of POI features which we need (e.g. https://github.com/rap2hpoutre/fast-excel/issues/96). Therefore we would love to have a 100% tempfile-free workbook creation within POI with all (or most) features available and without the need for changing POI APIs (e.g. to the "row generator" API). Can you give hints for a proper implementation of such a feature and do you see any showstoppers when considering the POI architecture? Thanks!
temp files appear all over POI code - I have no interest in rewriting POI for your use case - you will need to find someone if you are not willing to get involved in writing code yourselves
Sorry for the misunderstanding, I don't ask you to implement this, I am asking you for any hints for considering an implementation by ourselves. I don't know anything about POIs internal architecture (yet), for which things tempfiles are currently used and whether they can be replaced by in-memory data structures (for size-limited data) or by just streaming the data out (for the actual, possibly unbound data to prevent any OOM). Maybe it is impossible without refactoring major parts of POI at all - then this would be a nice warning for us :)