The SXSSF temp xml file is grows to a very large size. e.g. for a 20 MB csv data the size of the temp xml file become few GB large. In order to reduce this I have created a path and tested it. Request for the review and application of the same to the. I have changed the extension of the temp file from xml to gz, which is another cleanliness factor >From 3e95b1880a04657ec93e5626720b1d40b5487ee4 Mon Sep 17 00:00:00 2001 From: Santosh <santosh.kharolkar@tcs.com> Date: Tue, 4 Oct 2011 12:11:23 +0530 Subject: [PATCH] Compress the temporary xml file --- .../org/apache/poi/xssf/streaming/SXSSFSheet.java | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java b/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java index 8c1b998..567c9e8 100644 --- a/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java +++ b/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java @@ -22,6 +22,8 @@ import java.util.Iterator; import java.util.TreeMap; import java.util.Map; +import java.util.zip.GZIPOutputStream; +import java.util.zip.GZIPInputStream; import org.apache.poi.ss.usermodel.*; import org.apache.poi.ss.util.CellReference; @@ -1284,9 +1286,9 @@ public class SXSSFSheet implements Sheet, Cloneable public SheetDataWriter() throws IOException { - _fd = File.createTempFile("poi-sxxsf-sheet", ".xml"); + _fd = File.createTempFile("poi-sxxsf-sheet-xml", ".gz"); _fd.deleteOnExit(); - _out = new BufferedWriter(new FileWriter(_fd)); + _out = new OutputStreamWriter(new GZIPOutputStream (new FileOutputStream(_fd))); } public int getNumberOfFlushedRows() { @@ -1308,7 +1310,8 @@ public class SXSSFSheet implements Sheet, Cloneable { _out.flush(); _out.close(); - return new FileInputStream(_fd); + + return new GZIPInputStream ( new FileInputStream(_fd)); } /** -- 1.7.6.msysgit.0
Update the file name >From 3e95b1880a04657ec93e5626720b1d40b5487ee4 Mon Sep 17 00:00:00 2001 From: Santosh <santosh.kharolkar@tcs.com> Date: Tue, 4 Oct 2011 12:11:23 +0530 Subject: [PATCH] Compress the temporary xml file --- .../org/apache/poi/xssf/streaming/SXSSFSheet.java | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java b/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java index 8c1b998..567c9e8 100644 --- a/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java +++ b/src/ooxml/java/org/apache/poi/xssf/streaming/SXSSFSheet.java @@ -22,6 +22,8 @@ import java.util.Iterator; import java.util.TreeMap; import java.util.Map; +import java.util.zip.GZIPOutputStream; +import java.util.zip.GZIPInputStream; import org.apache.poi.ss.usermodel.*; import org.apache.poi.ss.util.CellReference; @@ -1284,9 +1286,9 @@ public class SXSSFSheet implements Sheet, Cloneable public SheetDataWriter() throws IOException { - _fd = File.createTempFile("poi-sxxsf-sheet", ".xml"); + _fd = File.createTempFile("poi-sxssf-sheet-xml", ".gz"); _fd.deleteOnExit(); - _out = new BufferedWriter(new FileWriter(_fd)); + _out = new OutputStreamWriter(new GZIPOutputStream (new FileOutputStream(_fd))); } public int getNumberOfFlushedRows() { @@ -1308,7 +1310,8 @@ public class SXSSFSheet implements Sheet, Cloneable { _out.flush(); _out.close(); - return new FileInputStream(_fd); + + return new GZIPInputStream ( new FileInputStream(_fd)); } /** -- 1.7.6.msysgit.0
Request you to pls add this patch for next beta5 release. It saves me to not keep patching the releases.
It was discussed on the mailing list that compression of the temp files should be optional. I'd rather not apply your patch in its current form. Yegor
If you drop a hint on how to add the option for it, I will rework the patch. Should it be a flag in the sxssfworkbook/sheet class? What is your preferred way to set options?
IMO the best would be to define a method to pass a custom instance of SheetDataWriter : SXSSFWorkbook#setSheetDataWriter (SheetDataWriter writer) In your case you would pass a GZSheetDataWriter that supports compression of the temp files. Does it make sense to you? Yegor (In reply to comment #4) > If you drop a hint on how to add the option for it, I will rework the patch. > Should it be a flag in the sxssfworkbook/sheet class? What is your preferred > way to set options?
Created attachment 28031 [details] First cut implementation for data store writer class
I have the first cut implementation ready. But setting the writer in the workbook class is difficult. Since we need multiple instances of this writer in constructor of workbook. (one per sheet) Can I suggest the polocy to be set in the Sheet class as static or some other way. I am attaching the first cut implementation to this
Committed in r1212330 Santosh, When I suggested passing a subclass of SheetDataWriter I didn't realize the complications. Since each sheet uses its own instance of SheetDataWriter you pass the class name instead of an instance and this is not good in my opinion. I re-worked the patch and used your original idea: pass a flag to SXSSFWorkbook. In current implementation the gzip compression is turned on as follows: SXSSFWorkbook wb = new SXSSFWorkbook(); wb.setCompressTempFiles(true); // temp files will be gzipped I think it is much more user-friendly than passing a class name in the constructor. Sorry that my previous suggestion mislead you. In any case, thank you for the patch. You've written 90% of it and I just put it in a slightly different shape. Regards, Yegor
Thank you., Though we will need another way to pass this flag as well in next patch I will work that out. In the use case where the wb is created from pre-existing wb. We can track that separately. Another easy solution would be to make the flag a class level static flag. Which can solve all the cases required.
Created attachment 28066 [details] Add another constructor to be able to set the compression flag during loading of existing workbook Add another constructor to be able to set the compression flag during loading of existing workbook
A small patch attached to complete the action checked in by yegor. Add another constructor to be able to set the compression flag during loading of existing workbook
Applied in r1215080 Yegor (In reply to comment #11) > A small patch attached to complete the action checked in by yegor. > > Add another constructor to be able to set the compression flag during loading > of existing workbook