Summary: | XSSFWorkbook and SXSSFWorkbook produce different output | | |
---|---|---|---|
Product: | POI | Reporter: | Tim Jones <timothy.l.jones> |
Component: | SXSSF | Assignee: | POI Developers List <dev> |
Status: | NEW --- | | |
Severity: | enhancement | | |
Priority: | P2 | | |
Version: | 3.15-FINAL | | |
Target Milestone: | --- | | |
Hardware: | Macintosh | | |
OS: | All | | |
Description
Tim Jones
2017-04-24 07:45:29 UTC
I tried the following to enable shared strings:

    SXSSFWorkbook wb = new SXSSFWorkbook(new XSSFWorkbook(), 1000, true, true);
    Sheet sh = wb.createSheet("Sheet");
    Row r1 = sh.createRow(1);
    r1.createCell(1).setCellValue("One");
    r1.createCell(2).setCellValue("Two");
    Row r2 = sh.createRow(2);
    r2.createCell(1).setCellValue("One");
    r2.createCell(2).setCellValue("Two");
    wb.write(output);
    wb.dispose();
    wb.close();

It resulted in a file of 3321 bytes (a new size).

There's definitely more going on here. POI inlines strings for SXSSF so that it doesn't have to maintain a shared strings table. This makes the output file larger. I'm not sure how SXSSF handles cell styles, but I wouldn't be surprised if it also inlined those to eliminate the need to maintain a style table in memory. There have been a couple of discussions about adding an optional shared strings table for SXSSF (this would allow RTF strings).

We could probably strip newline characters from the XML output, but this would be a trivial saving in file size. After zip compression, it would be negligible. File size could be improved more easily by adjusting the zip file compression settings. The trade-off there would be compression and decompression time.

What file sizes are you measuring? The compressed zip or the raw XML?

One thing that would differ between the file contents is the last modified date, which is saved in the XML. We may also save rIds in any order, so long as the reference numbers are used correctly. If we store these in an unsorted HashMap before serializing, we could make no guarantee of producing binary-identical files, though the information would be the same.

I'm measuring the resulting compressed document, yes. My main surprise was that the output was different (starting with the whitespace, and finishing with things like inline styles and string tables).
Since this produces quite different file sizes for larger files, it could be added to the documentation, even if identical binary output is not practical. Users of the XSSF and SXSSF classes would want to be aware that there's an additional tradeoff. The ability to tune the level of compression would be a nice-to-have.
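
To make the compression-level trade-off discussed above a bit more concrete, here is a minimal sketch that uses only java.util.zip, not POI. It zips a repetitive XML payload at several Deflater levels and reports the archive size. The class name ZipLevelDemo, the helper names, and the XML shape are assumptions made for illustration; this is not what POI writes, and it does not show how POI configures its own zip output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipLevelDemo {

    // Builds a repetitive XML payload that loosely resembles sheet data with
    // inline strings. Element names are illustrative, not POI's exact output.
    static byte[] samplePayload(int rows) {
        StringBuilder sb = new StringBuilder("<sheetData>\n");
        for (int r = 1; r <= rows; r++) {
            sb.append("<row r=\"").append(r).append("\">")
              .append("<c t=\"inlineStr\"><is><t>One</t></is></c>")
              .append("<c t=\"inlineStr\"><is><t>Two</t></is></c>")
              .append("</row>\n");
        }
        sb.append("</sheetData>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Zips the payload as a single entry at the given Deflater level (0-9)
    // and returns the size of the resulting archive in bytes.
    static int zippedSize(byte[] payload, int level) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.setLevel(level);
            zip.putNextEntry(new ZipEntry("sheet1.xml"));
            zip.write(payload);
            zip.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        byte[] payload = samplePayload(10_000);
        System.out.println("raw bytes: " + payload.length);
        int[] levels = {
            Deflater.NO_COMPRESSION, Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION
        };
        for (int level : levels) {
            System.out.println("level " + level + ": " + zippedSize(payload, level) + " bytes");
        }
    }
}
```

Wrapping each zippedSize call with System.nanoTime() would show the other side of the trade-off, the time spent compressing. The exact numbers depend on the payload and the JVM.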