Bug 58915 - memory eaters
Summary: memory eaters
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF
Version: 3.13-FINAL
Hardware: PC All
Importance: P2 trivial with 1 vote
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-01-23 12:31 UTC by givemejob
Modified: 2016-03-29 18:36 UTC

Attachments
memory eaters (211.34 KB, image/png)
2016-01-23 12:31 UTC, givemejob
10min monitoring (207.97 KB, image/png)
2016-01-23 12:33 UTC, givemejob
1hour monitoring (178.00 KB, image/png)
2016-01-23 12:34 UTC, givemejob

Description givemejob 2016-01-23 12:31:55 UTC
Created attachment 33480
memory eaters

We ran a stress test:
 - Tomcat 7 with -Xmx100m
 - Java 7 JVM
 - on each request we start 10 threads, and each thread writes a large xlsx file to disk using SXSSF with a sliding window of 10 rows

Libraries we use:
 - jxls 2.8
 - jxls-poi 1.0.7
 - Apache POI 3.13 (poi, poi-ooxml, poi-ooxml-schemas)
 - xmlbeans 2.6.0

We assumed that the sliding window would reduce memory consumption to roughly the amount of memory needed for the rows in the window.

For large xlsx files, however, we can see that Xobj$ElementXobj, Xobj$AttrXobj, STRefImpl and CTMergeCellImpl objects eat the memory.
Comment 1 givemejob 2016-01-23 12:33:38 UTC
Created attachment 33481
10min monitoring
Comment 2 givemejob 2016-01-23 12:34:15 UTC
Created attachment 33482
1hour monitoring
Comment 3 givemejob 2016-01-25 10:54:33 UTC
This looks like a bug in jxls 2.2.8.
The jxls version given in the description is wrong: it should be 2.2.8, not 2.8.
Comment 4 givemejob 2016-01-27 12:29:19 UTC
POI SXSSF uses SXSSFSheet to represent an Excel sheet.

SXSSFSheet is a wrapper around XSSFSheet with an 'overridden' createRow method (to support flushing rows).

The other methods of SXSSFSheet delegate their calls to XSSFSheet.

Therefore all information about the workbook is kept in memory, except the rows (cell values etc.).
For example, every call to SXSSFSheet.addMergedRegion adds a CellRangeAddress object that stays in memory, so memory use grows when the number of merged regions is not constant but increases with the data. On big data this produces a chart like "1hour monitoring".
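
To make the point concrete, here is a small toy model in plain Java. It is not POI code: the class WindowedSheetModel and all of its methods are hypothetical names made up for this illustration, and it only assumes the behaviour described above (rows outside the window are dropped, merged regions are never dropped).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a windowed sheet. Hypothetical names, NOT the POI implementation. */
class WindowedSheetModel {
    private final int windowSize;
    // Rows are held only while they are inside the sliding window.
    private final Deque<String> rowWindow = new ArrayDeque<>();
    // Sheet-level metadata is never flushed, so this list grows without bound.
    private final List<int[]> mergedRegions = new ArrayList<>();
    private long flushedRows = 0;

    WindowedSheetModel(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Adds a row and evicts the oldest rows once the window is full. */
    void createRow(String content) {
        rowWindow.addLast(content);
        while (rowWindow.size() > windowSize) {
            rowWindow.removeFirst(); // stands in for "flush row to disk"
            flushedRows++;
        }
    }

    /** Records a merged region; nothing ever removes it again. */
    void addMergedRegion(int firstRow, int lastRow, int firstCol, int lastCol) {
        mergedRegions.add(new int[] {firstRow, lastRow, firstCol, lastCol});
    }

    int rowsInMemory() { return rowWindow.size(); }
    int mergedRegionsInMemory() { return mergedRegions.size(); }
    long flushedRows() { return flushedRows; }

    public static void main(String[] args) {
        WindowedSheetModel sheet = new WindowedSheetModel(10);
        for (int r = 0; r < 100_000; r++) {
            sheet.createRow("row " + r);
            sheet.addMergedRegion(r, r, 0, 1); // one merged region per row
        }
        System.out.println("rows in memory: " + sheet.rowsInMemory());
        System.out.println("merged regions in memory: " + sheet.mergedRegionsInMemory());
    }
}
```

In this model the number of rows held never exceeds the window size, while the number of merged regions held equals the number of calls made, which matches the steadily rising shape of the monitoring chart.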

A call to SXSSFWorkbook.write marshals the Java objects (except rows) to byte arrays (see MemoryPackagePartOutputStream), which produces a jump in memory consumption at the end of the SXSSFWorkbook.write call.
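
A rough sketch of why such buffering costs memory, again in plain Java and not taken from POI: BufferedPartModel and bufferedSize are hypothetical names, and the mergeCell element written here is only an assumed stand-in for whatever the real marshalling emits. The only point is that a part serialized into a ByteArrayOutputStream sits on the heap in full before it can be copied anywhere else.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy model of buffering one package part in memory. Hypothetical names, NOT POI code. */
class BufferedPartModel {

    /**
     * Marshals regionCount merge-cell elements into a heap buffer and
     * returns how many bytes the buffer holds once marshalling is done.
     */
    static int bufferedSize(int regionCount) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (int i = 1; i <= regionCount; i++) {
            // Assumed element shape, used only to give each region a realistic size.
            byte[] xml = ("<mergeCell ref=\"A" + i + ":B" + i + "\"/>")
                    .getBytes(StandardCharsets.UTF_8);
            buffer.write(xml, 0, xml.length);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(n + " regions -> " + bufferedSize(n) + " bytes held on the heap");
        }
    }
}
```

The buffered size grows with the number of retained objects, so the more sheet-level data has accumulated before write is called, the larger the jump at the end.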

If this reasoning is correct, then in my opinion it should be pointed out in the "Quick Guide".
I could not find this information anywhere until I investigated it myself.
Comment 5 Dominik Stadler 2016-03-29 17:31:33 UTC
Adjusted javadoc in r1737024 and the page at https://poi.apache.org/spreadsheet/how-to.html#sxssf via r1737025.
Comment 6 Dominik Stadler 2016-03-29 18:36:56 UTC
Correct revision for the webpage is r1737028.