Bug 65042 - Adding pictures to workbook causes memory leak
Summary: Adding pictures to workbook causes memory leak
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF
Version: 4.1.2-FINAL
Hardware: Macintosh Mac OS X 10.4
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
Reported: 2020-12-30 07:13 UTC by Runzhi
Modified: 2021-10-17 11:25 UTC

Attachments
Memory Usage Dump (239.99 KB, image/png)
2020-12-30 07:13 UTC, Runzhi
Memory I count in program (37.38 KB, image/png)
2020-12-30 07:15 UTC, Runzhi

Description Runzhi 2020-12-30 07:13:21 UTC
Created attachment 37670
Memory Usage Dump

SXSSF uses XSSF to handle its pictures, so picture data is kept in memory and is not flushed along with the rows. This means that if a lot of pictures have to be written to an xlsx file, this causes a memory leak, which is what I am struggling with. I can't find a way to flush picture data before workbook.write().

The test case: a 300 KB picture, 10 pictures written per row, JVM started with -Xmx128m.
There will be an OOM when writing 50 rows. At that time, the picture data list holds 60 MB in memory.
Comment 1 Runzhi 2020-12-30 07:15:35 UTC
Created attachment 37671
Memory I count in program
Comment 2 Dominik Stadler 2020-12-30 17:14:03 UTC
Can you provide a small piece of code which reproduces the problem for you? Ideally it would be a self-contained unit test, so we can reproduce the problem and take a closer look.
Comment 3 Runzhi 2021-01-05 07:54:07 UTC
Sorry for the late response.
I have created a repository that reproduces the problem.
Run the unit test with a max heap size of 128m.

Repository URL:
https://github.com/foresx/poi-memory-leak-demo
Comment 4 Dominik Stadler 2021-01-06 15:38:54 UTC
Thanks for the detailed reproducing code. I took a look at your sample project now.

Pictures in .xlsx files are not stored per "row" or "sheet", but rather globally in a separate structure alongside the other parts.

The current SXSSFWorkbook only flushes and removes rows based on the "rowAccessWindowSize".

So flushing picture data for SXSSFWorkbook is currently not supported. We can consider adding it as an enhancement. Naturally it will happen sooner if you can propose an implementation that offers this as an additional option for SXSSFWorkbook in some way. However, it will require some coding, as you likely need to flush out pictures in a similar way as the rows and then, at write time, combine the information into the final document as well.
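
To illustrate the behaviour described in this comment, here is a toy model in plain Java. It is not POI code: StreamingModel and its methods are hypothetical names made up for this sketch, and the picture size is shrunk to 1 KB so it runs quickly. It only shows why a row window bounds the memory used by rows while picture data keeps accumulating until write time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (hypothetical, not POI): rows are windowed, pictures are held globally. */
class StreamingModel {
    private final int rowAccessWindowSize;
    private final Deque<String> rowsInMemory = new ArrayDeque<>();
    private final List<byte[]> pictures = new ArrayList<>();
    private int flushedRows = 0;

    StreamingModel(int rowAccessWindowSize) {
        this.rowAccessWindowSize = rowAccessWindowSize;
    }

    /** Adds a row; the oldest row is dropped ("flushed") once the window is exceeded. */
    void createRow(String content) {
        rowsInMemory.addLast(content);
        if (rowsInMemory.size() > rowAccessWindowSize) {
            rowsInMemory.removeFirst();
            flushedRows++;
        }
    }

    /** Adds picture bytes; nothing in this model evicts them before the final write. */
    int addPicture(byte[] data) {
        pictures.add(data);
        return pictures.size() - 1;
    }

    int rowsHeld() { return rowsInMemory.size(); }

    int rowsFlushed() { return flushedRows; }

    long pictureBytesHeld() {
        long total = 0;
        for (byte[] p : pictures) {
            total += p.length;
        }
        return total;
    }
}

class StreamingModelDemo {
    public static void main(String[] args) {
        StreamingModel model = new StreamingModel(5);
        for (int row = 0; row < 20; row++) {
            model.createRow("row " + row);
            for (int i = 0; i < 10; i++) {
                model.addPicture(new byte[1024]); // small stand-in for a 300 KB image
            }
        }
        System.out.println("rows held: " + model.rowsHeld());
        System.out.println("rows flushed: " + model.rowsFlushed());
        System.out.println("picture bytes held: " + model.pictureBytesHeld());
    }
}
```

In this model the number of rows held never exceeds the window size, while the picture byte count grows linearly with the number of rows written.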
Comment 5 Runzhi 2021-01-07 02:24:16 UTC
Thanks a lot. When I have free time, I will give it a try.
Comment 6 PJ Fanning 2021-10-13 14:22:37 UTC
I think the issue is that we don't have a TempFilePackagePart that can optionally be used instead of MemoryPackagePart. https://poi.apache.org/apidocs/dev/org/apache/poi/openxml4j/opc/ZipPackage.html#createPartImpl-org.apache.poi.openxml4j.opc.PackagePartName-java.lang.String-boolean-

This would allow us to avoid using memory (while slowing things down by using temp files).

The other issue is how to configure the code so that it can choose whether to use MemoryPackagePart or TempFilePackagePart.
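
A minimal sketch of what such a choice could look like, in plain Java rather than POI code. PartStore, MemoryPartStore, TempFilePartStore and the PartStores.setUseTempFiles switch are all hypothetical names for illustration; a real TempFilePackagePart may be structured quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical storage abstraction for one package part; not a POI interface. */
interface PartStore extends AutoCloseable {
    OutputStream openForWrite() throws IOException;
    InputStream openForRead() throws IOException;
    @Override
    void close() throws IOException;
}

/** Keeps the part's bytes on the heap: fast, but costs memory. */
class MemoryPartStore implements PartStore {
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public OutputStream openForWrite() {
        buffer = new ByteArrayOutputStream(); // each write replaces the old content
        return buffer;
    }

    public InputStream openForRead() {
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public void close() {
        buffer = new ByteArrayOutputStream(); // release the held bytes
    }
}

/** Keeps the part's bytes in a temp file: slower, but frees the heap. */
class TempFilePartStore implements PartStore {
    private final Path file;

    TempFilePartStore() throws IOException {
        file = Files.createTempFile("part-", ".tmp");
    }

    public OutputStream openForWrite() throws IOException {
        return Files.newOutputStream(file); // default options truncate existing content
    }

    public InputStream openForRead() throws IOException {
        return Files.newInputStream(file);
    }

    public void close() throws IOException {
        Files.deleteIfExists(file);
    }

    Path path() { return file; }
}

/** Hypothetical global switch that selects the implementation. */
final class PartStores {
    private static volatile boolean useTempFiles = false;

    private PartStores() {}

    static void setUseTempFiles(boolean value) { useTempFiles = value; }

    static PartStore create() throws IOException {
        return useTempFiles ? new TempFilePartStore() : new MemoryPartStore();
    }
}
```

With the switch on, each part's bytes live in a temp file instead of on the heap, at the cost of disk I/O; close() deletes the file. A static switch is only one possible way to configure this; a per-package or per-workbook option would be another.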
Comment 7 PJ Fanning 2021-10-13 18:32:46 UTC
I added r1894203. It is still experimental/beta, needs testing, and may be removed or modified.
Comment 8 PJ Fanning 2021-10-17 11:25:22 UTC
The option to have ZipPackage use temp files to save memory will be a beta feature in POI 5.1.0.