Bug 66500 - CodepointsUtil consumes a lot of memory when text is large
Summary: CodepointsUtil consumes a lot of memory when text is large
Status: NEEDINFO
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: PC Mac OS X 10.1
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-01 09:43 UTC by shpodg
Modified: 2023-03-11 08:04 UTC
CC List: 0 users



Attachments
outputEscapedString consumes a lot of memory (122.64 KB, image/jpeg)
2023-03-01 09:43 UTC, shpodg

Description shpodg 2023-03-01 09:43:33 UTC
Created attachment 38514
outputEscapedString consumes a lot of memory

When an Excel cell with a large text value is being written, every code point in the text is visited and converted into a new String. This logic can consume a lot of memory.
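To make the pattern concrete, here is a minimal sketch. Assumptions: `EscapeSketch` and its methods are hypothetical and are not POI's actual writer code, and only `<`, `>` and `&` are escaped. `perCodePointStrings` mirrors the one-String-per-code-point approach described above, while `appendEscaped` walks the same code points as `int` values using `String.codePointAt` and `Character.charCount`, so surrogate pairs stay intact without allocating a String per character.

```java
import java.util.Iterator;

// Hypothetical helper for illustration; this is not POI's writer code.
class EscapeSketch {

    // One short-lived String per code point, similar to the pattern the
    // report describes.
    static Iterator<String> perCodePointStrings(String text) {
        return text.codePoints()
                .mapToObj(cp -> new StringBuilder().appendCodePoint(cp).toString())
                .iterator();
    }

    // Walks the same code points as ints and appends them to a single
    // StringBuilder; only <, > and & are escaped in this simplified sketch.
    static void appendEscaped(String text, StringBuilder out) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default: out.appendCodePoint(cp); break;
            }
            // Advance by 1 char for a BMP character, 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        appendEscaped("a<b & c", out);
        System.out.println(out); // prints a&lt;b &amp; c
    }
}
```

The int-based loop still produces one output buffer for the whole cell, so it does not reduce the memory needed for the cell value itself; it only avoids the per-character String objects.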

Is there any way to optimize this?
Comment 1 PJ Fanning 2023-03-01 10:23:14 UTC
May I ask why you are putting massive strings in individual cells?

I see little point in optimising POI memory usage in this case. The CodepointsUtil code here solves a real-world problem (support for surrogate pairs, i.e. characters outside the Basic Multilingual Plane).
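Java strings are UTF-16, so a character outside the Basic Multilingual Plane, such as U+1F600, occupies two `char`s (a high and a low surrogate). The sketch below (the class name `SurrogateDemo` is hypothetical) shows why iterating by code point matters: encoding each `char` on its own splits the pair, and `String.getBytes` replaces each lone surrogate with `?`, whereas encoding the whole code point yields the correct 4-byte UTF-8 sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: why iterating by char is not enough.
class SurrogateDemo {

    // U+1F600 (GRINNING FACE) is outside the Basic Multilingual Plane, so a
    // Java String stores it as a surrogate pair: two UTF-16 chars.
    static final String SMILEY = "\uD83D\uDE00";

    // Encodes the string as a whole: the pair becomes one 4-byte UTF-8 sequence.
    static byte[] encodeWhole(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes each char on its own. A lone surrogate cannot be encoded, so
    // String.getBytes substitutes '?' (0x3F) for each half.
    static byte[] encodeCharByChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            byte[] b = String.valueOf(s.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(SMILEY.length());                           // prints 2
        System.out.println(SMILEY.codePointCount(0, SMILEY.length())); // prints 1
        System.out.println(encodeWhole(SMILEY).length);                // prints 4
        System.out.println(encodeCharByChar(SMILEY).length);           // prints 2
    }
}
```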

It's not a problem that many users will hit, but I'd prefer that the code handles it.

In theory, we could add a flag on SXSSFWorkbook that disables this use of CodepointsUtil (opt-out). If you were to produce a PR or patch like this, I would consider merging it.
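One possible shape for such an opt-out, as a hedged sketch only: `CellTextWriter` and `useCodepoints` are hypothetical names that do not exist in POI, and the sketch assumes the writer replaces characters that are invalid in XML 1.0 with `?`. It shows the trade-off involved: the default path allocates one String per code point but validates a surrogate pair as one character, while the opt-out path avoids those allocations but sees each surrogate half as an invalid character.

```java
import java.util.Iterator;

// Hypothetical sketch of the proposed opt-out: CellTextWriter and
// useCodepoints are illustrative names, not POI API.
class CellTextWriter {

    private final boolean useCodepoints; // assumed default: true

    CellTextWriter(boolean useCodepoints) {
        this.useCodepoints = useCodepoints;
    }

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escapes one unit; invalid characters become '?' (an assumption of
    // this sketch).
    static void appendUnit(int cp, StringBuilder out) {
        if (!isValidXmlChar(cp)) {
            out.append('?');
            return;
        }
        switch (cp) {
            case '<': out.append("&lt;"); break;
            case '>': out.append("&gt;"); break;
            case '&': out.append("&amp;"); break;
            default: out.appendCodePoint(cp); break;
        }
    }

    String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        if (useCodepoints) {
            // Default path: one short-lived String per code point, so a
            // surrogate pair is validated as a single character.
            Iterator<String> it = text.codePoints()
                    .mapToObj(cp -> new StringBuilder().appendCodePoint(cp).toString())
                    .iterator();
            while (it.hasNext()) {
                appendUnit(it.next().codePointAt(0), out);
            }
        } else {
            // Opt-out path: no per-character Strings, but each surrogate
            // half fails the XML check on its own and becomes '?'.
            for (int i = 0; i < text.length(); i++) {
                appendUnit(text.charAt(i), out);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a<\uD83D\uDE00";
        System.out.println(new CellTextWriter(false).escape(text)); // prints a&lt;??
    }
}
```

The trade-off is not inherent to code-point handling: a loop over code points as `int` values (for example via `String.codePointAt`) would keep surrogate pairs intact without allocating a String per character.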
Comment 2 PJ Fanning 2023-03-01 10:39:55 UTC
Also, all the small strings created by CodepointsUtil should be eligible for garbage collection as soon as they have been iterated over; so if you run into memory trouble, the garbage collector should be able to reclaim much of this memory. The original large string value for the Cell will still be needed, though.
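The garbage-collection point can be illustrated with a small sketch (`LazyCodepointDemo` and its `created` counter are hypothetical instrumentation, not POI code). Because an iterator obtained from `String.codePoints().mapToObj(...).iterator()` is backed by a lazy stream, each per-code-point String is created only when `next()` is called and becomes unreachable as soon as the caller drops it. The sketch does not measure memory; the allocation rate for a very large cell can still keep the collector busy.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical instrumentation: counts the per-code-point Strings created by
// a stream-backed iterator of the kind discussed above.
class LazyCodepointDemo {

    static final AtomicInteger created = new AtomicInteger();

    static Iterator<String> iteratorFor(String text) {
        return text.codePoints()
                .mapToObj(cp -> {
                    created.incrementAndGet();
                    return new StringBuilder().appendCodePoint(cp).toString();
                })
                .iterator();
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            big.append('x');
        }
        Iterator<String> it = iteratorFor(big.toString());
        // Each String returned here is unreachable once the loop drops it.
        for (int i = 0; i < 3; i++) {
            it.next();
        }
        System.out.println(created.get()); // prints 3
    }
}
```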