Bug 15716

Summary: [RFE] reduce memory consumption
Product: POI Reporter: Ran Leitman <ranl>
Component: HSSFAssignee: POI Developers List <dev>
Severity: critical CC: trejkaz
Priority: P3    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: PC   
OS: other   

Description Ran Leitman 2002-12-30 10:55:34 UTC
Can not write (and read) large excel sheets (~65,0000 rows X ~15 columns).
I increase the heap size to 512 MB and then it was o.k , but if I need more 
shhets it is impossible to increase heap size. 
Is it possible to use streamming to write excel rows in chunks ? .
I need the application not to use more then about 100 Mb of heap memory.
Comment 1 Avik Sengupta 2002-12-30 11:05:40 UTC
Sorry we CANNOT stream the write. The file format contains upstream pointers to
downstream data, making it impossible to stream. You can read using the Event
API, which allows you to read without keeping the whole file in memory. For
writing such large files, you are unfortunately out of luck. We cannot recommend
using poi if your file is so large. 
Comment 2 Andy Oliver 2002-12-30 13:42:00 UTC
Ran, if you want to beta-test something for me.  I might be able to help.
Comment 3 Andy Oliver 2003-07-24 14:10:22 UTC
This is the performance merge going on for 3.0
Comment 4 Marc CHANTEGREIL 2003-08-27 09:17:46 UTC
Hi, I have as well memory issue when saving bix excel files. I am ready to beta-
test, if something is available.
Comment 5 Andy Oliver 2003-08-27 14:06:46 UTC
the head has the performance code in it.  There are still some places that need to be merged and 
cell types not yet supported.
Comment 6 andry 2003-10-16 13:50:35 UTC
Hi! Seems like memory consumption is only one side of a matter.
Your files when heaving 20000 rows and 60 columns filled with simple
number 1 are 21M in size (2.0-pre3), while the same file generated with
http://www.andykhan.com/jexcelapi/index.html is 8M in size.
The same ratio (approx. 3:1) is in comsumed memory.
Seems memory consumption goes from not compact file
structure being generated by your library.
Comment 7 Jason Height 2006-07-26 10:43:30 UTC
The 3.x series has had some performance improvements. The 3.0 performance code
mentioned above, didnt quite workout. But POI/HSSF has improved quite a lot
since 2.0.

We still dont support chunked writing.

As with all performance there is always something else that can be done, but
often requires significant effort. I am going to close this bug as invalid, we
will never be able to "fix" this issue to 100% satisfaction (& you still cant
write 65000x15 with just 64MB memory)

Comment 8 gopi 2007-02-22 08:03:39 UTC
It seems HSSF works with small mb of excel , but when 10 mb of file is 
requested to process ,the request is hung.  I am using poi version 2.5. 
Is this issue addressed in the newer versions ?
Comment 9 Josh Micich 2008-10-30 13:41:35 UTC
It seems (see bug 45570) that the biggest offender for hssf memory usage is all the exploded Formulas (Ptg arrays).  Depending on the content of the workbook, the Ptgs could account for up to half the JVM heap usage (of the HSSFWorkbook object graph).

I made a change in svn 709235 to convert (almost) all Ptg arrays into Formula objects which only spill their Ptg tokens when asked.  Large savings were gained by converting formula un-sharing from eager to lazy.

I ran some tests on a local XLS file (around 4MB in size, containing around 35,000 formula cells).  Before the change, this file took 28.1MB when loaded by POI.  After the change it took 22.6MB.

There is still a long way to go.  No doubt there will be additional requests to reduce POI's memory footprint.  Please supply example XLS files with any new requests (preferably in a new Bugzilla), so we can focus on the specifics, and track progress better.