Bug 59958 - [PATCH] Add cells on the fly to the evaluation sheet cache on cache miss
Summary: [PATCH] Add cells on the fly to the evaluation sheet cache on cache miss
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.15-dev
Hardware: PC All
: P2 normal with 2 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Reported: 2016-08-08 12:08 UTC by Tomasz Stanczak
Modified: 2016-10-06 07:36 UTC (History)
0 users

GIT patch with adding the cell into the cell cache in case of a cache miss. (1.49 KB, patch)
2016-08-08 12:08 UTC, Tomasz Stanczak
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tomasz Stanczak 2016-08-08 12:08:28 UTC
Created attachment 34111 [details]
GIT patch with adding the cell into the cell cache in case of a cache miss.

Seen on POI 3.15 beta2.

In my use case I have a middle sized sheet populated with precalculated constant values, then for some of the columns subtotal are being dynamically added using SUMIF row by row. The index parameter (2nd parameter) to SUMIF is also being set dynamically, like

         B     F                       I
Row 150  2.5   SUMIF(..., B150, ...)   SUMIF(..., B150, ...)
Row 151  3.5   SUMIF(..., B151, ...)   SUMIF(..., B151, ...)

Up to v3.14 it has worked (there was no cache!). The reason is that in v3.15 XSSFEvaluationSheet maintains a cell cache that is populated only at the first access, so evaluation of the formulae of the first dynamically created row (row 150) succeeds, yet starting with the next row (row 151) the cache misses the new values and all formulae return 0.

I am aware of the workarounds: 

1. create all of the B column before evaluating the first formula
2. use the new clearAllCachedResultValues method available in 3.15 beta 3.

Re 1. - it seems to be not very use friendly for the evaluation to depend on the setting all constant values before evaluating the very first formula

Re 2. - clearing the cache on each cell change seems to be an overkill - not only the cell cache would be recreated but also all other caches, thus causing a performance penalty. 

Both to be successfully deployed require the knowledge of caching mechanism inside POI.

This is why I suggest a patch - if cache misses a cell add it if present. This way the performance gain of the cell cache would remain in place, the costly recreating of all the caches could be avoided.

But since I am new to POI and the solution seems trivial I might have missed something important so please bear with me.
Comment 1 Javen O'Neal 2016-09-11 02:04:29 UTC
Thanks for the patch! Committed in r1760213. This will likely be included in POI 3.15 final.

Could you write a test case that demonstrates the problem from comment 0? If we include a test in our test suite, it will reduce the chance of regressions.

For an example of similar unit tests, see TestXSSFBugs, TestXSSFFormulaParser, and TestFormulaParser.
Comment 2 Dominik Stadler 2016-10-05 20:14:52 UTC
Added a simple unit test via r1763487.
Comment 3 Tomasz Stanczak 2016-10-06 07:36:26 UTC
I'm sorry I didn't respond - I didn't get any emails from Apache Bugzilla, don't know why, the preferences look okay :-(