|Summary:||SUMIF returns wrong result (POI 4.0.x)|
|Product:||POI||Reporter:||Axel Howind <axel>|
|Component:||XSSF||Assignee:||POI Developers List <dev>|
minimal program that demonstrates the problem
JUnit test for this
Patch that fixes bug 62993
Patch with bugfix and testcase
Description Axel Howind 2018-12-08 17:33:14 UTC
Created attachment 36312 [details] minimal program that demonstrates the problem I have upgraded some projects to POI 4 and found out that there's a bug in SUMIF. I create an in-memory XSSF Workbook that contains some data and some formulas working on that data. SUMIF() only returns the correct result for the first cell, and returns zero for the remaining cells. The original program was written by a colleague in Ruby, and it worked in POI 3.x (until the latest pre 4.0-version). Since the first time working with Ruby was trying to debug said program, I created a minimal Java application to reproduce the problem Please take a look at the Excel sheet as it is written by the program. It should look like this (please ignore the locale specific decimal formatting): Key Value 1,75 123 2,25 456 3,25 789 1,75 10 3,25 20 3,25 30 0,9 40 1,25 50 Key Sum POI Exp Status 0,9 40 40 40 OK 2,25 456 0 456 FAIL 1,25 50 0 50 FAIL 3,25 839 0 839 FAIL 1,75 133 0 133 FAIL `Key` is a numeric value `Sum` is a formula, i.e. "SUMIF(A2:A9;A12;B2:B9)" for the first row, "SUMIF(A2:A9;A13;B2:B9)" for the second and so on `POI` is the result calculated by invoking `evaluator.evaluateFormulaCell()` `Exp` is the expected result (calculated in Java) `Status` is "OK" if the POI result is the same as in Java I can reproduce the problem with POI 4.0 and 4.0.1
Comment 1 Axel Howind 2018-12-08 21:19:12 UTC
Created attachment 36313 [details] JUnit test for this The attachment contains a JUnit test case for this bug. I am unsure where this should end up in the POI source tree though.
Comment 2 Axel Howind 2018-12-09 07:30:13 UTC
It turns out that this probably not only affects SUMIF but possibly all functions that operate on ranges. Bug is triggered when rows are added after the first formula evaluation is done. I have prepared a patch and will include a more detailed description there. Please note that this is almost certainly a regression from POI 3.x since the Ruby code we use produced correct results before upgrading, and the changes during upgrade were rather cosmetically (exchanging ints with enums in some places).
Comment 3 Axel Howind 2018-12-09 07:32:39 UTC
Created attachment 36315 [details] Patch that fixes bug 62993 Patch to fix Bug 62993 `XSSFEvaluationSheet` uses an internal cache to speed up evaluations. It also contains an integer field `_lastDefinedRow` to avoid calling `getLastRowNum()` for the spreadsheet which is costly because it looks up the last entry in a sorted map which should be O(log(n)) for the implementation used (with n being the number of rows). The problem is, that `_lastDefinedRow` is evaluated only once when the `XSSFEvaluationSheet` is created and not updated when rows are added later. This patch removes `XSSFEvaluationSheet._lastDefinedRow` and instead always uses `_xs.getLastRowNum()`. Runtime performance should be unaffected or even better because I moved the optimization (caching of the last row number) up to `XSSFSheet`, so that other code will also benefit. Please review and comment.
Comment 4 Axel Howind 2018-12-10 15:19:51 UTC
Sorry, it seems this patch introduces a regression that I didn't see because Junit was broken on my machine (I opened Bug 62996 for that problem). Now that Junit is up and running again, I will look into the regression this has created and create a updated patch.
Comment 5 Axel Howind 2018-12-10 16:58:36 UTC
Created attachment 36319 [details] Patch with bugfix and testcase Revised patch, now all unit tests pass. I have also added a new test case for this.
Comment 6 Axel Howind 2018-12-27 09:19:57 UTC
I have changed Hardware and OS to "all" because this bug is not specific to a particular system (found it on Windows 7 / confirmed and fixed it on MAC). Also, what should I do to get this applied to trunk? I think this is rather important because it may lead to wrong results and it's a regression from 3.x.