Created attachment 36259 [details] Test Excel File In our project we use POI to "calculate" an Excel workbook that is managed by a business analyst. In the last revision we had very strange results and I tracked it down to POI not really supporting array formulas. I made a test excel and project where it fails for this simple formula { =MIN(IF(A1:A6>=C1;A1:A6)) } The strange thing is that it works sometimes, but fails for other values. If this is not supported I would be OK with a hard failure as soon as the formula is evaluated. But here there are no exceptions at all, but the formula just gives wrong results for some values. See the example project including an automated test at https://github.com/eekboom/poi-minif ---- I also attached the test excel file to this issue. Here's a groovy spock test that shows the problem: import org.apache.poi.ss.usermodel.Cell import org.apache.poi.ss.usermodel.FormulaEvaluator import org.apache.poi.ss.usermodel.Row import org.apache.poi.ss.usermodel.Sheet import org.apache.poi.ss.usermodel.Workbook import org.apache.poi.ss.usermodel.WorkbookFactory import spock.lang.Specification import spock.lang.Unroll class PoiSpec extends Specification { @Unroll def 'test array formula: #value -> #expectedValue'() { given: Workbook workbook = WorkbookFactory.create(Thread.currentThread().getContextClassLoader().getResourceAsStream("workbook.xlsx")) Sheet sheet = workbook.getSheet("Sheet1") FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator(); Row inputRow = sheet.getRow(0) Cell inputCell = inputRow.getCell(2) Row outputRow = sheet.getRow(2) Cell outputCell = outputRow.getCell(2) when: inputCell.setCellValue(value) formulaEvaluator.notifyUpdateCell(inputCell) then: formulaEvaluator.evaluate(outputCell).getNumberValue() == expectedValue where: value || expectedValue 3 || 5 5 || 5 6 || 10 27 || 30 } }
Hello Steven, thanks for the report. I encountered the same problem yesterday. Reasonably enough yet unfortunately, the IF construct is being treated as "lazy if", and the first argument is evaluated as a boolean scalar, never an array, and determines the following evaluation flow. If you are interested, I can post a dirty and almost untested 'works for me' patch. Right now I am overloaded with work but hopefully I'll address the issue properly during the holidays.
Created attachment 36356 [details] patch to fix evaluation of IF with array arguments With the attached patch POI evaluates it right. Summary of changes: 1. IF needs to implement ArrayFunction 2. moved RelationalOperationEval#evaluateArray into ArrayFunction. The logic to transpose and evaluate matrix arguments is common for all array functions and IFFunc is an example. 3. changed WorkbookEvaluator to evaluate and put the first argument on the stack in array mode. As Gallon noted, the "lazy if" logic evaluates the first arg as a boolean scalar which further results in a wrong result. Unit tests are coming soon
I checked in the fix in r1850646. The fix *improves* support for evaluation of array formulas, but full coverage is a long way ahead. What is included in the fix: 1. support for array arguments in the IF function. In a array context IF is rarely used by its own, it is usually nested in another function like {=MIN(IF(logical_test, values))} as in your example. Tests with MIN and MAX all pass. For SUM I have at least one failing test, but it can be related to another issue. 2. Support for array arguments in logical IS*** functions and unary operators. This makes it possible to evaluate formulas like {=SUM( --ISERROR(E6:E11)) }. Previously IS*** and unary operators evaluated to a scalar and the result was wrong. Please open a new ticket or re-open this one if you find more issues with evaluation of array formulas. Regards, Yegor
@Yegor could you please look into this issue https://bz.apache.org/bugzilla/show_bug.cgi?id=63054 ? The topic seems very close, and the divine vision of the code may still be fresh in your cache.
Thanks for the pointer. This commit fixed bug #63054 as well. The exact change that fixed it was in TwoOperandNumericOperation#evaluateArray() We had duplicate pieces of code that evaluated array arguments: 1. ArrayEval.evaluate(srcRowIndex, srcColumnIndex, rag1, arg2) 2. RelationalOperationEval.evaluateArray(args, srcRowIndex, srcColumnIndex) both iterate over the input arrays and evaluate (i,j)-th element from the first range with (i,j)-th element from the second range. The logic in RelationalOperationEval was more mature and I refactored it into a base class. ArrayEval is obsolete and can be deleted. The test cases are in TwoOperandNumericFunctionTestCaseData.xls Regards, Yegor
I checked in a follow-up to my previous commit. Now all tests pass including the commented ones, that failed in my previous commit. Excel has an optimization when evaluating the IF formula: depending on whether the logical condition is true/false, the PTGs for the 2nd or 3rd argument can be skipped altogether without evaluation. I had a hard time figuring out how it works in array mode until I realized that the "optimized if" logic is not applicable in array context: in array mode all arguments should be evaluated and put onto the stack and the 'skip' instructions should be ignored. Once I realized this and changed the code all failing tests started to pass. Regards, Yegor