Bug 65915

Summary: BooleanFunction.OR evaluateArray not working properly
Product: POI Reporter: xican.yang
Component: SS CommonAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: xican.yang
Priority: P2    
Version: 5.2.0-FINAL   
Target Milestone: ---   
Hardware: Macintosh   
OS: Mac OS X 10.1   

Description xican.yang 2022-02-24 23:45:48 UTC
With POI version 4.1.2

This formula failed to evaluate
```
=INDEX({1},1,
IF(OR(FALSE,FALSE),1,1))
```

Stacktrace
[POI.FormulaEval]W   - evaluateFormula('Sheet1'/A1): [[ArrayPtg] nRows = 1 nCols = 1   {1}, IntPtg [1], BoolPtg [FALSE], BoolPtg [FALSE], FuncVarPtg [OR nArgs=2], AttrPtg [if dist=7], IntPtg [1], AttrPtg [skip dist=10], IntPtg [1], AttrPtg [skip dist=3], FuncVarPtg [IF nArgs=3], FuncVarPtg [INDEX nArgs=3]]
[POI.FormulaEval]I     * ptg 0: [ArrayPtg] nRows = 1 nCols = 1   {1}, stack: []
[POI.FormulaEval]I       = org.apache.poi.ss.formula.CacheAreaEval[A1:A1]
[POI.FormulaEval]I     * ptg 1: org.apache.poi.ss.formula.ptg.IntPtg [1], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.NumberEval [1]
[POI.FormulaEval]I     * ptg 2: org.apache.poi.ss.formula.ptg.BoolPtg [FALSE], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.BoolEval [FALSE]
[POI.FormulaEval]I     * ptg 3: org.apache.poi.ss.formula.ptg.BoolPtg [FALSE], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.BoolEval [FALSE]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.BoolEval [FALSE]
[POI.FormulaEval]I     * ptg 4: org.apache.poi.ss.formula.ptg.FuncVarPtg [OR nArgs=2], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.BoolEval [FALSE], org.apache.poi.ss.formula.eval.BoolEval [FALSE]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]
[POI.FormulaEval]I     * ptg 5: org.apache.poi.ss.formula.ptg.AttrPtg [if dist=7], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]]
[POI.FormulaEval]I     * ptg 6: org.apache.poi.ss.formula.ptg.IntPtg [1], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.NumberEval [1]
[POI.FormulaEval]I     * ptg 7: org.apache.poi.ss.formula.ptg.AttrPtg [skip dist=10], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!], org.apache.poi.ss.formula.eval.NumberEval [1]]
[POI.FormulaEval]I     * ptg 8: org.apache.poi.ss.formula.ptg.IntPtg [1], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!], org.apache.poi.ss.formula.eval.NumberEval [1]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.NumberEval [1]
[POI.FormulaEval]I     * ptg 9: org.apache.poi.ss.formula.ptg.AttrPtg [skip dist=3], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.NumberEval [1]]
[POI.FormulaEval]I     * ptg 10: org.apache.poi.ss.formula.ptg.FuncVarPtg [IF nArgs=3], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.NumberEval [1]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]
[POI.FormulaEval]I     * ptg 11: org.apache.poi.ss.formula.ptg.FuncVarPtg [INDEX nArgs=3], stack: [org.apache.poi.ss.formula.CacheAreaEval[A1:A1], org.apache.poi.ss.formula.eval.NumberEval [1], org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]]
[POI.FormulaEval]I       = org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]
[POI.FormulaEval]I   finshed eval of A1: org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]
[org.apache.poi.ss.formula.WorkbookEvaluator]D Evaluated Sheet1!A1 to org.apache.poi.ss.formula.eval.ErrorEval [#VALUE!]
Comment 1 xican.yang 2022-02-25 00:52:45 UTC
I just tried with version 5.2.0, and the issue persists.

Basically this formula 

```
=INDEX({1},1,
IF(OR(FALSE,FALSE),1,1))
```

Cannot be evaluated because it is considered to be an array formula and BooleanFunction.OR evaluateArray only accepts 1 argument.
Comment 3 PJ Fanning 2022-02-25 12:26:34 UTC
POI supports the use cases in https://support.microsoft.com/en-us/office/or-function-7d17ad14-8700-4281-b308-00b131e22af0 but may not support other undocumented scenarios. If you want to contribute fixes for these scenarios, please feel free to do so.
Comment 4 PJ Fanning 2022-02-25 12:57:21 UTC
I added a test in trunk code and it seems to work (r1898416) - so could you try POI 5.2.0 instead?
Comment 5 xican.yang 2022-02-25 17:13:51 UTC
Steps to reproduce in version 5.2.0

```
try (HSSFWorkbook wb = new HSSFWorkbook()) {
  HSSFFormulaEvaluator fe = new HSSFFormulaEvaluator(wb);
  HSSFSheet sheet = wb.createSheet();
  HSSFRow row = sheet.createRow(0);
  HSSFCell cell = row.createCell(0);
  sheet.setArrayFormula("INDEX({1},1,IF(OR(FALSE,FALSE),1,1))", new CellRangeAddress(0, 0, 0, 0));
  org.junit.jupiter.api.Assertions.assertEquals(fe.evaluate(cell).getNumberValue(), 1.0);
}
```
Comment 6 PJ Fanning 2022-02-25 17:16:01 UTC
why are you using setArrayFormula and not just setCellFormula?
Comment 7 xican.yang 2022-02-25 17:21:00 UTC
In my actual use case, the spreadsheet.xlsx is created by office365, and I just load the spreadsheet via POI, we didn't specify the formula as array formula, but when the OOXMLExtractor(or whatever class) parses the spreadsheet, the formula is recognized as array formula.

Regardless of that, this is still a bug, no?
Comment 8 PJ Fanning 2022-02-25 17:26:03 UTC
POI is a volunteer project - if you feel strongly about this then feel free to submit a patch and someone will review it. My feeling is that POI will never be able to get 100% coverage of Excel function behaviour without a serious uptick in code contributions from the user community.
Comment 9 PJ Fanning 2022-02-25 17:27:30 UTC
PS most of the time, when you read the xlsx files, you don't need to evaluate the formulas - xlsx files have the evaluated values cached in the sheet data - and POI can read those cached results
Comment 10 xican.yang 2022-02-25 17:31:37 UTC
(In reply to PJ Fanning from comment #9)
> PS most of the time, when you read the xlsx files, you don't need to
> evaluate the formulas - xlsx files have the evaluated values cached in the
> sheet data - and POI can read those cached results

Thanks, that's good to know. I'll do this as a workaround.
Comment 11 xican.yang 2022-02-25 17:32:46 UTC
(In reply to PJ Fanning from comment #8)
> POI is a volunteer project - if you feel strongly about this then feel free
> to submit a patch and someone will review it. My feeling is that POI will
> never be able to get 100% coverage of Excel function behaviour without a
> serious uptick in code contributions from the user community.

Understood. I tried to fix this on my own, but I'm having some difficulty understand the code block around array formulas. Thus I'm wondering if someone has better context here.
Comment 12 PJ Fanning 2022-02-25 18:26:16 UTC
I've added a possible fix - r1898421 - if it doesn't help, I will need to revert it
Comment 13 xican.yang 2022-02-25 18:30:03 UTC
(In reply to PJ Fanning from comment #12)
> I've added a possible fix - r1898421 - if it doesn't help, I will need to
> revert it

Thanks, that revision only works if OR/AND has 1 or 2 arguments. In reality, it could be more than that, e.g., 

```
INDEX({1},1,IF(OR(FALSE,FALSE,FALSE),1,1))
```

I'd recommend reverting it in favor of more proper fix.
Comment 14 xican.yang 2022-02-25 19:07:09 UTC
Just thinking aloud, for BooleanFunction, can evaluateArray just call evaluate?

I don't understand how OR(FALSE, FALSE, FALSE) is different in the context of array vs non-array.
Comment 15 PJ Fanning 2022-02-25 19:14:16 UTC
I'm not spending more time on this. I'm working on other stuff.
Comment 16 PJ Fanning 2022-02-25 19:20:03 UTC
I added r1898425 as one last change - as I said earlier I'm a volunteer and I don't want to spend time fixing stuff for other users. They can fix their own issues.
Comment 17 xican.yang 2022-02-25 19:22:39 UTC
(In reply to PJ Fanning from comment #16)
> I added r1898425 as one last change - as I said earlier I'm a volunteer and
> I don't want to spend time fixing stuff for other users. They can fix their
> own issues.

Appreciate your help!
Comment 18 PJ Fanning 2022-02-25 19:27:31 UTC
We are in the middle of a v5.2.1 release - these changes will not be in that release unless we run into a bigger problem that blocks the existing RC and that forces us to build a new RC. So likely that these changes won't appear until the v5.2.2 release (we typically only do 1 or 2 releases a year).