Bug 61859 - Limited support for array operands
Summary: Limited support for array operands
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.17-FINAL
Hardware: Macintosh All
: P2 normal with 1 vote (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-05 15:03 UTC by Markus Kirsten
Modified: 2017-12-18 16:13 UTC (History)
0 users



Attachments
Simple file to reproduce the bug (26.80 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-12-05 15:03 UTC, Markus Kirsten
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Kirsten 2017-12-05 15:03:41 UTC
Created attachment 35587 [details]
Simple file to reproduce the bug

POI does not correctly support array operands and collapses e.g., the
expression inside the INDEX function in INDEX(($B$2:$B$11=F2)*$A$2:$A$11) into a number. The formula INDEX expects an array, and instead of evaluating

INDEX({1,2,2,3,0,0,0,4,0,0}, 0)

POI evaluates it as

INDEX(1, 0)

STEPS TO REPRODUCE:
1. Evaluate all cells in the attached file (or even just G4) with Apache POI
2. The following exception is thrown WARNING: Incomplete code - cannot handle first arg of type (org.apache.poi.ss.formula.eval.NumberEval) for cell G4

See original description of the problem below. Identification that this had to do with how POI handles arguments that are arrays was kindly provided by Yegor Kozlov.


======
For a spreadsheet I am trying to update and evaluate with POI I am having some difficulties. The problem I am trying to solve in its simplest form is the following; From a data sheet with two columns - one with a numeric value, and one with a group, I would like to find the maximum value for each group. So from the table below, Iā€™d like to calculate that max for A is 4, max for B is 10 and max for C is 20.

Value	Group
1	A
2	A
2	A
3	A
5	B
3	B
10	B
4	A
10	C
20	C

One way to do this would be to create the following table, where the column max value is to;

=MAX(INDEX(($B$2:$B$11=F2)*$A$2:$A$11;0))

The logic is basically; find where column B (the group) matches the specified group. This is an array with 0/1s. Multiply that array with the value array. Now we have an array with either 0s or the values from the specified group. Take max of that.

Group	Max value
A	4
B	10
C	20

However, when evaluating the formula above, I get the following error. I have uploaded a minimal test file on https://ufile.io/z6qg4 The 3 red cells are the only ones with formulas and it is when I evaluate them that I get this error.

WARNING: Incomplete code - cannot handle first arg of type (org.apache.poi.ss.formula.eval.NumberEval) for cell G4
java.lang.RuntimeException: Incomplete code - cannot handle first arg of type (org.apache.poi.ss.formula.eval.NumberEval)
	at org.apache.poi.ss.formula.functions.Index.convertFirstArg(Index.java:106)
	at org.apache.poi.ss.formula.functions.Index.evaluate(Index.java:50)
	at org.apache.poi.ss.formula.functions.Index.evaluate(Index.java:114)
	at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:132)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:523)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:290)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:232)
	at org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:65)
	at org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluateFormulaCellEnum(BaseFormulaEvaluator.java:192)
	at org.apache.poi.ss.usermodel.DataFormatter.formatCellValue(DataFormatter.java:887)
Comment 1 Yegor Kozlov 2017-12-18 16:13:28 UTC
Should be fixed in r1818587

The fix includes two important changes:

1. Support for evaluating relational operators in array mode. Before this fix POI always de-referenced the first value  and $B$2:$B$11=F2 evaluated as $B$2=F2 . Now it should correctly evaluate it to an array of {TRUE, FALSE, ..... }. 

2. Recognize 'array' evaluation mode from the formula context. Normally, array arithmetic is used in array formulas (those saved with Ctrl-Shift-Enter).  The INDEX function seems to be a special case: if row_num or column_num are omitted or set to 0, INDEX returns an array of the entire column or row. I had to tweak the evaluator to recognize it.