Bug 62875

Summary: Area with empty cell is shifted
Product: POI Reporter: Sven <poi>
Component: SS CommonAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 4.0.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: MDETERM function with empty cell in referenced area

Description Sven 2018-11-01 08:28:18 UTC
Created attachment 36227 [details]
MDETERM function with empty cell in referenced area

Hi,

the behavior of function MDETERM with at least one empty cell in the area-parameter differs between POI and Excel/LibreOffice (both tested).

In an Excel 365 file there are the following functions/values:
- A1: =MDETERM(A2:B3)
- A2: [empty]
- B2: 1
- A3: 1
- B3: 1

Excel 365 and LibreOffice 6.1.2.1 both show "#VALUE" in A1 for that situation.
POI asserts A2 = 0 and returns A1 = -1.

This A2=0 comes from org.apache.poi.ss.formula.functions.MatrixFunction.OneArrayArg.evaluate(int, int, ValueEval). There the collectValues(arg0)-call collects the values for the given area A2:B3 and returns values[]={1.0,1.0,1.0} (only 3 of 4 values).
The subsequent call of fillDoubleArray(...) creates a "double[][] matrix"-array that is initialized with zeros. fillDoubleArray(...) returns matrix[][]=[[1.0, 1.0], [1.0, 0.0]].
Thus matrix does not represent the actual matrix from the workbook, but a shifted version with a false 0-value.

A solution could be that collectValues() throws an EvaluationException for any empty cell in that area.
Another solution could be that collectValues() returns a Double[][] array with null values so that subsequent methods can determine whether they throw an exception.
Comment 1 gallon.fizik@gmail.com 2018-12-17 16:19:37 UTC
I've come across several similar cases recently, namely with Product and Geomean functions. I'll post a PR soon with Product, where I change the way MultiOperandNumericFunction.collectValue() handles non-numeric values resolved from explicit argument/area reference/reference. It's not a MatrixFunction but the idea and implementation of collecting evals to a double[]/double[][] array is the same.

Instead of boolean flags (blanksAreCounted,  boolsAreCounted) I introduce a thing I called EvalConsumer<T extends ValueEval, RECEIVER> (not the best name, perhaps I'll think of a better one). The idea is to simplify the configuration of such functions by introducing a set of consumers which actually perform the handling. The base class is not a Consumer because the latter doesn't allow throwing a checked EvaluationException. The three most common consumer flavors are Coerce (use the value directly or try to cast), Skip and Throw. The implementations are so simple that can well be lambdas.

That said, the collectValue() from a sheet of if instanceof else if instance of will change to something like:
EvalConsumer consumer = resolveConsumer(ValueEval ev, boolean isViaReference);
consumer.consume(ev, accumulator);

We may go even further and maintain a [lazily populated] map <T extends ValueEval -> EvalConsumer<T, RECEINVER> so that we don't have to resolve the consumer once again for each argument. Or a list (actually, this is the Chain of Responsibility pattern). Thanks for making me clarify this :)

An option to set a custom handler/hook/filter is easily implemented. In pseudo-code, the config for MDETERM may look like:

class MDETERM extends MatrixFunction {
    MDETERM() {
      super(...);
      addHandler(BlankEval.class, (value, receiver) -> {throw new EvaluationException(ErrorEval.VALUE_INVALID);};
    }
}


Going further, such a universal construct may later become a part of the generic OperandResolver.
Comment 2 Greg Woolsey 2018-12-17 17:18:17 UTC
Good analysis, thank you for filing it and contributing.  I've done some work on formula evaluation, but not dug into the matrix or array pieces much.

One question, as the scope of the proposed lazy collection is unclear to me:

Is the scope for a single expression evaluation chain, or does this live higher up in the Workbook level FormulaEvaluator and similar?  If it lives at that level, caching is OK, but only as long as it is cleared when values change, by adding the necessary steps to the existing code for that.

Thanks again, looking forward to your patch/PR.
Comment 3 gallon.fizik@gmail.com 2018-12-17 17:55:54 UTC
The scope is for one call to Function.evaluate(Ptg[], row, col). It doesn't persist between calls. Caching is performed higher in WorkbookEvaluator at the cell level. 
Caching wouldn't work for functions because along with refs they receive literals and nested functions' results.
Comment 4 gallon.fizik@gmail.com 2018-12-17 18:15:41 UTC
PR https://github.com/apache/poi/pull/137
Comment 5 gallon.fizik@gmail.com 2018-12-19 21:07:48 UTC
The PR has been pulled to trunk. I guess the issue can be closed now as resolved.
Comment 6 Andreas Beeker 2018-12-19 21:56:21 UTC
r1849237 was the svn revision.

I usually leave tickets open, if I continue to work on it ... but sometimes they are forgotten ... ¯\_(ツ)_/¯