Bug 60848 - SUMPRODUCT fails when first arg is of form --(...)
Summary: SUMPRODUCT fails when first arg is of form --(...)
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: 3.16-dev
Hardware: All All
: P2 normal with 2 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-12 18:33 UTC by mewalig
Modified: 2018-04-09 09:35 UTC (History)
1 user (show)



Attachments
bad.xlsx (29.56 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2017-03-21 15:31 UTC, mewalig
Details

Note You need to log in before you can comment on or make changes to this bug.
Description mewalig 2017-03-12 18:33:59 UTC
The following formula works fine in POI (using version 3.15):

=SUMPRODUCT(X17:X1048576,--(O17:O1048576>=30))

However, reversing the order of the args, it fails with "Invalid arg type
for SUMPRODUCT: (org.apache.poi.ss.formula.eval.ErrorEval)":

=SUMPRODUCT(--(O17:O1048576>=30),X17:X1048576)

Either way works fine in Excel.

Also described at https://lists.apache.org/list.html?dev@poi.apache.org:2017-03
Comment 1 Dominik Stadler 2017-03-19 20:20:48 UTC
Do you have a sample file/sample code or unit-test that reproduces the problem? This would make it a lot easier for developers to reproduce and work on a fix.
Comment 2 mewalig 2017-03-21 15:29:30 UTC
It's as simple as evaluating a sumproduct whose first arg starts with --.

--- Sample files ---
 bad.xlsx: in A3, set formula to
     =SUMPRODUCT(--(B5:B20))

--- Test results ---
Running the sample code below, you will get:
  java sampleApp good.xlsx
    <prints "No error">
  java sampleApp bad.xlsx
    <prints "Invalid arg type for SUMPRODUCT:(org.apache.poi.ss.formula.eval.ErrorEval)">

--- Sample code ---

    private static final String FILE_NAME = "test.xlsx";

    public static void main(String[] args) {
        try {
            FileInputStream excelFile = new FileInputStream(new File(FILE_NAME));
            Workbook workbook = new XSSFWorkbook(excelFile);

            // get the cell with the offending formula (cell A3 in sheet 0)
            Cell c = workbook.getSheetAt(0).getRow(2).getCell(0);

            FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();

            evaluator.evaluateFormulaCell(c);

            System.out.println("No error");
        } catch(Exception e) {
            System.out.println("There was an error");
            System.out.println(e.getMessage());
        }
    }
Comment 3 mewalig 2017-03-21 15:31:07 UTC
Created attachment 34860 [details]
bad.xlsx

bad.xlsx as described in prior comment. It has no contents other than one formula in cell A3: =SUMPRODUCT(--(B5:B20))
Comment 4 mewalig 2017-03-21 15:33:50 UTC
Please ignore the reference to "good.xlsx" in comment 2. It was meant to be removed before it was sent.
Comment 5 Greg Woolsey 2017-03-21 17:57:50 UTC
Thanks for the simple test case.  This is deeper than just the SUMPRODUCT() function.  By the time it gets to evaluating the function, the argument has already evaluated to an error.

The problem is that the "Unary Minus" operator, "-", doesn't properly handle an array argument.  This is actually a problem with the underlying utility class OperandResolver, which needs to return an array instead of a single value when the input is an array for cases like this, but perhaps not for some other functions.

This will require some digging to fully define.  To start with, the unary functions should at least call a new operand static method that can return an array of values.  UnaryPlusEval has the same bug as UnaryMinusEval, although the plus and minus evaluation logic is slightly different according to the comments in the code.

It will take someone digging through the Excel function specs to see what other functions can silently operate on an array and return an array for use by array functions like SUMPRODUCT, and testing/checking the POI handling of each.

Patches are always welcome.  If one of my users indicates we need this it will end up in my queue, but otherwise it's a bit too big for me to tackle at the moment.
Comment 6 mewalig 2017-03-21 18:20:50 UTC
Thank you for the comment

Out of curiosity, why is the processing wrt the first argument different from that wrt subsequent arguments? Note that if we do:
   =SUMPRODUCT(A5:A20, --(B5:B20))

then even though the problematic argument is still there, it works just be relegating it to the position of argument 2 instead of argument 1.

Is it possible to take whatever process is being applied to arguments 2+ and apply that to argument 1?
Comment 7 Greg Woolsey 2017-03-21 18:49:28 UTC
The current logic is completely dependent on the internal class type of of the first argument.  As I mentioned, the problem is with the unary negation operator (minus).  I haven't tested it, but I suspect the second argument is actually evaluating to a single value instead of an array, and projecting that across the first argument array, rather than taking the pairwise product of arrays of equal size.  There won't be a simple fix.  

You are welcome to check out the source code and play with it.  If you come up with a fix and unit test, we'd love to incorporate a patch.
Comment 8 mewalig 2017-03-21 20:03:30 UTC
I will try to take a look. the main (at least initial) limitation on my ability to do so is that I'm pretty unfamiliar with the Java build process (I'm mostly a C programmer and stick with emacs + cc + make). 

Will have to do some research on how to get code cycle times down for this kind of endeavor. Any suggestions much appreciated.
Comment 9 mewalig 2017-03-22 02:51:17 UTC
If any developers are interested in fixing this for a $ bounty, please send me a proposal at mewalig@gmail.com
Comment 10 mewalig 2017-04-21 21:43:21 UTC
the plot thickens.

Using the below example, when cell A3 is changed to be:
    =SUMPRODUCT(--(B4:B6))

then, as before, evaluation of that throws an error.

However, when cell A3 contains
    =SUMPRODUCT(--(B3:B5))

then, unlike before, no error is thrown, but the resulting value is not correct and only appears to consider the cell B3.

Not sure if this indicates two separate issues that need to be fixed, but it seems odd that one throws an error and the other does not just by moving the referenced range up or down a row.
Comment 11 mewalig 2017-04-21 21:57:56 UTC
Looking at the syntax in https://msdn.microsoft.com/en-us/library/dd906358(v=office.12).aspx, then, assuming that "Unary Minus" is equivalent in the POI grammer as "sign" in the MSFT site's grammar, it seems possible that the grammar in POI should not be classifying the leading "-" as "Unary Minus", and instead should be classifying it as whatever in the POI grammar is equivalent to "prefix-operator".

See in particular the line that reads:

  nospace-argument-expression = "("  expression  ")" / constant / prefix-operator argument-expression ...


in other words, "--(xx)" should be parsed as


  prefix-operator argument-expression
                    |
                  prefix-operator argument-expression
                                    |
                                  "("  expression  ")"


(note that I've ignored the difference between nospace-argument-expression and argument-expression and treated them as equivalent for brevity)

Would you agree that the expression should parsed differently, as shown above? I'm not familiar enough with how POI implemented the grammar to know whether there is some other way to deal with this which is different from how the MSFT site's grammar works... but I would think that the formula parsing is hairy enough that it would be optimal to use the exact same grammar as specified in the MSFT site. It might be a fundamental change to make, but there are a number of other formula-parsing bugs in POI that would likely be more easily addressed taking that approach.
Comment 12 mewalig 2018-02-02 17:54:41 UTC
Calling all coders who want more money: who is willing to fix this bug for a bounty? 

I have to admit I am surprised that there hasn't been more attention being paid to what appears to be a fundamental problem in how this library parses formulas, which not only doesn't work, but also doesn't notify the user that it doesn't work, and instead just sends back bad data. If nothing else, there should be raised some Exception raised saying "I can't calculate this correctly, so I'm returning an error". Otherwise how can anyone take this library seriously for any "real" work?

Also, why is this bug's status set to "NEEDINFO"? We already have all the info needed to reproduce. I get it that the solution isn't found yet, but "NEEDINFO" makes it sound like it's not ready for a developer to look at, which is not the case.