Bug 44413

Summary: [patch] INDEX() formula cannot contain its own location in the data array range
Product: POI Reporter: David Webster <dave.webster>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P3    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: Other   
OS: Windows XP   
Attachments: Excel file showing INDEX function specified within its own range, which causes Java stack overflow
svn diff of 1 changed file (HSSFFormulaEvaluator)
tar bz2 of two new classes and one test case

Description David Webster 2008-02-13 04:44:37 UTC
Hello All, 

Description
-----------
If an INDEX() formula contains its own location in the data array range, parsing
this spreadsheet with POI causes a recursive exception with an eventual
java.lang.StackOver.

Steps to Reproduce
------------------
1/
Create a simple spreadsheet as follows
     A    B
   -------------
1 |    |  1  |
2 |    |  2  |
3 |    |  3  |
4 | ** |     |

** contains formula =INDEX(A1:B4,2,2)

2/
Notice in Excel (2003), cell A4 evaluates to 2.

3/
Parse the Excel file using POI to evaulate cell formulas (i.e. using
HSSFFormulaEvaluator.CellValue cellValue = evaluator.evaluate(cell); for example)

4/
Evaluating cell A4 causes an exception.

Actual Results
--------------
As stated in 4/ above.
Exception is:

Exception in thread "main" java.lang.StackOverflowError
        at java.lang.Character.digit(Character.java:4531)
        at java.lang.Character.digit(Character.java:4490)
        at java.lang.Integer.parseInt(Integer.java:445)
        at java.lang.Integer.parseInt(Integer.java:497)
        at org.apache.poi.hssf.util.CellReference.<init>(CellReference.java:53)
        at org.apache.poi.hssf.util.AreaReference.<init>(AreaReference.java:43)
        at org.apache.poi.hssf.record.formula.AreaPtg.<init>(AreaPtg.java:55)
        at org.apache.poi.hssf.model.FormulaParser.Ident(FormulaParser.java:290)
        at org.apache.poi.hssf.model.FormulaParser.Factor(FormulaParser.java:522)
        at org.apache.poi.hssf.model.FormulaParser.Term(FormulaParser.java:660)
        at
org.apache.poi.hssf.model.FormulaParser.Expression(FormulaParser.java:708)
        at org.apache.poi.hssf.model.FormulaParser.Arguments(FormulaParser.java:486)
        at org.apache.poi.hssf.model.FormulaParser.function(FormulaParser.java:358)
        at org.apache.poi.hssf.model.FormulaParser.Ident(FormulaParser.java:280)
        at org.apache.poi.hssf.model.FormulaParser.Factor(FormulaParser.java:522)
        at org.apache.poi.hssf.model.FormulaParser.Term(FormulaParser.java:660)
        at
org.apache.poi.hssf.model.FormulaParser.Expression(FormulaParser.java:708)
        at org.apache.poi.hssf.model.FormulaParser.parse(FormulaParser.java:812)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.internalEvaluate(HSSFFormulaEvaluator.java:361)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.getEvalForCell(HSSFFormulaEvaluator.java:557)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.internalEvaluate(HSSFFormulaEvaluator.java:361)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.getEvalForCell(HSSFFormulaEvaluator.java:557)
<REPEATED 100's OF TIMES>

Expected Results
----------------
Same as Excel as stated in 2/ above.

Build Date & Platform
---------------------
2008-02-13 on Windows XP

Additional Information
----------------------
I can't think of the top of my head what other Excel functions are 
a) implemented in POI,
b) use data ranges, and 
c) are *legal* in Excel (for instance, you can't use =SUM in a cell which is
part of its own range)

so as far as I know, this only affects the INDEX() function.

Cheers and thanks again!

Dave
Comment 1 David Webster 2008-02-13 04:45:38 UTC
Created attachment 21518 [details]
Excel file showing INDEX function specified within its own range, which causes Java stack overflow
Comment 2 Josh Micich 2008-02-13 14:04:18 UTC
Created attachment 21522 [details]
svn diff of 1 changed file (HSSFFormulaEvaluator)
Comment 3 Josh Micich 2008-02-13 14:05:48 UTC
Created attachment 21523 [details]
tar bz2 of two new classes and one test case
Comment 4 Josh Micich 2008-02-13 14:34:11 UTC
Bugfix patch submitted.

The root problem is that the scratchpad evaluator does(/did) not handle 
circular references (it crashes with SOE).  
Some functions like INDEX, OFFSET, VLOOKUP are distinguished by the 
characteristic that not every cell in the range arguments has to be 
evaluated.  This is in contrast to more basic functions like SUM and COUNTIF 
where every input cell must be evaluated.  Functions like INDEX can be set up 
to potentially cause cyclic references, but nonetheless behave OK due to 
careful choice of other parameters (like in this example).

POI's evaluation strategy is to fully evaluate all operator arguments prior to 
calling the operator.  So the situation described here causes SOE due to a 
cyclic reference on cell A4 even though cell A4 should eventually get ignored 
by this call to INDEX.

The patch supplied here allows HSSFFormulaEvaluator to handle cyclic 
references (and *potential* cyclic references) gracefully.  No change was made 
to the INDEX function.


A junit test case was added to show this bug (exactly as described), and also 
some simpler cyclic errors involving plain cell references.
Comment 5 Nick Burch 2008-02-15 03:53:38 UTC
Thanks for this patch Josh, I've applied to to svn

(I also enabled the error code test, since the patch it depends on is now in svn
too)