Bug 45060

Summary: Token Class Transformation incorrect when function expects 'reference' but arg is 'value'
Product: POI Reporter: Andreas Goetz <cpuidle>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: kayamkulamkochunni
Priority: P3    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Bug Depends on:    
Bug Blocks: 45041    
Attachments: test case

Description Andreas Goetz 2008-05-22 04:17:02 UTC
This is a follow-on issue to bug 45041. 

The following example creates a sheet with
two rows as input data for the final formula in row 3. The formula produces a #VALUE error
in excel (tested on poi 3.1beta).
Visually the formula in Excel appears ok. Just pressing
"Enter" in the formula field again fixes the problem and field displays a value. It seems
Excel is correcting the formula in the background- visually it does not change by this
process.

import java.io.*;
import java.util.*;

import org.apache.poi.hssf.usermodel.*;

public
class Test
{
	static int row = 0;

	static HSSFSheet sheet;

	static HSSFCellStyle
xlsDateStyle, xlsNumericStyle, xlsPercentStyle;

	/**
	 * @param args
	 * @throws
Exception 
	 */
	public static void main(String[] args) throws Exception
	{
		FileOutputStream
out = new FileOutputStream("d:\\test.xls");
		HSSFWorkbook wb = new HSSFWorkbook();

		sheet
= wb.createSheet("test");

		// Excel-formatted date object
		xlsDateStyle = wb.createCellStyle();
		xlsDateStyle.setDataFormat(wb.createDataFormat().getFormat("dd.mm.yyyy"));
		
		//
Excel-formatted number
		xlsNumericStyle = wb.createCellStyle();
		xlsNumericStyle.setDataFormat(wb.createDataFormat().getFormat("0.00"));
		
		//
Excel-formatted percent object
		xlsPercentStyle = wb.createCellStyle();
		xlsPercentStyle.setDataFormat(wb.createDataFormat().getFormat("0.00%"));

		addRow(new
Date(0, 0, 1), 100.0);
		addRow(new Date(1, 0, 1), -110.0);

		HSSFRow r = sheet.createRow(row++);

		//
create the IRR formula
		short col = 2;
		HSSFCell c = r.createCell(col++);
		c.setCellStyle(xlsPercentStyle);
		c.setCellType(HSSFCell.CELL_TYPE_FORMULA);
		c.setCellFormula("(1+IRR(SUMIF(A:A,ROW(INDIRECT(MIN(A:A)&\":\"&MAX(A:A))),B:B),0))^365-1");

		wb.write(out);
		out.close();
	}

	private
static void addRow(Date date, double d)
	{
		HSSFRow r = sheet.createRow(row++);

		short
col = 0;
		HSSFCell c = r.createCell(col++);
		c.setCellValue(date);
		c.setCellStyle(xlsDateStyle);

		c
= r.createCell(col++);
		c.setCellValue(d);
		c.setCellStyle(xlsNumericStyle);
	}
}
Comment 1 Andreas Goetz 2008-05-22 04:17:33 UTC
Created attachment 21987 [details]
test case
Comment 2 Josh Micich 2008-05-22 10:25:02 UTC
Initial investigation shows that Excel encodes (when it re-parses) the formula different to POI.  At first I thought the problem might have been the missing tAttrVolatile token, but that seems to make no difference.  The critical problem is the difference in the operand class of two of the function tokens.  
Excel augments to them to 'array'.  I am speculating that the rule being followed here is that when the function parameter is 'reference' and the actual argument is 'value' that it should get changed to 'array'.  However, I can't see that explicity mentioned in the ooo document.  

// The formula parse tokens 
AttrPtg [volatile ]  (POI does not encode this token)
IntPtg [1]
AreaPtg [A:A]
AreaPtg [A:A]
FuncVarPtg [MIN nArgs=1]
StringPtg [:]
ConcatPtg
AreaPtg [A:A]
FuncVarPtg [MAX nArgs=1]
ConcatPtg
FuncVarPtg [INDIRECT nArgs=1]
FuncVarPtg [ROW nArgs=1]   >>> ptgClass = V should be A
AreaPtg [B:B]
FuncVarPtg [SUMIF nArgs=3] >>> ptgClass = V should be A
IntPtg [0]
FuncVarPtg [IRR nArgs=2]  
class AddPtg

// Function metadata
#Columns: (index, name, minParams, maxParams, returnClass, paramClasses)
148	INDIRECT	1	2	R	V V
8	ROW	0	1	V	R
345	SUMIF	2	3	V	R V R
62	IRR	1	2	V	R 


In this current example  SUMIF()'s return class is 'value' but IRR() expects 'reference' for the first parameter.  I guess this somehow necessitates the transformation of the SUMIF() token to 'array'.  From there, perhaps 'forced array' state causes SUMIF()'s second arg to be transformed to 'array' too.

A simpler example "COLUMNS(PI())" also gets encoded by Excel with transformation to 'array' operand class.  Excel tolerates POI's incorrect encoding in this case, so the mistake is not as clearly visible.

The fix for this is going to be in FormulaParser.setParameterRVA().

Comment 3 Josh Micich 2008-05-27 23:30:09 UTC
Fix applied in svn r660828.

It seems that POI's operand class transformation differed from a lot Excel's.  A new test case (TestRVA.java) shows many of those differences.

It seems that in the function metadata for IRR, the first parameter should be operand class 'array' (not 'reference').  However, this was not nearly enough to get all the test cases working.

Some changes were made to the Ptg class hierarchy (to make the distinction between classified and base tokens clearer).  Amongst other changes FormulaParser was fixed to produce a parse tree on the fly while parsing (as opposed to storing tokens in a flat list.

As of this fix, POI still has some differences with Excel with regard to operand class transformation, but no significant examples have been found (i.e. in the cases where POI gets it wrong, Excel still seems to read/evaluate the formula OK).
Comment 4 Josh Micich 2008-06-15 15:28:36 UTC
*** Bug 45206 has been marked as a duplicate of this bug. ***