Bug 60219

Summary: [PATCH] FormulaParser can't parse external references when sheet name is quoted
Product: POI Reporter: IgnacioHR <ignacio>
Component: XSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: major Keywords: PatchAvailable
Priority: P2    
Version: 3.15-FINAL   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: FormulaParser.java patch content ASCII
FormulaParser.java and unit test patch for quoted external sheet names
Full patch, passes conf suite
Alternative patch to previous one

Description IgnacioHR 2016-10-07 17:21:26 UTC
Created attachment 34334 [details]
FormulaParser.java patch content ASCII

A formula cell contains a reference to a cell in an external workbook. This is represented in POI as 

[1]Shee1!A1

where [1] is an index on a table to de-reference external workbooks. The current parser has no problems with this. But, if the sheet name contains spaces or "special" characters, the formula cell contains:

'[1]Sheet 1'!A1

Note the quote starts before the square bracket indicating an external file reference.

We resolved this problem by editing the FormulaParser.java file and I'm now submitting a patch for revision.
Comment 1 Javen O'Neal 2016-10-08 23:36:26 UTC
Created attachment 34342 [details]
FormulaParser.java and unit test patch for quoted external sheet names

Thanks for the patch! I added a unit test (attached) to make sure this stays fixed in the future.

Unfortunately, this patch cannot be committed yet because it breaks o.a.p.hssf.usermodel.TestBugs#bug45970.

row.getCell(1).setCellFormula("'[$http://gagravarr.org/FormulaRefs2.xls]Sheet1'!B2");

> No external workbook with name '$http://gagravarr.org/FormulaRefs2.xls'
> java.lang.RuntimeException: No external workbook with name '$http://gagravarr.org/FormulaRefs2.xls'
>         at org.apache.poi.hssf.model.LinkTable.getExternalSheetIndex(LinkTable.java:445)
>         at org.apache.poi.hssf.model.InternalWorkbook.getExternalSheetIndex(InternalWorkbook.java:2012)
>         at org.apache.poi.hssf.usermodel.HSSFEvaluationWorkbook.getSheetExtIx(HSSFEvaluationWorkbook.java:300)
>         at org.apache.poi.hssf.usermodel.HSSFEvaluationWorkbook.get3DReferencePtg(HSSFEvaluationWorkbook.java:94)
>         at org.apache.poi.ss.formula.FormulaParser.createAreaRefParseNode(FormulaParser.java:957)
>         at org.apache.poi.ss.formula.FormulaParser.parseRangeable(FormulaParser.java:568)
>         at org.apache.poi.ss.formula.FormulaParser.parseRangeExpression(FormulaParser.java:311)
>         at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1517)
>         ...
>         at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1857)
>         at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:2005)
>         at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:170)
>         at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:190)
>         at org.apache.poi.hssf.model.HSSFFormulaParser.parse(HSSFFormulaParser.java:108)
>         at org.apache.poi.hssf.usermodel.HSSFCell.setCellFormula(HSSFCell.java:622)
>         at org.apache.poi.hssf.usermodel.TestBugs.bug45970(TestBugs.java:1852)
Comment 2 Javen O'Neal 2016-10-08 23:41:39 UTC
Committed disabled unit test in r1763937.
Comment 3 IgnacioHR 2016-10-10 09:37:21 UTC
Hi,

Sorry, I don't understand this part of the comment:

"Unfortunately, this patch cannot be committed yet because it breaks o.a.p.hssf.usermodel.TestBugs#bug45970."

row.getCell(1).setCellFormula("'[$http://gagravarr.org/FormulaRefs2.xls]Sheet1'!B2");

The error message generated is right cause the Excel sheet referenced in the URL does not exists. So maybe there is a bug in the output of test case o.a.p.hssf.usermodel.TestBugs#bug45970 because it MUST raise that exception
Comment 4 IgnacioHR 2016-10-10 10:31:08 UTC
OK, I've dig into the problem of TestBugs#bug45970 and this is what I see.

on line 1852 of o.a.p.hssf.usermodel.TestBugs.java there is the following:

row.getCell(1).setCellFormula("'[$http://gagravarr.org/FormulaRefs2.xls]Sheet1'!B2");

During setCellFormula operation, the referenced external workbook is not in the LinkTable as you can see in the following stack trace:

LinkTable.getExternalSheetIndex(String, String, String) line: 443	
InternalWorkbook.getExternalSheetIndex(String, String, String) line: 2012	
HSSFEvaluationWorkbook.getSheetExtIx(SheetIdentifier) line: 300	
HSSFEvaluationWorkbook.get3DReferencePtg(CellReference, SheetIdentifier) line: 94	
FormulaParser.createAreaRefParseNode(SheetIdentifier, FormulaParser$SimpleRangePart, FormulaParser$SimpleRangePart) line: 957	
FormulaParser.parseRangeable() line: 568	
FormulaParser.parseRangeExpression() line: 311	
FormulaParser.parseSimpleFactor() line: 1516	
FormulaParser.percentFactor() line: 1474	
FormulaParser.powerFactor() line: 1461	
FormulaParser.Term() line: 1834	
FormulaParser.additiveExpression() line: 1962	
FormulaParser.concatExpression() line: 1946	
FormulaParser.comparisonExpression() line: 1903	
FormulaParser.intersectionExpression() line: 1876	
FormulaParser.unionExpression() line: 1856	
FormulaParser.parse() line: 2004	
FormulaParser.parse(String, FormulaParsingWorkbook, FormulaType, int, int) line: 170	
FormulaParser.parse(String, FormulaParsingWorkbook, FormulaType, int) line: 190	
HSSFFormulaParser.parse(String, HSSFWorkbook, FormulaType, int) line: 108	
HSSFCell.setCellFormula(String) line: 622	
TestBugs.bug45970() line: 1852	


LinkTable.getExternalSheetIndex(String, String, String) line: 443 returns -1 because and on line 445 of o.a.p.hssf.model.LinkTable.java a RuntimeException is raised.

There are three options:
1) Modify LinkTable to automatically add a new external reference to the link table rather that raise an exception (this way looks easy for POI users)
2) Modify test case 45970 to add an external workbook reference before setting the formula
3) Other, such as define a policy for unreferenced external workbooks and either raise an error or create the reference automatically.

Going for (2) seems obvious and the simplest way to fix this problem, but makes live more difficult to developers
Going for (1) seems possible in a short period of time, but might break existing test cases (I'm not 100% sure about this)
Going for (3) seems to be the right position in a long term. But requires more work

Any preference?
Comment 5 IgnacioHR 2016-10-10 15:36:25 UTC
I'm sorry to be so active at this time.

o.a.p.hssf.usermodel.TestHSSFFormulaEvaluator.java#testXRefs suggest me that the preferred option is (2) (the external workbook must be added to the linkTable before the setFormula is called)

This means the code o.a.p.hssf.usermodel.TestBugs#bug45970 MUST always fail and the current situation (the test pass without a fail) pass because the resolution of the formula reference contains a bug and the external workbook is recognised as a worksheet of the current workbook.

In order to fix code in bug45970. The changes should be like this:

-------------------------------------------------------------------
// Link our new workbook
wb1.linkExternalWorkbook("$http://gagravarr.org/FormulaRefs2.xls", new String[] {"Sheet1"});
       
// Change 4       row.getCell(1).setCellFormula("'[$http://gagravarr.org/FormulaRefs2.xls]Sheet1'!B2");
row.getCell(1).setCellValue(123.0);

// Link our new workbook
wb1.linkExternalWorkbook("$http://example.com/FormulaRefs.xls", new String[] {"Sheet1"});
       
// Add 5
row = s.createRow(5);
row.createCell(1, CellType.FORMULA);
       row.getCell(1).setCellFormula("'[$http://example.com/FormulaRefs.xls]Sheet1'!B1");
row.getCell(1).setCellValue(234.0);
-------------------------------------------------------------------

Note I'm adding the external workbook to the current workbook prior to set a formula that uses cells in that workbook.

The current implementation of linkExternalWorkbook requires the external workbook to be instantiated as a Workbook object. In order to support referencing external workbooks that are now already instantiated a new method linkExternalWorkbook(String name, String sheetNames[]) is required.

The attached new patch contains all required changes and passes complete conformance suite.

The changes include:

added linkExternalWorkbook(String name, String sheetNames[]) to  src/java/org/apache/poi/ss/usermodel/Workbook.java and added implementation of derived classes. Note XSSF... does not implements external workbooks at all.

TestBugs now can uncomment two final tests (that pass OK) to check the external references.
Comment 6 IgnacioHR 2016-10-10 15:40:15 UTC
Created attachment 34354 [details]
Full patch, passes conf suite
Comment 7 IgnacioHR 2016-10-11 10:52:05 UTC
Created attachment 34357 [details]
Alternative patch to previous one

This is an alternative patch to the full patch. This one does not change the Workbook interface. Only the TestBugs.java and FormulaParser.java
Comment 8 Javen O'Neal 2016-10-15 20:02:00 UTC
Are these external references saved in the xls or xlsx file? If so, then you are correct that the HSSFWorkbook and XSSFWorkbook need a method to add a link to an external workbook (I'm assuming it would also be valid to remove a link, in which case we would need an add, remove and get external links methods).

If only the FormulaEvaluator needs to know about external references (if this information is not saved in the workbook or is always auto-created when an external reference is found), then there is no need to complicate the API with multiple ways to link workbooks for evaluation.

Relevant: https://poi.apache.org/spreadsheet/eval.html#external

FWIW, I tested attachment 34354 [details] and it passes with the current test suite.
Comment 9 IgnacioHR 2016-10-21 10:59:33 UTC
Patch 34357 should be OK to fix this bug and does not modify the signature of Workbook interface. I propose you incorporate patch 34357 to current version and close this bug
Comment 10 Dominik Stadler 2016-12-31 16:55:51 UTC
Applied the less intrusive patch via r1776796, thanks for the work on this!