Bug 57721 - Evaluating formulas in named Excel table
Summary: Evaluating formulas in named Excel table
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.12-FINAL
Hardware: All All
: P2 normal with 6 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 57840
Blocks:
  Show dependency tree
 
Reported: 2015-03-17 11:14 UTC by gruber.chri
Modified: 2016-09-22 21:45 UTC (History)
2 users (show)



Attachments
test excel file (8.98 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-03-17 11:14 UTC, gruber.chri
Details
java test pgm (536 bytes, text/plain)
2015-03-17 11:15 UTC, gruber.chri
Details

Note You need to log in before you can comment on or make changes to this bug.
Description gruber.chri 2015-03-17 11:14:39 UTC
Created attachment 32581 [details]
test excel file

If you evaluate all formula cells in an XLSX file (for instance using  XSSFFormulaEvaluator.evaluateAllFormulaCells((XSSFWorkbook) workbook) ) and are using formulas in "named tables" in Excel 2007+, an exception is thrown.

I don't know how you call these named tables exactly, but they are created in Excel 2007+ by selecting a range (of data) and pressing the "Table" button in the "Insert" ribbon (including headers). Excel then offers special "table tools" in the title bar, when selecting a cell in the table.

If you create this kind of table and are using formulas referencing cells in the named table, excel doesn't reference these rows/cells via A1, B2, etc., but with their column and table names:


Create this simple table (attached as formular_test.xlsx):

A | B
-----
1 | 3
2 | 4

Then select the whole table and select Insert > Table, the table gets styled with alternating row colors etc.
If you then create a sum for all the table data, excel displays the formula as:
=SUM(Table1[#All]) or =SUM(Table1[[#All];[A]:[B]]) or =SUM(Table1)

When reading such a file with Apache POI, calling evaluateAllFormulaCells (source attached as UpdateFormulaTest.java) this exception is thrown:
Exception in thread "main" org.apache.poi.ss.formula.FormulaParseException: Specified named range 'Table1' does not exist in the current workbook (thrown when using "SUM(Table1[[A]:[B]])" as formula).

It looks like it fails, as soon as formulas are referencing these named tables.
Comment 1 gruber.chri 2015-03-17 11:15:24 UTC
Created attachment 32582 [details]
java test pgm
Comment 2 Stephen Matta 2016-03-30 12:16:25 UTC
This is preventing me from shifting rows that use the formula. My workaround is to use the "Table Tools/Design" tab and "Convert to Range" the table in Excel. This is not ideal but it's allowing me to move forward for now.
Comment 3 Greg Woolsey 2016-06-06 17:27:30 UTC
Tables are used widely in formulas, but POI formula evaluation doesn't handle them.  This is a significant problem for us, as Excel table references make writing complex formulas much simpler and less error-prone.
Comment 4 Javen O'Neal 2016-06-06 18:36:08 UTC
Patches are always welcome!
Submitting patches: https://poi.apache.org/guidelines.html#SubmittingPatches
Understanding formula evaluation:
https://poi.apache.org/spreadsheet/eval.html
https://poi.apache.org/spreadsheet/eval-devguide.html
Comment 5 Greg Woolsey 2016-06-06 19:41:39 UTC
This isn't a problem with formula function coverage, it is a core problem with XSSF formula SYNTAX evaluation.  POI just doesn't understand Excel Table "Structured References" in formulas.  These are XSSF format objects only, and have no equivalent in HSSF.

https://support.office.com/en-us/article/Using-structured-references-with-Excel-tables-f5ed2452-2337-4f71-bed3-c8ae6d2b276e

This means POI formula support is limited to pre-2007 syntax only, which is an increasing problem.

The fix should not be too hard, but would be deep into XSSFEvaluationWorkbook.  There is no function to register.  It would need to extend getName(String, int) to also check all sheets for tables matching the given name (tables have globally scoped names but are tied to specific sheets), and return something similar to EvaluationName.  

Since table references are just convenience syntax, this could just map directly to a range specification and parse the formula using the range expression from that point on, I think.

I plan to dig a bit, but any fix will involve some extensive refactoring, so I'm not sure anyone outside the core committers can write a patch acceptable to the repository owners.
Comment 6 Greg Woolsey 2016-06-06 20:19:57 UTC
I found this issue, with a potential patch, that has received no attention in over a year:

https://bz.apache.org/bugzilla/show_bug.cgi?id=57840

From the description, this looks like exactly what is needed.
Comment 7 Javen O'Neal 2016-06-10 16:37:20 UTC
With the added support of structured references from bug 57840, using the XSSFFormulaEvaluator to evaluate all formulas in a workbook containing structured references works. Applied unit test in r1747740 to make sure this continues to work into the future.
Comment 8 Javen O'Neal 2016-09-22 07:16:37 UTC
Is anything else needed to close this bug?
Comment 9 Greg Woolsey 2016-09-22 20:05:18 UTC
I think the other patches accepted for Structured Reference syntax cover this.  I've been on other tasks for a bit, will be digging back into the project that needed this shortly.  If I find anything else, it would be new bugs filed separately (hopefully with patches :D )
Comment 10 Javen O'Neal 2016-09-22 21:45:25 UTC
Closing. This was added to the changelog under Version 3.15-beta2 (2016-07-02) https://poi.apache.org/changes.html