Bug 58452 - [PATCH] Set cell formulas containing unregistered function names
Summary: [PATCH] Set cell formulas containing unregistered function names
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: unspecified
Hardware: PC Linux
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Reported: 2015-09-24 01:29 UTC by Javen O'Neal
Modified: 2015-10-31 12:03 UTC (History)
0 users

supporting changes (java docs, add FormulaParsingWorkbook.createName) (3.91 KB, patch)
2015-10-06 02:33 UTC, Javen O'Neal
Details | Diff
main FormulaParser changes (14.60 KB, patch)
2015-10-06 03:17 UTC, Javen O'Neal
Details | Diff
testNames.xlsm - Test case spreadsheet (12.98 KB, application/vnd.ms-excel.sheet.macroEnabled.12)
2015-10-06 20:07 UTC, Javen O'Neal
supporting changes (java docs, add FormulaParsingWorkbook.createName) (5.11 KB, patch)
2015-10-06 20:18 UTC, Javen O'Neal
Details | Diff
main FormulaParser changes (19.38 KB, patch)
2015-10-06 20:23 UTC, Javen O'Neal
Details | Diff
main FormulaParser changes (19.46 KB, patch)
2015-10-06 20:34 UTC, Javen O'Neal
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Javen O'Neal 2015-09-24 01:29:01 UTC
Carried over from poi-user mailing list
Original post: http://mail-archives.apache.org/mod_mbox/poi-user/201509.mbox/%3CCAM%2BTppJHZRk4QfZ%3DJ8WF0zw5%2B%2BrHi-XOnXAXTNezHsgXtr2gPQ%40mail.gmail.com%3E
Nick's response: http://mail-archives.apache.org/mod_mbox/poi-user/201509.mbox/%3Calpine.DEB.2.02.1509221216360.18373%40urchin.earth.li%3E

If a formula contains one or more unregistered UDF's, a FormulaParseException is normally thrown.

I need a way to allow the formula to be parsed, using placeholder Ptgs for the unregistered functions. With this array of Ptgs, I can manipulate registered Ptgs as needed, then recreate the cell formula and write it to the cell.

Formulas containing unregistered functions will not be eligible for computation in the calculation chain.

Solution will be targeted for XSSFWorkbooks, but it's possible a fix may also be available for HSSFWorkbooks if the source cell that the formula is being copied from was created by Excel (that is, Excel has validated the formula and the Ptg is saved in the BIFF file).

From Nick via poi-user mailing list
> [Add formula parsing option] to skip / unchange unknown function names? (It 
> might actually be easier to do the copy in the Ptg space, rather than 
> Formula text space, as with the ptgs you could check for non-absolute 
> references and update just them. That'd mean you'd need a ways to flag to 
> skip these functions / leave them unchanged or something)

I'll hack away at this to see what I can come up with.
Comment 1 Javen O'Neal 2015-10-03 03:15:04 UTC
No code just yet. Here's what I have figured out so far (mostly a brain dump for me, a passive request for help/direction from formula/ptg experts, and a guide for future contributors):

Using testdata/spreadsheet/testNames.xls, UDF's (VBA macros) are saved in the workbook names (Workbook.getName(String)), along with named ranges.

Converting this to an Excel2007 files, the UDFs are not included in the workbook names.

This means that the behavior of FormulaParser.parse("MyFunc(\"arg\")", fpb, FormulaParser.CELL, -1); is different for Excel97 and Excel2007 booms when MyFunc is a UDF. Parse throws FormulaParseException if the formula contains a UDF that isn't in the workbook names for both.

For my application, I want to copy formulas and shift the cell references, so I will not be adding new formulas to the workbook. I'm not sure yet if I want to implement a solution that could handle new formula names (since that would create an invalid formula without adding the corresponding UDF to vbaProject.bin or replacing vbaProject.bin with a vbaProject.bin from another workbook.

Since XSSFWorkbook.getName() only includes named ranges, I'm assuming adding UDFs to XSSFWotkbook._names is not desired. Right now I'm looking at creating a variant of NamePtg that can hold information for XSSFWorkbook UDFs, so that FormulaParser.parse returns ptgs that could be rendered back to a formula string. Without registering a UDF with the same  name with the workbook via XSSFWorkbook.addToolPack(UDFFinder), trying to evaluate a formula containing a nkn-registered UDF will fail.

I could see where a user may rely on formula parse errors to detect unregistered UDFs in a formula string, so I could introduce this fix by requiring the user call Workbook.setAllowParsingUnregisteredUDFs(true), but I'm thinking there are very few if any scenarios where someone would want to disallow unregistered UDFs where they couldn't more directly detect it--such as checking for UnregisteredNamePtg instances in the result of parse or otherwise.
Comment 2 Javen O'Neal 2015-10-06 02:33:50 UTC
Created attachment 33167 [details]
supporting changes (java docs, add FormulaParsingWorkbook.createName)

Two approaches here how to handle when an "completely unknown name" is found in the workbook:

1) Add the name to the workbook and use that to create the NamePtg that is linked to the workbook.
> nameToken = _book.getNameXPtg(name, null);
> if (nameToken == null) {
>     // name is not an internal or external name
>     if (log.check(POILogger.WARN)) {
>         log.log(POILogger.WARN,
>                "Name '" + name + "' is completely unknown in the current workbook.");
>     }
>     // name is probably the name of an unregistered User-Defined Function
>     addName(name);
>     hName = _book.getName(name, _sheetIndex);
>     nameToken = hName.createPtg();
> }

2) Do not add the name to the workbook. Create a NamePtg that isn't linked to the workbook.
> nameToken = _book.getNameXPtg(name, null);
> if (nameToken == null) {
>     if (log.check(POILogger.WARN)) {
>         log.log(POILogger.WARN,
>                 "Name '" + name + "' is completely unknown in the current workbook.");
>     }
>     switch(_book.getSpreadsheetVersion()) {
>         case EXCEL97:
>             nameToken = new NameXPtg(name);
>             break;
>         case EXCEL2007:
>             nameToken = new NameXPxg(name);
>             break;
>         default:
>             throw new IllegalStateException("Unrecognized SpreadsheetVersion" + _book.getSpreadsheetVersion());
>     }
> }

The second case seems like a cleaner solution since it doesn't introduce weird side-effects (name gets added to workbook).
However, HSSFWorkbooks require all non-built-in-names (for named ranges and UDFs, registered or not) to have an entry in the internal or external names table. Thus, in order to save an HSSFWorkbook, we must add the name to the workbook, meaning option 1 is the only feasible option here. This is because the function name token is serialized as an index into the worksheet names table, not a string of the function name.

Workbook.createName is currently not accessible in FormulaParser.parse. In the attached patch, I add createName to the FormulaParsingWorkbook interface so that these names can be created.
FormulaParsingWorkbook {
  public Name createName() {
    return _uBook.createName()
Alternatively, we could just expose the _uBook Workbook via a getWorkbook() call, and that'd save us from proxying so many functions to the underlying Workbook. Let me know if you'd prefer that solution and I will upload the patches with those changes.
Comment 3 Javen O'Neal 2015-10-06 03:17:52 UTC
Created attachment 33168 [details]
main FormulaParser changes

When parsing a formula, if there is an unknown name in the formula, add the name to the workbook and continue parsing. This is consistent with how Excel works: it allows formulas to contain "unrecognized text" or "invalid name errors", but the formula result is #NAME?.

Formula evaluation is unchanged: a org.apache.poi.ss.formula.eval.NotImplementedFunctionException/NotImplementedException is thrown if a formula contains a UDF that isn't registered (workbook.addToolPack).

In this patch:
* FormulaParser.parse no longer throws FormulaParseException for formulas containing unregistered functions  such as '=myFunc("arg")'.
* modify TestExternalFunctions as necessary
* add FormulaParser unit tests to make sure formulas with invalid syntax still throw FormulaParseException.
Comment 4 Javen O'Neal 2015-10-06 17:47:34 UTC
Current patches (attachment 33167 [details] and attachment 33168 [details]) result in a corrupted XLSM workbook.
Comment 5 Javen O'Neal 2015-10-06 20:07:00 UTC
Created attachment 33170 [details]
testNames.xlsm - Test case spreadsheet

Add this attached workbook as /test-data/spreadsheet/testNames.xlsm
Set the appropriate mime-type: application/vnd.ms-excel.sheet.macroEnabled.12 or application/octet-stream

This workbook is based on /test-data/spreadsheet/testNames.xls [1] at r1614697. I used MS Excel 2013 to convert this to Excel 2007 format, and removed personal information.

[1] http://svn.apache.org/viewvc/poi/trunk/test-data/spreadsheet/testNames.xls?view=log&pathrev=1614697
Comment 6 Javen O'Neal 2015-10-06 20:18:56 UTC
Created attachment 33171 [details]
supporting changes (java docs, add FormulaParsingWorkbook.createName)

Supporting changes, including java docs updates where documentation was lacking
Comment 7 Javen O'Neal 2015-10-06 20:23:36 UTC
Created attachment 33172 [details]
main FormulaParser changes

main changes (commit after supporting changes and testNames.xlsm)

* For EXCEL97 workbooks, a defined name is required, so add one to the workbook
* For EXCEL2007 workbooks, external names are saved as strings instead of in a table and referenced by index, so create a NameXPxg without adding a defined name to the workbook.

I checked the output to make sure Excel is able to open the workbooks without complaining about corrupted contents. No issues in this build (/test-data/spreadsheet/testNames-saved.xls and testNames-saved.xlsm)
Comment 8 Javen O'Neal 2015-10-06 20:34:11 UTC
Created attachment 33173 [details]
main FormulaParser changes

Unit test should fail if evaluating the formula 'MYFUNC("B1")' does not raise a org.apache.poi.ss.formula.eval.NotImplementedFunctionException/NotImplementedException.
Comment 9 Javen O'Neal 2015-10-31 12:03:03 UTC
Applied supporting changes in r1711600, main changes in r1711605 to trunk

Site docs updated in r1711608.