Bug 62969 - HYPERLINK() function needs a way to return link reference when display label is also specified
Summary: HYPERLINK() function needs a way to return link reference when display label ...
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: 4.0.x-dev
Hardware: All All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-30 18:06 UTC by Greg Woolsey
Modified: 2019-03-01 20:37 UTC (History)
0 users



Attachments
Workaround by overloading FunctionEval.functions (10.00 KB, application/x-tar)
2018-12-01 08:54 UTC, Dominik Stadler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Woolsey 2018-11-30 18:06:36 UTC
This function has special behavior in Excel.  It is dynamically displayed as a link, using the first argument as the target and the second, optional one as the display text.  If there is only one argument, the link address is used for display.

The POI function currently returns the displayed value, which is correct in most cases, as that's what the user sees as the cell value.

However, if one wants to display the cell contents in some interactive format, there is currently no way to get the address from the formula in the presence of a defined display label, other than to read the raw formula string, parse it for the first function argument, and then evaluate that string as a formula value.

While that is possible, it is fragile and requires proper handling of the complex formula syntax in the raw value.

This issue is to track discussion from the related POI Users email list thread, and document design decisions and implementation.
Comment 1 Greg Woolsey 2018-11-30 18:06:56 UTC
Current thinking involves augmenting FormulaEvaluator with a flag (or facility for multiple flags, perhaps an EnumMap?) to control which value is returned by the function in that invocation/evaluation.

The Formula interface would take the OperationEvaluationContext in which it is being called, which would have access to the flags via the WorkbookEvaluator. WorkbookEvaluator appears to be the only class that ever constructs OperationEvaluationContext objects and uses them to evaluate functions.

This would enable a given cell to be evaluated in the context of cached results in two different ways based on the state of the flags for the current FormulaEvaluator.

Testing will be needed to see if the cache needs to be cleared when flags change, in order to properly re-process the formula.

Also, this could have side effects for other cells whose formulas reference the cell with the HYPERLINK() formula call - we would want dependent cell calculations to continue to always use the display label result.

Perhaps that means the cache mechanism needs to know about the presence or absence of flags also, and skip cache reads and writes when flags are set.
Comment 2 Greg Woolsey 2018-11-30 18:21:17 UTC
Testing in Excel 2016/Office 365 shows that a cell with a formula containing the HYPERLINK() function anywhere in the formula converts the entire cell value to a link.

If there is additional formula contents before or after the function call, those results are appended to BOTH the address and the display label. In my opinion, that's never going to be useful to anyone, and may be a case we don't need to handle, at least not at first.

In practice, I think POI only needs to handle cases where the cell formula is just a single call to HYPERLINK(), regardless of the complexity of the argument formulas.

This may mean the only useful use case is to access the address from the Cell object, similar to accessing hyperlink relations currently.

Cell could have these methods:

public boolean isHyperlinkFormula();
public String getHyperlinkFormulaAddress();

The first would just check if the formula starts with "HYPERLINK(" case insensitive.

The second would set up a new FormulaEvaluator with the proper flag set, and evaluate the cell formula in that context.

In the future, if performance with lots of link function calls is an issue, perhaps we could investigate ways to reuse the same FormulaEvaluator, but for safety I'd avoid it if possible, so we don't mess with related/dependent cell values expecting the display text result.

I like this, as it could reduce the public API impact to just these method calls, and avoid misuse of the flag, which could result in more bug reports.
Comment 3 Dominik Stadler 2018-12-01 08:54:24 UTC
Created attachment 36288 [details]
Workaround by overloading FunctionEval.functions

I had the same problem some time ago and used a workaround by putting a class into the org.apache.poi.ss.formula.eval package and accessing the protected FunctionEval.functions-array this way, see the attached sample code.

// MyHyperlink returns the URL Value instead of the normal text-value returned usually by the Hyperlink function
Function func = new MyHyperlink();

BuiltinFunctionsOverloader.replaceBuiltinFunction(359, func);


Maybe we can make it possible to override functions this way in FunctionEval natively via a static FunctionEval.replaceBuiltinFunction() and provide the alternative implementation of Hyperlink together with sample code? This would spare us from breaking existing interfaces.
Comment 4 Greg Woolsey 2018-12-03 17:12:14 UTC
PJ Fanning suggested another path, creating Hyperlink*Eval classes that extend the existing Eval classes, and implement an additional interface to provide 

String getAddress();

that way eval type checks will still pass, but an additional check can be made to see if the result is a hyperlink, and if so, get the target address.

I like it - requires maintaining twice as many eval classes, but there aren't that many, and they don't change.
Comment 5 PJ Fanning 2018-12-03 17:52:18 UTC
My suggestion was to have `public Hyperlink getHyperlink()` which I think is better than `public String getAddress()` because Hyperlink as all the Hyperlink data (URL, display name, etc).
Comment 6 Greg Woolsey 2019-03-01 20:37:47 UTC
I also came up with a workaround hack just now that I'll document here in case others find it helpful.

I had a Cell, whose formula was HYPERLINK(...).  It sometimes had one arg, sometimes two.  Parsing the text with a REGEX was a nightmare, and often hit runaway cases.

However, I realized I could leverage Excel formulas here, and POI formula parsing.

I did this:

        String formula = cell.getCellFormula()
                .replaceFirst("(?i)hyperlink\\s*\\(", "IF(true, ");


I can then evaluate the resulting IF formula instead, which will return the first argument from the former HYPERLINK function.  So simple, and arguments can be arbitrarily complex Excel formulas involving all sorts of crazy stuff.