Summary: | HYPERLINK() function needs a way to return link reference when display label is also specified | ||
---|---|---|---|
Product: | POI | Reporter: | Greg Woolsey <greg.woolsey> |
Component: | SS Common | Assignee: | POI Developers List <dev> |
Status: | NEW --- | ||
Severity: | enhancement | ||
Priority: | P2 | ||
Version: | 4.0.x-dev | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Attachments: | Workaround by overloading FunctionEval.functions |
Description
Greg Woolsey
2018-11-30 18:06:36 UTC
Current thinking involves augmenting FormulaEvaluator with a flag (or facility for multiple flags, perhaps an EnumMap?) to control which value is returned by the function in that invocation/evaluation. The Formula interface would take the OperationEvaluationContext in which it is being called, which would have access to the flags via the WorkbookEvaluator. WorkbookEvaluator appears to be the only class that ever constructs OperationEvaluationContext objects and uses them to evaluate functions. This would enable a given cell to be evaluated in the context of cached results in two different ways based on the state of the flags for the current FormulaEvaluator. Testing will be needed to see if the cache needs to be cleared when flags change, in order to properly re-process the formula. Also, this could have side effects for other cells whose formulas reference the cell with the HYPERLINK() formula call - we would want dependent cell calculations to continue to always use the display label result. Perhaps that means the cache mechanism needs to know about the presence or absence of flags also, and skip cache reads and writes when flags are set. Testing in Excel 2016/Office 365 shows that a cell with a formula containing the HYPERLINK() function anywhere in the formula converts the entire cell value to a link. If there is additional formula contents before or after the function call, those results are appended to BOTH the address and the display label. In my opinion, that's never going to be useful to anyone, and may be a case we don't need to handle, at least not at first. In practice, I think POI only needs to handle cases where the cell formula is just a single call to HYPERLINK(), regardless of the complexity of the argument formulas. This may mean the only useful use case is to access the address from the Cell object, similar to accessing hyperlink relations currently. Cell could have these methods: public boolean isHyperlinkFormula(); public String getHyperlinkFormulaAddress(); The first would just check if the formula starts with "HYPERLINK(" case insensitive. The second would set up a new FormulaEvaluator with the proper flag set, and evaluate the cell formula in that context. In the future, if performance with lots of link function calls is an issue, perhaps we could investigate ways to reuse the same FormulaEvaluator, but for safety I'd avoid it if possible, so we don't mess with related/dependent cell values expecting the display text result. I like this, as it could reduce the public API impact to just these method calls, and avoid misuse of the flag, which could result in more bug reports. Created attachment 36288 [details]
Workaround by overloading FunctionEval.functions
I had the same problem some time ago and used a workaround by putting a class into the org.apache.poi.ss.formula.eval package and accessing the protected FunctionEval.functions-array this way, see the attached sample code.
// MyHyperlink returns the URL Value instead of the normal text-value returned usually by the Hyperlink function
Function func = new MyHyperlink();
BuiltinFunctionsOverloader.replaceBuiltinFunction(359, func);
Maybe we can make it possible to override functions this way in FunctionEval natively via a static FunctionEval.replaceBuiltinFunction() and provide the alternative implementation of Hyperlink together with sample code? This would spare us from breaking existing interfaces.
PJ Fanning suggested another path, creating Hyperlink*Eval classes that extend the existing Eval classes, and implement an additional interface to provide String getAddress(); that way eval type checks will still pass, but an additional check can be made to see if the result is a hyperlink, and if so, get the target address. I like it - requires maintaining twice as many eval classes, but there aren't that many, and they don't change. My suggestion was to have `public Hyperlink getHyperlink()` which I think is better than `public String getAddress()` because Hyperlink as all the Hyperlink data (URL, display name, etc). I also came up with a workaround hack just now that I'll document here in case others find it helpful. I had a Cell, whose formula was HYPERLINK(...). It sometimes had one arg, sometimes two. Parsing the text with a REGEX was a nightmare, and often hit runaway cases. However, I realized I could leverage Excel formulas here, and POI formula parsing. I did this: String formula = cell.getCellFormula() .replaceFirst("(?i)hyperlink\\s*\\(", "IF(true, "); I can then evaluate the resulting IF formula instead, which will return the first argument from the former HYPERLINK function. So simple, and arguments can be arbitrarily complex Excel formulas involving all sorts of crazy stuff. |