Bug 58648 - FormulaParser throws exception in parseSimpleFactor() when getCellFormula() is called on a cell and the formula contains spaces between closing parentheses ") )"
Summary: FormulaParser throws exception in parseSimpleFactor() when getCellFormula() i...
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SS Common (show other bugs)
Version: 3.13-FINAL
Hardware: PC All
: P2 regression with 8 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
: 59271 59652 (view as bug list)
Depends on: 52111
Blocks: 59652
  Show dependency tree
 
Reported: 2015-11-24 21:53 UTC by Peter Davison
Modified: 2016-10-19 08:04 UTC (History)
2 users (show)



Attachments
skip whitespace before matching close parentheses (641 bytes, patch)
2015-11-25 09:43 UTC, Javen O'Neal
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Davison 2015-11-24 21:53:06 UTC
To reproduce:

Define an XSSFCell with a formula like: "((1 + 1) )".
Note the extra space between the closing parentheses.

Or create a spreadsheet with a cell with the formula specified above, save it and then read it in as a workbook.  Navigate via POI to the sheet, row, and cell in question.

Call getCellFormula() on the cell.

Results in an exception: 
org.apache.poi.ss.formula.FormulaParseException: Parse error near char ... ')' in specified formula '((1 + 1) )'. Expected cell ref or constant literal
	at org.apache.poi.ss.formula.FormulaParser.expected(FormulaParser.java:208)
	at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1148)
	at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1103)
	at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1090)
	at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1450)
	at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1570)
	at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1554)
	at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1511)
	at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1499)
	at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1472)
	at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1131)
	at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1103)
	at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1090)
	at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1450)
	at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1570)
	at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1554)
	at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1511)
	at org.apache.poi.ss.formula.FormulaParser.Arguments(FormulaParser.java:1076)
	at org.apache.poi.ss.formula.FormulaParser.function(FormulaParser.java:963)
	at org.apache.poi.ss.formula.FormulaParser.parseNonRange(FormulaParser.java:556)
	at org.apache.poi.ss.formula.FormulaParser.parseRangeable(FormulaParser.java:524)
	at org.apache.poi.ss.formula.FormulaParser.parseRangeExpression(FormulaParser.java:257)
	at org.apache.poi.ss.formula.FormulaParser.parseSimpleFactor(FormulaParser.java:1143)
	at org.apache.poi.ss.formula.FormulaParser.percentFactor(FormulaParser.java:1103)
	at org.apache.poi.ss.formula.FormulaParser.powerFactor(FormulaParser.java:1090)
	at org.apache.poi.ss.formula.FormulaParser.Term(FormulaParser.java:1450)
	at org.apache.poi.ss.formula.FormulaParser.additiveExpression(FormulaParser.java:1570)
	at org.apache.poi.ss.formula.FormulaParser.concatExpression(FormulaParser.java:1554)
	at org.apache.poi.ss.formula.FormulaParser.comparisonExpression(FormulaParser.java:1511)
	at org.apache.poi.ss.formula.FormulaParser.intersectionExpression(FormulaParser.java:1492)
	at org.apache.poi.ss.formula.FormulaParser.unionExpression(FormulaParser.java:1472)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:1612)
	at org.apache.poi.ss.formula.FormulaParser.parse(FormulaParser.java:153)
	at org.apache.poi.xssf.usermodel.XSSFCell.convertSharedFormula(XSSFCell.java:421)
	at org.apache.poi.xssf.usermodel.XSSFCell.getCellFormula(XSSFCell.java:393)

Remove the white space between the two closing parentheses and the exception goes away.

Proposed fix:
Add an additional call to "SkipWhite()" before the call to "Match(')');" inside the case for '('.  Near line 1130 in parseSimpleFactor method of FormulaParser.java (release version 3.13).
Comment 1 Javen O'Neal 2015-11-25 09:43:57 UTC
Created attachment 33295 [details]
skip whitespace before matching close parentheses

Thanks for finding this problem!
I added a unit tests that shows this problem in r1716338, and can verify that this problem still exists in the latest dev build (2015-11-25, r1716338).

The attached patch, which I think is what your suggested fix was from comment 0, still fails the unit test. Were you able to get a passing build? If so, could you add your patch/diff as an attachment?
Comment 2 Peter Davison 2015-11-25 16:14:36 UTC
Sorry no.  I haven't tried building the code, just been using jars and src-jars up til now.
My suggested fix was based on a quick perusal of the source code.  It may not actually fix the problem.
Comment 3 Javen O'Neal 2015-11-25 19:32:39 UTC
The second Match() call without another SkipWhite() looked suspiscious, so good guess. I guess whoever works on this bug will become intimately familiar with the deep call-stack that is used for PTG parsing.
Comment 4 Dominik Stadler 2016-04-04 19:24:19 UTC
*** Bug 59271 has been marked as a duplicate of this bug. ***
Comment 5 Dominik Stadler 2016-04-04 19:48:00 UTC
Bug 59271 provides a testcase with a testfile
Comment 6 Dominik Stadler 2016-04-04 20:17:43 UTC
Did some initial analysis: the problem is not skipping whitespaces, but the Excel "intersection operator", which is ... TADA .. a blank! 

Bad idea to use a char for something when you also allow the same char to act as arbitrary whitespace in formulas at various places. Seems we need to put logic in place that at first tries to parse the pieces as intersection and if that fails skip it as whitespace. Quite bad for reproducible behaviour and good error messages. :(
Comment 7 Dominik Stadler 2016-04-04 20:25:31 UTC
My approach would be to handle this in FormulaParser.intersectionExpression() as follows, however I fear that this might lead to hard to track errors with formula parsing and/or misleading error messages, any thoughts or ideas how to do this better? I.e. maybe intersection is only possible in some specific places in the formula and we can only allow it there in the first place?

   private ParseNode intersectionExpression() {
		ParseNode result = comparisonExpression();
		boolean hasIntersections = false;
		while (true) {
			SkipWhite();
			if (_inIntersection) {
				int savePointer = _pointer;

				// Don't getChar() as the space has already been eaten and recorded by SkipWhite().
				try {
					ParseNode other = comparisonExpression();
					result = new ParseNode(IntersectionPtg.instance, result, other);
					hasIntersections = true;
					continue;
				} catch (FormulaParseException e) {
					// if parsing for intersection fails we assume that we actually had an arbitrary
					// whitespace and thus should simply skip this whitespace
					resetPointer(_pointer);
				}
			}
			if (hasIntersections) {
				return augmentWithMemPtg(result);
			}
			return result;
		}
	}
Comment 9 Kai G 2016-04-05 09:22:11 UTC
Wow using a blank as an operator is really weird - I was not aware of that. Did we add support for the intersection in 3.13 or what changes between 3.12 & 3.13 caused this issue?
Comment 10 Dominik Stadler 2016-04-05 10:10:09 UTC
Seems to have been introduced by bug 52111 when support for the intersection operator was added.
Comment 11 Kai G 2016-04-05 14:38:18 UTC
That makes sense. Unfortunately I don't know the whole formula parsing in POI too well, so I don't know if your suggested fix would work ;-(
Comment 12 Javen O'Neal 2016-04-05 16:17:12 UTC
Intersection is whitespace between two cell range/area PTGs. If the two operands don't evaluate to ranges, treat the whitespace as just whitespace. Is multiple whitespace characters as the operator a valid intersection expression?
Comment 13 Kai G 2016-04-06 12:03:50 UTC
Well in excel both

=A1:B1 B1:B2
as well as 
=A1:B1    B1:B2

are valid and return the same value (b) in the following table

 | A | B |  
1| a | b |
2| a | a |
Comment 14 Dominik Stadler 2016-04-06 19:54:01 UTC
I have applied my proposed simple approach via r1738033 and added a number of tests which verify that all the cases that I could think of seem to work fine now. If there is any that I missed, please report it as new bug so we can fix it and add additional unit-tests.
Comment 15 Dominik Stadler 2016-08-13 20:43:25 UTC
*** Bug 59652 has been marked as a duplicate of this bug. ***
Comment 16 Javen O'Neal 2016-10-19 08:04:45 UTC
Re-enabled unit tests in r1765548.