Bug 59719 - XSSFDataValidationConstraint doesn't parse static list text properly
Summary: XSSFDataValidationConstraint doesn't parse static list text properly
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.15-dev
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-18 05:30 UTC by Greg Woolsey
Modified: 2016-06-20 05:01 UTC (History)
0 users



Attachments
patch to properly parse static list validation constraints (1.75 KB, text/plain)
2016-06-18 05:30 UTC, Greg Woolsey
Details
updates to use constants, add comments, and another unit test case (2.48 KB, patch)
2016-06-19 21:24 UTC, Greg Woolsey
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Woolsey 2016-06-18 05:30:05 UTC
Created attachment 33959 [details]
patch to properly parse static list validation constraints

When you add a LIST type validation to a cell in Excel, and type a comma separated list of values to accept instead of using a range reference or named range, the value is stored in the XLSX sheet XML as a "formula1" string:

"one, two, three"

named ranges as list validation values are stored in formula1 as

range_name

note the lack of double quotes.

Currently, XSSFDataValidationConstraint just splits all "formula1" strings by comma.  This makes the first and last values include the double quotes, and all values contain any whitespace leading or trailing a comma.

Excel automatically trims whitespace when parsing the CSV value strings.

The attached patch checks for and removes the enclosing quotes, and uses a compiled pattern to split by commas with optional whitespace.

If the formula isn't enclosed in double quotes, it is not parsed into the explicit values array.

I can't tell from inspecting XLSX files how Excel knows whether a formula is a list of values or a range, other than the quotes, or falling back on a list if evaluating as a range fails.
Comment 1 Javen O'Neal 2016-06-19 04:45:14 UTC
Applied in r1749129.
Could you write a unit test that shows a scenario where the previous code failed and the patched code works?
Comment 2 Javen O'Neal 2016-06-19 06:13:00 UTC
I took a stab at writing a unit test in r1749131 given your description in comment 0. Please let me know if any changes are needed to the unit test, as I have never used DataValidationConstraints before.

If no changes are needed, let me know and I will close the bug.
Comment 3 Greg Woolsey 2016-06-19 21:24:10 UTC
Created attachment 33964 [details]
updates to use constants, add comments, and another unit test case

My latest patch attachment uses the new QUOTE constant one more place and adds one more unit test, validating the formula1 value is set properly from the constructor that accepts an array of string literals.

I think with these and your other tests and changes this is ready to go.  Why the OOXML format overloads formula1 this way is baffling to me.  I bet the old binary format had some similar form of overloading.
Comment 4 Javen O'Neal 2016-06-20 05:01:56 UTC
It seems like there are more entry points to XSSFDataValidation than necessary, and each of the entry points end up initializing the member variables in subtly different ways (leading left quote gets stripped, both quotes get stripped, neither quote gets stripped; whitespace is removed or not removed from the formula, etc).

Additionally, even if the storage format overloads formula1, it doesn't mean we need to overload it.

Since you have experience with this class, could you provide a recommendation on what constructors or methods should be deprecated to make this class both easier to maintain and fool-proof for users to use? Save that for a future bug, I guess.

Applied via r1749265.