Bug 56781 - XSSFName.validateName(String) Only Checks for the First Character's Validity and Presence of Spaces
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.9-FINAL
Hardware: PC Mac OS X 10.4
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 60246
Reported: 2014-07-28 21:55 UTC by Ryan O'Meara
Modified: 2016-10-14 10:12 UTC
0 users



Description Ryan O'Meara 2014-07-28 21:55:00 UTC
XSSFName.validateName(String) (and HSSFName.validateName(String)) each validate the name input for named ranges (when set via setNameName(String)). This validation only checks that the first character is an underscore or a letter, and that there are no spaces in the name.

This does not prevent other invalid characters (such as '-') from being used within the body of the name. Upon opening the file in Excel 2010, Excel reports corrupted data and attempts a repair (which consists of removing the offending named range(s), after which the file is readable without issue).
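To make the gap concrete, here is a hypothetical approximation of the check as described above (first character plus a no-spaces test). It is not POI's actual validateName source; the class and method names are made up for illustration.

```java
public class FirstCharOnlyCheck {
    // Hypothetical approximation of the check described in this report:
    // only the first character and the absence of spaces are tested.
    static boolean passesFirstCharCheck(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        boolean firstOk = first == '_' || Character.isLetter(first);
        // Characters after the first are never inspected, except for spaces.
        return firstOk && name.indexOf(' ') < 0;
    }

    public static void main(String[] args) {
        // A hyphen in the body is not caught by this check
        System.out.println(passesFirstCharCheck("my-range")); // prints true
        // A space anywhere is caught
        System.out.println(passesFirstCharCheck("my range")); // prints false
    }
}
```

A hyphenated name such as `my-range` passes a check of this shape, which is consistent with Excel 2010 flagging the file as corrupted.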

Both of these validation methods should check the full length of the name for invalid characters. 

There is a post in the Excel forums from someone asking what the valid, usable characters are. The most relevant response was:

"From Excel Help:

What characters are allowed? The first character of a name must be a letter, an underscore character (_), or a backslash (\).
Remaining characters in the name can be letters, numbers, periods, and underscore characters."
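A minimal sketch of a full-length check based on the rules quoted above. It assumes those forum rules are accurate (later comments question this) and ignores the other restrictions in the setNameName javadoc, such as maximum length. `QuotedRulesValidator` and `isValidName` are hypothetical names, not POI API.

```java
public class QuotedRulesValidator {
    // Sketch of the quoted Excel Help rules (unverified against Excel):
    // first char: letter, '_' or '\'; remaining chars: letters, digits, '.', '_'.
    // Works on UTF-16 chars, so supplementary (surrogate-pair) letters are rejected.
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == '\\')) {
            return false;
        }
        // Unlike a first-character-only check, scan every remaining character.
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '.' || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_range.1")); // prints true
        System.out.println(isValidName("my-range"));   // prints false
        System.out.println(isValidName("\\data"));     // prints true
    }
}
```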

There are additional restrictions documented in the javadoc of HSSFName.setNameName and XSSFName.setNameName, such as maximum length.
Comment 1 Ryan O'Meara 2014-07-28 21:55:45 UTC
Side note: if I manage to get the POI environment set up, I'd be willing to take a crack at a patch for this.
Comment 2 Nick Burch 2014-07-28 22:08:31 UTC
http://poi.apache.org/howtobuild.html and http://poi.apache.org/guidelines.html should hopefully cover you on getting started to contribute

FormulaParser currently has logic for deciding whether a sequence of characters is valid (as anything) in a formula. My first hunch is that it might be possible to reuse the name-validating parts of that.
Comment 3 Javen O'Neal 2016-06-20 10:29:26 UTC
That Excel help forum post may not be reliable: for example, Unicode characters are allowed, and allowing a backslash as the first character but not in subsequent positions raises flags.

Better would be documentation from Microsoft on allowed names in Excel 97-2003 and 2007+. This would need to be verified using Excel (ideally, the files produced could then be used in POI unit tests).

In r1749293, I wrote a regular expression that validates the entire string. This regular expression will likely need tweaking to match Excel's rules.
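The regular expression committed in r1749293 is not reproduced here. As a hypothetical illustration only, the quoted Excel Help rules could be expressed as a single pattern using the `\p{IsAlphabetic}` class that Comment 4 refers to, which requires Java 7 or later. `RegexNameValidator` is a made-up name.

```java
import java.util.regex.Pattern;

public class RegexNameValidator {
    // Hypothetical pattern, not the one committed in r1749293.
    // First char: alphabetic, '_' or '\'; then alphabetic, ASCII digits, '.', '_'.
    // \p{IsAlphabetic} is only available from Java 7 onward.
    private static final Pattern NAME = Pattern.compile(
            "[\\p{IsAlphabetic}_\\\\][\\p{IsAlphabetic}0-9._]*");

    static boolean isValidName(String name) {
        // matches() requires the whole string to fit the pattern
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_range.1")); // prints true
        System.out.println(isValidName("my-range"));   // prints false
    }
}
```

Because `\p{IsAlphabetic}` is unavailable on Java 6, a character-by-character loop using `Character.isLetter` is the portable alternative.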
Comment 4 Javen O'Neal 2016-06-20 11:28:56 UTC
Reverted to using Character.isLetter in r1749305 because the regular-expression character class \p{IsAlphabetic} is not supported in Java 6.
Comment 5 Javen O'Neal 2016-10-14 10:12:05 UTC
Fixed in r1764854.