Apache OpenOffice (AOO) Bugzilla – Issue 85269
Regular expressions should use standard syntax for named class
Last modified: 2008-02-13 11:37:35 UTC
[This is a split from issue 64368]

OOo supports named character classes in regular expressions, e.g. "[:space:]" for any character from the class "white space". However, the syntax supported by OOo does not conform to the POSIX standard, which specifies that named classes are only valid within a bracket expression (a character class)[1]. To use the named white space class in a standard regular expression, you have to write it as "[[:space:]]". OOo does exactly the opposite: named classes are only recognized when they are used outside a regular character class.

This leads to confusion when a user is familiar with standard regexp syntax, or when external regexp documentation is consulted. OOo's behavior is unique: it is unlike any other regular expression implementation.

It also leads to loss of functionality in the regular expression language. E.g., there is no way in OOo to express a negated named class. In standard POSIX syntax, you can write "[^[:space:]]".

Further, some patterns are more difficult to write and debug. In standard POSIX syntax, the character class "white space or period" can be expressed using a straightforward class: "[[:space:].]". In OOo, you have to use the more complex alternation syntax: "([:space:]|\.)"

I first raised this problem in the context of issue 64368, but it seems clear that it should exist as a separate issue, since it is a distinct problem and any discussion of it distracts from the other. However, this issue encloses the other (does it make sense to fix matching errors when the syntax is wrong?), so perhaps this one should be added as blocking the other, or as a dependency; I'm not sure what the policy is, so I leave that to someone else.

[1] Open Group "Base Definitions" Chapter 9, section 9.3.5 RE Bracket Expression: http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05
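For comparison, here is a minimal sketch of how the three standard patterns mentioned above behave in a POSIX regex implementation. It assumes a POSIX system that provides <regex.h> (regcomp/regexec); the names posix_matches and check_posix_examples are made up for this illustration and have nothing to do with OOo's own code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the POSIX extended regular expression
 * `pattern` matches anywhere in `text`, 0 if it does not, and -1 if the
 * pattern fails to compile. */
int posix_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical self-check covering the patterns from the description. */
void check_posix_examples(void)
{
    /* Named class inside a bracket expression: any white space. */
    assert(posix_matches("[[:space:]]", "a b") == 1);

    /* Negated named class: any character that is not white space. */
    assert(posix_matches("[^[:space:]]", " \t") == 0);
    assert(posix_matches("[^[:space:]]", " x ") == 1);

    /* Named class combined with a literal: white space or period. */
    assert(posix_matches("[[:space:].]", "end.") == 1);
    assert(posix_matches("[[:space:].]", "end") == 0);
}
```

The second and third patterns are the ones that have no direct equivalent in the current OOo syntax: the negation cannot be written at all, and the combined class needs the alternation workaround.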
This issue is quite moot except for one purpose: to block all other regex issues, which actually is a good idea. Before touching the functionality of the regex engine, OOo would rather switch to a different engine, namely that of ICU, which is based on Perl regular expressions. For details please see http://icu-project.org/userguide/regexp.html
It should've been filed separately to begin with; now it's too late to be of any use: is that about it? ;-) No problem: close; change; ignore -- whatever works.
closed