Issue 85269 - Regular expressions should use standard syntax for named class
Status: CLOSED WONT_FIX
Alias: None
Product: General
Classification: Code
Component: ui
Version: OOo 2.3.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: thorsten.martens
QA Contact: issues@framework
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-15 16:55 UTC by Joe Smith
Modified: 2008-02-13 11:37 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description Joe Smith 2008-01-15 16:55:26 UTC
[This is a split from issue 64368]

OOo supports named character classes in regular expressions; e.g., "[:space:]"
matches any character from the class "white space".

However, the syntax supported by OOo does not conform to the POSIX standard,
which specifies that named classes are valid only inside a bracket expression
(a character class)[1]. To use the named white space class in a regular
expression, you have to write it as "[[:space:]]". OOo does exactly the
opposite: named classes are recognized only when they are used outside a
bracket expression.
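Under the POSIX rule, an OOo-style "[:space:]" written outside brackets is not a named class at all: the leading "[" opens a bracket expression, so the pattern matches a single character from the set ':', 's', 'p', 'a', 'c', 'e'. Python's re module does not implement POSIX named classes, but it parses this particular pattern the same way, so it can serve as a quick check (a sketch assuming Python 3):

```python
import re

# Outside a bracket expression, "[:space:]" is itself a bracket expression:
# it matches one character from the set {':', 's', 'p', 'a', 'c', 'e'}.
ooo_style = re.compile("[:space:]")

print(ooo_style.findall("a: space"))
# ['a', ':', 's', 'p', 'a', 'c', 'e']  (the blank after ':' is not matched)

print(ooo_style.search(" \t\n"))
# None: no white space character is a member of that set
```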

This confuses users who are familiar with standard regexp syntax, as well as
anyone who consults external regexp documentation. OOo's behavior is unique:
it differs from every other regular expression implementation.

It also costs functionality. For example, OOo has no way to express a negated
named class, whereas in standard POSIX syntax you can write "[^[:space:]]" to
match any character that is not white space.
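The rule that "[:name:]" is special only inside a bracket expression can be sketched as a small translator into Python re syntax, which has no POSIX named classes of its own. Everything here is hypothetical: posix_to_python and the POSIX_CLASSES table are illustrative names, the table covers only a few classes with their ASCII ("C" locale) members, and an unknown class name simply raises KeyError. It shows that negation comes for free once the named class sits inside the brackets:

```python
import re

# Hypothetical ASCII ("C" locale) expansions of a few POSIX named classes,
# written as fragments that are valid inside a Python re character set.
POSIX_CLASSES = {
    "space": r" \t\n\r\f\v",
    "blank": r" \t",
    "digit": r"0-9",
    "alpha": r"A-Za-z",
    "alnum": r"0-9A-Za-z",
    "upper": r"A-Z",
    "lower": r"a-z",
    "xdigit": r"0-9A-Fa-f",
}

_NAMED = re.compile(r"\[:([a-z]+):\]")


def posix_to_python(pattern: str) -> str:
    """Expand "[:name:]" only inside bracket expressions (simplified sketch).

    Raises KeyError for class names missing from POSIX_CLASSES.
    """
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":  # copy an escape such as "\." unchanged
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch != "[":  # ordinary character outside any bracket expression
            out.append(ch)
            i += 1
            continue
        # Start of a bracket expression.
        out.append("[")
        i += 1
        if i < n and pattern[i] == "^":  # negation applies to the whole set
            out.append("^")
            i += 1
        if i < n and pattern[i] == "]":  # a leading "]" is a literal member
            out.append(r"\]")
            i += 1
        while i < n and pattern[i] != "]":
            m = _NAMED.match(pattern, i)
            if m:  # "[:name:]" is only special here, inside the brackets
                out.append(POSIX_CLASSES[m.group(1)])
                i = m.end()
            elif pattern[i] in "\\[":  # literal in POSIX, special in Python
                out.append("\\" + pattern[i])
                i += 1
            else:
                out.append(pattern[i])
                i += 1
        if i >= n:
            raise ValueError("unterminated bracket expression")
        out.append("]")
        i += 1
    return "".join(out)


negated = re.compile(posix_to_python("[^[:space:]]+"))
print(negated.findall("one two\tthree"))  # ['one', 'two', 'three']
```

Because "[:space:]" outside brackets is left alone, posix_to_python("[:space:]") returns the pattern unchanged, which follows the bracket rule rather than OOo's reading.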

Further, some patterns are harder to write and debug. In standard POSIX syntax,
the character class "white space or period" is a straightforward bracket
expression: "[[:space:].]". In OOo, you have to use the more complex
alternation syntax instead: "([:space:]|\.)"
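The trade-off between a bracket expression and alternation can be sketched in Python's re, with "\s" standing in for "[:space:]" (an assumption for illustration: Python's "\s" also matches non-ASCII Unicode white space, so it is not an exact equivalent of the POSIX class). Both forms find the same characters, but only the bracket form can be negated by adding "^"; the alternation needs a negative lookahead, which not every engine offers:

```python
import re

# Python's \s stands in for POSIX [:space:] here (not an exact equivalent).
as_class = re.compile(r"[\s.]+")             # like POSIX "[[:space:].]+"
as_alternation = re.compile(r"(?:\s|\.)+")   # alternation, as in the OOo workaround

sample = "end. Next\tline"

# Both forms find the same separators.
print(as_class.findall(sample))        # ['. ', '\t']
print(as_alternation.findall(sample))  # ['. ', '\t']

# Negating the class takes a single "^".
not_class = re.compile(r"[^\s.]+")
# Negating the alternation needs a lookahead guarding "any character".
not_alternation = re.compile(r"(?:(?!\s|\.).)+")

print(not_class.findall(sample))        # ['end', 'Next', 'line']
print(not_alternation.findall(sample))  # ['end', 'Next', 'line']
```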

I first raised this problem in the context of issue 64368, but it clearly
belongs in a separate issue: it is a distinct problem, and discussing it there
distracts from the other one. However, this issue subsumes the other (does it
make sense to fix matching errors while the syntax itself is wrong?), so perhaps
this one should be marked as blocking the other, or as a dependency of it. I'm
not sure what the policy is, so I leave that to someone else.

[1] The Open Group, "Base Definitions", Chapter 9, Section 9.3.5 "RE Bracket
Expression":
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05
Comment 1 ooo 2008-01-15 18:48:34 UTC
This issue is moot except for one purpose: blocking all other regex issues,
which actually is a good idea. Before changing the functionality of the current
regex engine, we'd rather switch OOo to a different engine, namely ICU's, which
is based on Perl regular expressions. For details please see
http://icu-project.org/userguide/regexp.html
Comment 2 Joe Smith 2008-01-15 21:17:08 UTC
It should've been filed separately to begin with; now it's too late to be of any
use: is that about it? ;-)

No problem: close; change; ignore -- whatever works.
Comment 3 thorsten.martens 2008-02-13 11:37:05 UTC
.
Comment 4 thorsten.martens 2008-02-13 11:37:35 UTC
closed