Issue 46015 - RegEx: Allow less greedy regular expressions
Summary: RegEx: Allow less greedy regular expressions
Status: CLOSED OBSOLETE
Alias: None
Product: General
Classification: Code
Component: ui
Version: OOo 2.0 Beta
Hardware: All All
Importance: P4 Trivial with 15 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-03-25 17:02 UTC by erpel
Modified: 2017-05-20 10:04 UTC
CC List: 3 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description erpel 2005-03-25 17:02:41 UTC
At the moment, regular expressions in OOo are always greedy. It would be nice to 
be able to make them a bit less greedy... :-) (In other regex engines, this is 
often done by appending a question mark to the quantifier.)

In combination with issue 15666, the Search-And-Replace with regular expressions 
would be much more powerful.

For more information about the greediness of regexs, look at section 5.2 (para 
"And Quantifier Again") of:
http://www.regenechsen.de/regex_en/regex_3_en.html


<Quote>
Ok, now let's move on to some special behaviour relating to quantifiers. Some of 
them have a 'human' peculiarity: they are greedy! You don't believe that? Well, 
look at the following string <g>:

"The abbreviation 'ISP' stands for 'Internet Service Provider'."

We want a regex that finds the text that is enclosed by inverted commas and 
stores it in a subpattern:

"(.*)'(.*)'.*"

Nothing difficult really: find everything that comes before an inverted comma, 
then everything in between and finally everything that follows…

And? Did you try it on the regex-tester? What is in subpattern 2? "Internet 
Service Provider". Ooops, I expected "ISP" because it comes first in the string. 
:-o It is quite obvious that the first group (.*) greedily matched most of the 
string and left only what was at least necessary for subpattern 2 to match the 
whole string. Furthermore, the last element ".*" in the regex allowed 'nothing' 
or void to follow. Keeping this in mind: this part leads to a successful match 
even if nothing is to be matched. The star stands for as many appearances as 
there are or none at all!

[...]

Conclusion: Alter the Regex to "(.*?)'(.*?)'.*".
</Quote>
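
The behaviour described in the quote can be reproduced with a short sketch. It uses Python's `re` module only as a stand-in for an engine that supports lazy quantifiers; it is not the OOo search engine, and the variable names are illustrative.

```python
import re

# Sample string from the quoted tutorial.
text = "The abbreviation 'ISP' stands for 'Internet Service Provider'."

# Greedy: the first (.*) grabs as much as it can, so group 2 ends up
# holding the LAST quoted phrase.
greedy = re.match(r"(.*)'(.*)'.*", text)

# Lazy: (.*?) stops at the earliest position that still allows a match,
# so group 2 holds the FIRST quoted phrase.
lazy = re.match(r"(.*?)'(.*?)'.*", text)

print(greedy.group(2))  # Internet Service Provider
print(lazy.group(2))    # ISP
```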
Comment 1 mci 2005-03-29 07:30:15 UTC
Hi erpel,
thanks for using and supporting OpenOffice.org...

reassigned to requirements...
Comment 2 tombil 2006-03-02 16:10:58 UTC
*** Issue 46015 has been confirmed by votes. ***
Comment 3 macadamia 2008-01-15 20:57:01 UTC
This is an important feature! Example: I have to search and replace expressions
in double square brackets like [[this is some text]]. At the moment it is not
possible to search for such strings if there is more than one in a paragraph!
Example: in "sentence is [[123]] an so on [[456789]] etc." the search for the
regex \[\[.*\]\] returns "[[123]] an so on [[456789]]" because it is "greedy".
There is a really urgent need for the "lazy" operator "?".
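
A minimal sketch of the double-bracket case above, again using Python's `re` module as a stand-in rather than the OOo engine. The third pattern, a negated character class, is my own assumption of a possible workaround for engines without lazy quantifiers; nobody in this issue proposed it, and it only works as long as the bracketed text itself contains no "]".

```python
import re

# Example paragraph from the comment above.
text = "sentence is [[123]] an so on [[456789]] etc."

# Greedy .* runs to the last "]]" in the paragraph: one oversized match.
greedy = re.findall(r"\[\[.*\]\]", text)

# Lazy .*? stops at the first "]]" after each "[[": two separate matches.
lazy = re.findall(r"\[\[.*?\]\]", text)

# Assumed workaround without "?": forbid "]" inside the brackets.
negated = re.findall(r"\[\[[^\]]*\]\]", text)

print(greedy)   # ['[[123]] an so on [[456789]]']
print(lazy)     # ['[[123]]', '[[456789]]']
print(negated)  # ['[[123]]', '[[456789]]']
```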
Comment 4 hanya 2015-01-12 16:29:43 UTC
Since Apache OpenOffice 3.4, the regular expression engine has been replaced 
with the one from ICU, which is more powerful. The requested function should work with it.