46015 – RegEx: Allow less greedy regular expressions

Summary: RegEx: Allow less greedy regular expressions

Status:	CLOSED OBSOLETE

Alias:	None

Product:	General
Classification:	Code
Component:	ui
Version:	OOo 2.0 Beta
Hardware:	All All

Importance:	P4 Trivial with 15 votes
Target Milestone:	---
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-03-25 17:02 UTC by erpel
Modified:	2017-05-20 10:04 UTC
CC List:	3 users

See Also:
Issue Type:	ENHANCEMENT
Latest Confirmation in:	---
Developer Difficulty:	---

Description erpel 2005-03-25 17:02:41 UTC

At the moment, regular expressions in OOo are always greedy. It would be nice to 
be able to make them a bit less greedy... :-) (In other regex engines, this is 
usually done by appending a question mark to the quantifier.)

In combination with issue 15666, the Search-And-Replace with regular expressions 
would be much more powerful.

For more information about the greediness of regexs, look at section 5.2 (para 
"And Quantifier Again") of:
http://www.regenechsen.de/regex_en/regex_3_en.html


<Quote>
Ok, now let's move on to some special behaviour relating to quantifiers, Some of 
them have a 'human' peculiarity: they are greedy! You don't believe that? Well, 
look at the following string <g>:

"The abbreviation 'ISP' stands for 'Internet Service Provider'."

We want a regex that finds the text that is enclosed by inverted commas and 
stores it in a subpattern:

"(.*)'(.*)'.*"

Nothing difficult really: find everything that comes before an inverted comma, 
then everything in between and finally everything that follows…

And? Did you try it on the regex-tester? What is in subpattern 2? "Internet 
Service Provider". Ooops, I expected "ISP" because it comes first in the string. 
:-o It is quite obvious that the first group (.*) greedily matched most of the 
string and left only what was at least necessary for subpattern 2 to match the 
whole string. Furthermore, the last element ".*" in the regex allowed 'nothing' 
or void to follow. Keeping this in mind: this part leads to a successful match 
even if nothing is to be matched. The star stands for as many appearances as 
there are or none at all!

[...]

Conclusion: Alter the Regex to "(.*?)'(.*?)'.*".
</Quote>
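
The difference described in the quote can be reproduced with any engine that supports lazy quantifiers. The following is only a minimal sketch using Python's standard `re` module as a stand-in; it is not OOo's regex engine, and the variable names are purely illustrative:

```python
import re

# Sample string from the quoted tutorial.
text = "The abbreviation 'ISP' stands for 'Internet Service Provider'."

# Greedy: the first (.*) takes as much as it can, so group 2 ends up
# holding the LAST quoted phrase.
greedy = re.match(r"(.*)'(.*)'.*", text)

# Lazy: (.*?) takes as little as it can, so group 2 holds the FIRST
# quoted phrase.
lazy = re.match(r"(.*?)'(.*?)'.*", text)

print(greedy.group(2))  # Internet Service Provider
print(lazy.group(2))    # ISP
```

The only change between the two patterns is the `?` after each `*`, which is the behaviour requested here for Search-And-Replace.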

Comment 1 mci 2005-03-29 07:30:15 UTC

Hi erpel,
thanks for using and supporting OpenOffice.org...

reassigned to requirements...

Comment 2 tombil 2006-03-02 16:10:58 UTC

*** Issue 46015 has been confirmed by votes. ***

Comment 3 macadamia 2008-01-15 20:57:01 UTC

This is an important feature! Example: I have to search and replace expressions
in double square brackets like [[this is some text]]. At the moment it is not
possible to search for such strings if there is more than one in a paragraph!
Example: in "sentence is [[123]] an so on [[456789]] etc." the search for the
regex \[\[.*\]\] returns "[[123]] an so on [[456789]]" because it is "greedy".
There is a really urgent need for the "lazy" operator "?".
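
For illustration, here is the bracket example in an engine that already has the lazy operator. This is a rough sketch in Python's standard `re` module, used only as a stand-in for the requested behaviour, not as a description of how OOo evaluates the pattern:

```python
import re

# Paragraph text taken verbatim from the example above.
text = "sentence is [[123]] an so on [[456789]] etc."

# Greedy .* runs to the last "]]" in the paragraph: one oversized hit.
greedy_hits = re.findall(r"\[\[.*\]\]", text)

# Lazy .*? stops at the first "]]" after each "[[": one hit per bracket pair.
lazy_hits = re.findall(r"\[\[.*?\]\]", text)

print(greedy_hits)  # ['[[123]] an so on [[456789]]']
print(lazy_hits)    # ['[[123]]', '[[456789]]']
```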

Comment 4 hanya 2015-01-12 16:29:43 UTC

Since Apache OpenOffice 3.4, the regular expression engine has been replaced 
with the more powerful one from ICU. The requested function should work with it.