Apache OpenOffice (AOO) Bugzilla – Issue 46015
RegEx: Allow less greedy regular expressions
Last modified: 2017-05-20 10:04:42 UTC
At the moment, regular expressions in OOo are always greedy. It would be nice to be able to make them a bit less greedy... :-) (In other regex engines, this is usually done by appending a question mark to the quantifier.) In combination with issue 15666, Search and Replace with regular expressions would be much more powerful.

For more information about the greediness of regexes, see section 5.2 (paragraph "And Quantifier Again") of: http://www.regenechsen.de/regex_en/regex_3_en.html

<Quote>
Ok, now let's move on to some special behaviour relating to quantifiers. Some of them have a 'human' peculiarity: they are greedy! You don't believe that? Well, look at the following string <g>:

"The abbreviation 'ISP' stands for 'Internet Service Provider'."

We want a regex that finds the text that is enclosed by inverted commas and stores it in a subpattern:

"(.*)'(.*)'.*"

Nothing difficult really: find everything that comes before an inverted comma, then everything in between and finally everything that follows… And? Did you try it on the regex-tester? What is in subpattern 2? "Internet Service Provider". Ooops, I expected "ISP" because it comes first in the string. :-o

It is quite obvious that the first group (.*) greedily matched most of the string and left only what was at least necessary for subpattern 2 to match the whole string. Furthermore, the last element ".*" in the regex allowed 'nothing' or void to follow. Keeping this in mind: this part leads to a successful match even if nothing is to be matched. The star stands for as many appearances as there are or none at all! [...]

Conclusion: Alter the Regex to "(.*?)'(.*?)'.*".
</Quote>
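To make the quoted example concrete, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: Python's engine is not the one OOo uses, but it supports the same `*?` lazy-quantifier syntax, so the two patterns from the quote behave as described there.

```python
import re

text = "The abbreviation 'ISP' stands for 'Internet Service Provider'."

# Greedy: the first (.*) grabs as much as it can, backtracking only far
# enough for the rest of the pattern to still match.
greedy = re.match(r"(.*)'(.*)'.*", text)

# Lazy: (.*?) takes as little as possible, so group 2 stops at the
# first closing quote it reaches.
lazy = re.match(r"(.*?)'(.*?)'.*", text)

print(greedy.group(2))  # Internet Service Provider
print(lazy.group(2))    # ISP
```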
Hi erpel, thanks for using and supporting OpenOffice.org... reassigned to requirements...
*** Issue 46015 has been confirmed by votes. ***
This is an important feature! Example: I have to search for and replace expressions in double square brackets, like [[this is some text]]. At the moment it is not possible to search for such strings if there is more than one in a paragraph! Example: in "sentence is [[123]] an so on [[456789]] etc." the search for the regex \[\[.*\]\] returns "[[123]] an so on [[456789]]" because the match is "greedy". There is an urgent need for the "lazy" operator "?".
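A minimal sketch of the bracket example, again using Python's `re` module as a stand-in (an assumption: not the OOo engine, but same quantifier syntax). It compares the greedy pattern from the comment with its lazy form, shows a negated-character-class pattern that avoids lazy quantifiers altogether, and performs a lazy search-and-replace. The `<...>` replacement format is a hypothetical choice for illustration.

```python
import re

text = "sentence is [[123]] an so on [[456789]] etc."

# Greedy: one match spanning from the first "[[" to the last "]]".
greedy_hits = re.findall(r"\[\[.*\]\]", text)

# Lazy: each "[[...]]" pair is matched separately.
lazy_hits = re.findall(r"\[\[.*?\]\]", text)

# Alternative without lazy quantifiers: forbid "]" inside the brackets.
negated_hits = re.findall(r"\[\[[^\]]*\]\]", text)

# Search and replace using a lazy capture group (hypothetical output format).
replaced = re.sub(r"\[\[(.*?)\]\]", r"<\1>", text)

print(greedy_hits)   # ['[[123]] an so on [[456789]]']
print(lazy_hits)     # ['[[123]]', '[[456789]]']
print(negated_hits)  # ['[[123]]', '[[456789]]']
print(replaced)      # sentence is <123> an so on <456789> etc.
```

The negated-class form only works when the delimited text itself never contains "]"; the lazy quantifier has no such restriction.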
Since Apache OpenOffice 3.4, the regular expression engine has been replaced with the more powerful one from ICU. The requested function should work with it.