Issue 46165

Summary: Regular expressions work inconsistently or not at all when combined.
Product: Writer Reporter: fran <frachiar>
Component: editing Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: andrew_dowden, cno, hgtoo, issues, kpalagin, mike.hall, openoffice, stefan.baltzer
Version: OOo 1.1.3   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---

Description fran 2005-03-28 16:52:56 UTC
I know that "^$" finds empty paragraphs, but it is not possible to find
sequences such as "^$^$", for example when looking for two empty paragraphs
next to each other.
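
For comparison, here is a minimal sketch of the requested behaviour in a general-purpose regex engine (Python's `re` module, not Writer's own engine). It assumes, purely for illustration, that each paragraph is one line and that paragraphs are separated by "\n"; Writer's internal model may differ.

```python
import re

# Assumption: one paragraph per line, paragraphs joined by "\n".
text = "first\n\n\nsecond\n\nthird"

# With re.MULTILINE, ^ and $ match at every line boundary, so two
# adjacent empty lines are: empty line, newline, empty line.
two_empty = re.compile(r"^$\n^$", re.MULTILINE)

# Start offsets of every run of two consecutive empty lines.
matches = [m.start() for m in two_empty.finditer(text)]
print(matches)  # [6]
```

The single empty line between "second" and "third" is not reported, because the pattern needs an empty line on both sides of the newline.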
Comment 1 eric.savary 2005-03-28 17:10:59 UTC
Reassigned
Comment 2 eric.savary 2005-06-28 08:18:31 UTC
as described
Comment 3 wmstrome 2005-12-07 14:52:18 UTC
The feature that I miss most in OO is the ability to "Search and Replace" the
new paragraph, new line, or <CR> character. If you search for the regular
expression \n, you can find a "soft new line" (<Shift><Enter>) and can also
replace it with a hard return (<Enter>), but as far as I can tell, you cannot
search for the <Enter> itself in a document.
Comment 4 dowdenan 2006-04-09 02:33:08 UTC
Also (and very similar to this issue)

Not enough capability in REGEX (regular expression) support:
 -  can't search for  /text to be italic/  or  *text to be bold*
   (No ability to address content of match, only whole match)

 -  can't replace  text(space)(newline)text  with  text(space)text
   (REGEX only understands 'this line/paragraph' as search target.)
   WORK-AROUND: cut/paste to other application (I use EditPad Pro),
     fix text using REGEX search/replace, then cut/paste back.

 -  current method: find [ ^$ ], replace [(nothing)] also has problems
   (will ALSO remove any page-break attached to blank lines)

 -  find / replace (only change attributes) is also unsafe
    (find [(REGEX expression)], replace [&], bold/italic
      will sometimes result in '&' (the character), replacing target)
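
To make the first two requests concrete, here is a small sketch in Python's `re` module (a stand-in, not Writer's engine). The `<i>`/`<b>` output tags and the sample strings are hypothetical and only there to make the effect of the capture group visible.

```python
import re

# Hypothetical plain-text markup: /word/ means italic, *word* means bold.
sample = "This is /important/ and *urgent*."

# A capture group addresses the content of the match, not the whole
# match, so the delimiters can be dropped and the inner text kept.
italic = re.sub(r"/([^/\n]+)/", r"<i>\1</i>", sample)
both = re.sub(r"\*([^*\n]+)\*", r"<b>\1</b>", italic)

# Replace text(space)(newline)text with text(space)text: this works
# because the engine sees the line break as an ordinary character.
wrapped = "one \ntwo \nthree"
joined = re.sub(r" \n", " ", wrapped)
```

Here `both` ends up as "This is <i>important</i> and <b>urgent</b>." and `joined` as "one two three".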
Comment 5 stefan.baltzer 2007-03-08 16:47:19 UTC
*** Issue 75214 has been marked as a duplicate of this issue. ***
Comment 6 stefan.baltzer 2007-03-08 17:00:57 UTC
SBA: I adjusted the summary to reflect the general problem. From issue 75214,
there are more examples:
(1) "\n" and "\t" do not work in square brackets
(2) "^" works differently in square brackets
(3) "\n" finds a line break but inserts a paragraph break
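
As a point of reference for (1) and (2), this is how another engine (Python's `re`, used here only for comparison) treats the same constructs. The sample text is made up for the illustration.

```python
import re

text = "a\tb\nc d"

# (1) \t and \n are accepted inside square brackets.
parts = re.split(r"[\t\n ]", text)

# (2) ^ as the first character inside brackets negates the class ...
non_space = re.findall(r"[^\t\n ]+", text)

# ... while outside brackets it anchors to the start of a line.
line_starts = re.findall(r"^\w", text, re.MULTILINE)

print(parts)        # ['a', 'b', 'c', 'd']
print(non_space)    # ['a', 'b', 'c', 'd']
print(line_starts)  # ['a', 'c']
```

The two meanings of ^ are standard in most regex dialects, so (2) may be expected behaviour rather than a defect, whereas (1) is a real limitation.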

I do not regard this as a collective issue. The many existing issues about
Regular Expressions show that this area needs a "general rework" so that regular
expressions may become a consistent, powerful and INTUITIVE tool for the user.
Comment 7 stefan.baltzer 2007-03-08 17:38:04 UTC
*** Issue 70554 has been marked as a duplicate of this issue. ***
Comment 8 dowdenan 2007-03-08 20:10:58 UTC
> .. this area needs a "general rework" so that regular expressions may
> become a consistent, powerful and INTUITIVE tool for the user.

I have been considering a spec for just that, and will try to refine it and post
it as an attachment to this issue.

Is there any consensus on what level we need to try to target:  eg.
  (a) expert regex user,
  (b) journeyman regex, OR
  (c) basic regex + 'non-regex' wrapper/wizard for neophytes.

by (b) I mean: ".. not as good as say  EditPad Pro , but better than that other
   rival product."
Comment 9 kpalagin 2007-03-08 20:56:59 UTC
For (c) we have RFE http://www.openoffice.org/issues/show_bug.cgi?id=63074 
already.
We also have RFE http://www.openoffice.org/issues/show_bug.cgi?id=28913
Comment 10 singalen 2007-05-30 14:00:31 UTC
When I checked, there were no non-greedy patterns.
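
For readers unfamiliar with the term, a short sketch of greedy versus non-greedy matching, written with Python's `re` module as a stand-in (the sample string is made up):

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: .*? takes as little as possible, so each tag pair matches
# on its own.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```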
Comment 11 stefan.baltzer 2007-07-23 16:39:21 UTC
*** Issue 70554 has been marked as a duplicate of this issue. ***
Comment 12 gudmund 2007-11-25 11:49:33 UTC
"My" issue, http://www.openoffice.org/issues/show_bug.cgi?id=70554 was closed as
a duplicate of this one. I've been looking at
http://www.openoffice.org/issues/show_bug.cgi?id=15666, which looked promising,
but now doesn't seem to address the problems in this issue or 70554.

In short, what issue 70554 is about, is giving an ordinary user the ability to
- search and find line breaks, any kind
- search and find paragraph breaks, any kind
- substitute any of the above, be it one or many with one or many of any
combination of the above.

Some *potentially* interesting info from drking in 15666 (which is targeted for
2.4):
"The good news is that if OOo migrates to the ICU regex engine, many of the
existing issues may be resolved at a stroke. Although (looking at the ICU regex
spec) probably not all of them."

How likely this is to happen, or if it indeed solves this issue, I don't know.
Is there a "general rework" in progress? What issue should one look at to see
what's up, if any?

@SBA: in 70554 you say "You can as well produce an entire specification
(better use the spec template :-) and attach it to that issue.". 

What do you mean by spec template? If there's something I am able to do that may
contribute to getting this fixed, I'll do what I can, of course.
Comment 13 Oliver Specht 2007-11-26 07:03:19 UTC
Reassigned to ama
Comment 14 cno 2008-04-14 08:09:59 UTC
added myself as cc
Comment 15 gudmund 2008-05-20 18:44:43 UTC
Since I can't find an answer to the question I asked earlier, and the behaviour
in OOo 2.4 has not improved (e.g. substituting line breaks is still broken),
I ask again:

@SBA: in 70554 you say "You can as well produce an entire specification
(better use the spec template :-) and attach it to that issue.". 

What do you mean by spec template? If there's something I am able to do that may
contribute to getting this fixed, I'll do what I can, of course.
Comment 16 gudmund 2008-05-28 15:03:20 UTC
Experimenting with 2.4 in conjunction with issue 15666 seems to give some hints
on two small improvements that may perhaps be possible for reducing the impact
of not being able to search for ^$ in conjunction with any string at all:

1. Use \r for inserting a paragraph mark/paragraph break instead of \n
2. Use \n for inserting a newline/line break

Rationale:
- OOo 2.4 can insert newlines, try a regex search and replace searching for \n
(any number) replacing it by & or $0 ($n if () are used in the search expression)
- OOo 2.4 can insert paragraph breaks already, but uses the wrong (illogical)
expression for it, \n, which is already used for newlines.

If this part of the code today is remotely like what it AFAIR was in 1.x, using
\r instead of \n for inserting a paragraph break seems to be a matter of
replacing the string \n by \r in the corresponding places of the code (and the
help files).
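
The proposed mapping can be sketched as follows. This is only an illustration of the idea, not OOo code: the function name `expand_replacement` is hypothetical, and the Unicode paragraph and line separators merely stand in for whatever Writer uses internally for the two kinds of break.

```python
# Hypothetical stand-ins for Writer's two kinds of break.
PARAGRAPH_BREAK = "\u2029"  # Unicode PARAGRAPH SEPARATOR
LINE_BREAK = "\u2028"       # Unicode LINE SEPARATOR


def expand_replacement(replacement: str) -> str:
    """Expand \\r to a paragraph break and \\n to a line break.

    A doubled backslash yields one literal backslash; any other
    character is copied through unchanged.
    """
    escapes = {"r": PARAGRAPH_BREAK, "n": LINE_BREAK, "\\": "\\"}
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        nxt = replacement[i + 1] if i + 1 < len(replacement) else ""
        if ch == "\\" and nxt in escapes:
            out.append(escapes[nxt])
            i += 2  # consume the backslash and the escape letter
        else:
            out.append(ch)
            i += 1
    return "".join(out)


expanded = expand_replacement(r"end of paragraph\rline one\nline two")
```

With this mapping the replace box would use the same symbol for a line break that the search box already uses, and a separate one for the paragraph break.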

Let's hope a real programmer beats me to finding, downloading and messing up a
current snapshot of the code, although that part seems to be the easy part,
after looking at this:

http://specs.openoffice.org/
http://wiki.services.openoffice.org/wiki/Specification
http://specs.openoffice.org/collaterals/template/2.0/OpenOffice-org-Specification-Template.ott
http://wiki.services.openoffice.org/wiki/The_Three_Golden_Rules_for_Writing_OpenOffice.org_Specifications
(etc. etc. etc. etc....)
Comment 17 mitaly 2008-06-14 21:53:54 UTC
On my OO.org 2.4 on Win32 (Italian translation) regexes do not work correctly;
if you put in the search box
[A-Z][a-z]*
and in the replace box
&
the replaced words have an "x" appended; e.g. this:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas tincidunt
metus vitae tortor. Sed elementum luctus diam. Etiam lorem.
becomes
Loremx ipsum dolor sit amet, consectetuer adipiscing elit. Maecenasx tincidunt
metus vitae tortor. Sedx elementum luctus diam. Etiamx lorem.
.
Note that even if you change the expression around, e.g. using $0 instead of
&, or ([A-Z][a-z]*) and $1, the bug does not disappear. With some other regexes
(still with "&" in the replace box) I also found that the text was replaced
with the matched string only a few times, and after that with a literal
ampersand (&); changing the & to $0 didn't solve the problem (the words were
replaced with $0).
The issue seems to be Win32-only (I reproduced it on Windows XP SP3 and Windows
2000 SP4), since on Linux (Ubuntu 8.04) these regexes work fine.
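
For reference, the expected result can be checked in another engine. The sketch below uses Python's `re`, where the whole match is written `\g<0>` rather than `&`. It is only a comparison and says nothing about the cause of the Win32 behaviour.

```python
import re

text = ("Lorem ipsum dolor sit amet, consectetuer adipiscing elit. "
        "Maecenas tincidunt metus vitae tortor.")

# Replacing every match with itself must leave the text unchanged.
unchanged = re.sub(r"[A-Z][a-z]*", r"\g<0>", text)

# Wrapping the match in brackets shows which words were hit.
marked = re.sub(r"[A-Z][a-z]*", r"[\g<0>]", text)
```

Here `unchanged` is identical to `text`, and `marked` brackets only "Lorem" and "Maecenas".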
Comment 18 mitaly 2008-06-19 22:15:17 UTC
OK, it has been fixed in 2.4.1.
Comment 19 stefan.baltzer 2008-09-19 16:35:18 UTC
SBA: Put myself on c/c.
Comment 20 stefan.baltzer 2008-09-19 16:37:14 UTC
*** Issue 84828 has been marked as a duplicate of this issue. ***
Comment 21 rupxamqon 2008-12-06 19:37:07 UTC
I'd like to support this issue, especially gudmund's improvement suggestion:
"1. Use \r for inserting a paragraph mark/paragraph break instead of \n
2. Use \n for inserting a newline/line break"
That would be a good step forward.

But I'm also convinced, as sba wrote: "The many existing issues about
Regular Expressions show that this area needs a "general rework" so that regular
expressions may become a consistent, powerful and INTUITIVE tool for the user."
- They are not yet!
Comment 22 stefan.baltzer 2009-04-22 14:29:38 UTC
*** Issue 76634 has been marked as a duplicate of this issue. ***
Comment 23 stefan.baltzer 2009-12-01 11:58:59 UTC
*** Issue 69534 has been marked as a duplicate of this issue. ***