Bug 24234 - Doesn't seem to be a way to use '\' in subst with back_references
Status: NEW
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
Reported: 2003-10-29 23:23 UTC by Christopher Ng
Modified: 2005-03-20 17:06 UTC

Attachments
Patch to allow escaping of '\' character in substitution string (4.37 KB, patch)
2003-10-30 00:49 UTC, Christopher Ng
new suggested fix (7.73 KB, patch)
2004-02-29 06:28 UTC, Oleg Sukhodolsky

Description Christopher Ng 2003-10-29 23:23:33 UTC
specifically, trying to substitute "\$0" for "[.?]" doesn't seem to work - it is
not possible to escape the '\' because of the logic in subst, so it always
treats the $ as a literal (ie you get "$0" instead of eg "\.").
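
For comparison only, here is a minimal sketch of the intended result (each '.' or '?' prefixed with a backslash) using the JDK's java.util.regex rather than Jakarta Regexp. The class and method names are made up for illustration, and the replacement-string rules shown are those of java.util.regex, not of RE.subst().

```java
class EscapeMetacharsDemo {
    // Prefixes every '.' or '?' in the input with a backslash.
    // Uses java.util.regex (String.replaceAll), NOT Jakarta Regexp.
    static String escapeDotAndQuestion(String input) {
        // The Java literal "\\\\$0" is the four-character replacement \\$0.
        // java.util.regex reads \\ as one literal backslash and $0 as the whole match.
        return input.replaceAll("[.?]", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeDotAndQuestion("a.b?")); // prints a\.b\?
    }
}
```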

also, is there any way to mark a substring of the regexp as literal (ie don't
interpret metachars)?
Comment 1 Christopher Ng 2003-10-30 00:49:13 UTC
Created attachment 8813
Patch to allow escaping of '\' character in substitution string
Comment 2 Christopher Ng 2003-10-30 00:51:48 UTC
the above patch also has a fix for Bug 22928 (cutting the first two characters).
i think it is slightly neater than the current fix in RE.java 1.14.
Comment 3 Vadim Gritsenko 2003-11-18 13:53:47 UTC
Christopher,

Please use a 4-space indent instead of tabs (as you may have noticed, the rest of
the code does this).

Vadim
Comment 4 Oleg Sukhodolsky 2004-02-29 06:27:57 UTC
The previously suggested fix has some problems with handling an escaped \,
e.g. it's impossible to write a string that produces one \ followed by some
backreferenced part (i.e. "\\\$0" will produce "\\<some_text>", not "\<some_text>").
The patch also introduces an incompatibility.
To fix these problems I added a new REPLACE_WITH_ESCAPES constant and rewrote
subst() so that it handles escaped characters consistently.
Comment 5 Oleg Sukhodolsky 2004-02-29 06:28:56 UTC
Created attachment 10615
new suggested fix
Comment 6 Christopher Ng 2004-03-19 06:24:41 UTC
i can't reproduce the behaviour you describe with the patch that i wrote.

at least when using string literals inside the code, "\\\$0" is not even a valid
literal.  "\\" is a valid escape sequence, but "\$" is not.  if i want to
include a "\" before whatever $0 is, i need to use "\\\\$0" which results in
"\<some_text>".

can you show me an actual code snippet which causes the error u describe?  also,
what incompatibilities are you talking about?
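
The Java literal rules mentioned above can be checked with a small standalone sketch. It only shows what the compiler stores for the literal "\\\\$0"; it does not call subst(), and the class and method names are made up for illustration.

```java
class LiteralDemo {
    // Returns the string the compiler stores for the source literal "\\\\$0".
    // Each \\ in source is one backslash at runtime, so the value is \\$0.
    static String fourBackslashLiteral() {
        return "\\\\$0";
    }

    public static void main(String[] args) {
        String s = fourBackslashLiteral();
        System.out.println(s.length()); // prints 4
        System.out.println(s);          // prints \\$0
        // String bad = "\\\$0";        // would not compile: \$ is not a valid escape sequence
    }
}
```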
Comment 7 Oleg Sukhodolsky 2004-03-19 17:52:07 UTC
try this:
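        // Java literal "\\\\$0" is the substitution string \\$0:
        // an escaped backslash followed by the $0 backreference.
        RE r = new RE("[.?]");
        String actual = r.subst(".", "\\\\$0",
                                RE.REPLACE_BACKREFERENCES);
        System.err.println(actual);
        // Expected result is \. (one backslash, then the matched text).
        assertEquals("Wrong subst() result", "\\.", actual);

        // Java literal "\\\\\\$0" is \\\$0: an escaped backslash followed by an escaped $.
        actual = r.subst(".", "\\\\\\$0",
                         RE.REPLACE_BACKREFERENCES);
        System.err.println(actual);
        // Expected result is \$0 (no backreference is expanded).
        assertEquals("Wrong subst() result", "\\$0", actual);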

Also, your changes introduce an incompatibility.
Comment 8 Christopher Ng 2004-03-22 00:36:37 UTC
ah, i see what u mean.  bummer.  by incompatibilities, do you mean that trying to
subst in "\\" results in an empty string :)?  d'oh.

nice fix, very 'solid', no fudging :).  only thing i would suggest (and perhaps
there was a reason that escapes me) is using ArrayLists rather than Vectors to
avoid the overhead of synchronization (which seems unnecessary here).

Comment 9 Oleg Sukhodolsky 2004-03-22 04:06:45 UTC
The incompatibility is that it is now possible to escape '\', whereas before it was not.

I didn't use ArrayList because I'm not sure what the target JDK version is for
jakarta-regexp (ArrayList was introduced in 1.2).
Comment 10 Christopher Ng 2004-03-22 04:44:59 UTC
oic.  thanks for the clarification.
Comment 11 Vadim Gritsenko 2004-05-27 12:03:44 UTC
I think that the patch introduces more confusion than it resolves. \$ without
REPLACE_WITH_ESCAPES is still escaped, though logic suggests otherwise.
Additionally, the behavior of escaping $ with \ is not documented in the Javadoc
and not reflected in the unit tests.

Because this behavior is not documented, and it was introduced recently (in the
previous release), I suggest changing it (and documenting it in the Javadoc /
unit tests). I suggest the following syntax:

When REPLACE_BACKREFERENCES is on:
  Process all $ as backreferences. No escaping performed at all.

When REPLACE_BACKREFERENCES and REPLACE_WITH_ESCAPES both are on:
  Process all $ as backreferences.
  Process \ as escape symbol.

So, what do you think?

Vadim
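
The syntax proposed in comment 11 could be sketched roughly as follows. This is one possible reading, not the code from either attached patch: the class, the expand() helper, and the flag values are hypothetical, only single-digit backreferences are handled, and a '$' not followed by a digit or a trailing '\' is assumed to be copied through literally.

```java
final class SubstSketch {
    // Hypothetical flag values for this sketch only; not taken from RE.java.
    static final int REPLACE_BACKREFERENCES = 0x1;
    static final int REPLACE_WITH_ESCAPES = 0x2;

    // Expands one substitution template for a single match.
    // groups[0] is the whole match, groups[n] the n-th parenthesized group.
    static String expand(String template, String[] groups, int flags) {
        boolean backrefs = (flags & REPLACE_BACKREFERENCES) != 0;
        boolean escapes = backrefs && (flags & REPLACE_WITH_ESCAPES) != 0;
        if (!backrefs) {
            return template; // no processing at all: template is used literally
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            boolean hasNext = i + 1 < template.length();
            if (escapes && c == '\\' && hasNext) {
                // Escape: copy the next character literally and skip it.
                out.append(template.charAt(++i));
            } else if (c == '$' && hasNext
                    && template.charAt(i + 1) >= '0' && template.charAt(i + 1) <= '9') {
                // Backreference $n: append group n if it exists.
                int n = template.charAt(++i) - '0';
                if (n < groups.length && groups[n] != null) {
                    out.append(groups[n]);
                }
            } else {
                out.append(c); // ordinary character (including '\' when escapes are off)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] groups = { "." };
        int both = REPLACE_BACKREFERENCES | REPLACE_WITH_ESCAPES;
        System.out.println(expand("\\$0", groups, REPLACE_BACKREFERENCES)); // prints \.
        System.out.println(expand("\\\\$0", groups, both));                 // prints \.
        System.out.println(expand("\\\\\\$0", groups, both));               // prints \$0
    }
}
```

Under this reading, REPLACE_BACKREFERENCES alone never treats '\' specially, so "\$0" (as characters) is enough to get a backslash in front of the match, and existing callers only see escape handling if they opt in with REPLACE_WITH_ESCAPES.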