Bug 54216

Summary: replaceregexp with backslash throws StringIndexOutOfBoundsException
Product: Ant
Reporter: Emmanuel Harguindey <emmanuel.harguindey>
Component: Core tasks
Assignee: Ant Notifications List <notifications>
Status: NEW ---
Severity: normal
CC: jglick
Priority: P2    
Version: 1.8.3   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: stop stripping backslash in regex substitution

Description Emmanuel Harguindey 2012-11-28 08:32:22 UTC
The following Ant code:
<replaceregexp file="FILE.EXT" match="hello" replace="\" />

When the task is applied to a file (the file name does not matter) whose text matches 'hello', or whatever string is used as the match, this exception is thrown:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

It must have something to do with how the backslash character is escaped. The problem disappears when the replace string is escaped as follows:

<replaceregexp file="FILE.EXT" match="hello" replace="\\\\" />

In this case, the replacement is carried out successfully.

The bug does not manifest itself when the text inside FILE.EXT does not match the regular expression in the 'match' attribute.
Comment 1 Jesse Glick 2012-11-30 19:10:48 UTC
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String) confirms that \ is intended to be an escape sequence. Nonetheless Jdk14RegexpRegexp.substitute seems to be doing its own preprocessing of the replacement string, converting \1 to $1, and I seem to recall having problems in the past using this task to work with Windows file paths. The manual does not appear to specify what the expected syntax of the replacement string is.

At any rate, the argument → subst conversion is clearly wrong, since \ is converted to \ which is syntactically illegal, but \\ is converted to \ too! A patch (including additions to RegexpTest.java and the manual) would be great.
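
The conversion described above can be sketched roughly as follows. This is a simplified illustration of the behaviour reported in this comment, not a copy of the Ant source: the class name SubstSketch and the method toSubst are hypothetical, and any other preprocessing the real task may do (for example of '$') is left out.

```java
final class SubstSketch {
    // Hypothetical helper mimicking the argument -> subst conversion
    // described in this comment; not the actual Ant implementation.
    static String toSubst(String argument) {
        StringBuilder subst = new StringBuilder();
        for (int i = 0; i < argument.length(); i++) {
            char c = argument.charAt(i);
            if (c == '\\') {
                if (++i < argument.length()) {
                    c = argument.charAt(i);
                    int value = Character.digit(c, 10);
                    if (value > -1) {
                        // A backslash followed by a digit becomes a $-style group reference.
                        subst.append('$').append(value);
                    } else {
                        // The backslash is stripped and only the next character is kept,
                        // so two backslashes collapse into one.
                        subst.append(c);
                    }
                } else {
                    // A lone trailing backslash is passed through unchanged.
                    subst.append('\\');
                }
            } else {
                subst.append(c);
            }
        }
        return subst.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSubst("\\1"));         // prints $1
        System.out.println(toSubst("\\"));          // prints one backslash
        System.out.println(toSubst("\\\\"));        // prints one backslash
        System.out.println(toSubst("\\\\\\\\"));    // prints two backslashes
    }
}
```

Under these assumptions only the four-backslash form yields a subst of two backslashes, which the regex engine reads as one literal backslash. That is consistent with the workaround given in the description.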
Comment 2 Stefan Bodewig 2014-01-02 12:20:12 UTC
One culprit here is java.util.regex.Matcher#appendReplacement which throws the exception if it encounters a replacement that ends in an unescaped backslash.
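
The failure can be reproduced with java.util.regex alone, without Ant. The sketch below uses hypothetical names (TrailingBackslashDemo, replacementFails, replaceLiteral). It catches RuntimeException rather than a specific type because the exception class thrown for a trailing backslash appears to differ between JDK versions; the report above shows StringIndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingBackslashDemo {
    // Returns true if replaceAll rejects the replacement string.
    // replaceAll uses appendReplacement internally for each match.
    static boolean replacementFails(String input, String regex, String replacement) {
        try {
            Pattern.compile(regex).matcher(input).replaceAll(replacement);
            return false;
        } catch (RuntimeException e) {
            // The exact exception type is assumed to depend on the JDK version.
            return true;
        }
    }

    // Inserts the given text literally by quoting backslashes and dollar signs.
    static String replaceLiteral(String input, String regex, String literal) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        // A replacement consisting of a single backslash fails when there is a match.
        System.out.println(replacementFails("say hello", "hello", "\\"));    // true
        // Without a match the replacement is never parsed, so nothing is thrown.
        System.out.println(replacementFails("say goodbye", "hello", "\\"));  // false
        // Quoting the replacement inserts a literal backslash.
        System.out.println(replaceLiteral("say hello", "hello", "\\"));
    }
}
```

The second call matches the observation in the description that the bug does not appear when the file content does not match the expression.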

I agree \\ should not be replaced by \, and the patch I'll attach fixes that, but I'm afraid this is going to break existing builds that contain quadrupled backslashes to work around this long-standing "feature".

Maybe just documenting what happens is better than fixing the bug, not sure.
Comment 3 Stefan Bodewig 2014-01-02 12:21:41 UTC
Created attachment 31167
stop stripping backslash in regex substitution