Bug 45529

Summary: B flag in mod_rewrite RewriteRule doesn't escape & and other similar characters
Product: Apache httpd-2 Reporter: Mårten Berglund <marten_berglund>
Component: mod_rewrite Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: normal CC: tfavre
Priority: P2 Keywords: FixedInTrunk
Version: 2.2.9   
Target Milestone: ---   
Hardware: All   
OS: All   
URL: http://www.update.uu.se/~marten/test/abc/teststring

Description Mårten Berglund 2008-08-03 08:59:54 UTC
According to the Apache 2 documentation for mod_rewrite, the new B flag is supposed to re-escape backreferences before they are applied; see:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags

When I try this on my test page under Apache 2.2.9, Apache fails to do what the example in the mod_rewrite documentation says it should. A request for C++ or C%2B%2B ends up in the query string as C++, not C%2B%2B. Likewise, a&b or a%26b becomes a&b, not a%26b. For some characters it does work: a%23b stays a%23b.

This is essential when writing rewrite rules for MediaWiki, which should be able to have page names containing & and similar characters. As it stands, an & in the page name is interpreted as a variable separator in the URL. According to the documentation above, the B flag should take care of this problem, but it doesn't.
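
To illustrate why an unescaped & matters, here is a minimal Python sketch (Python is used only for illustration; the test page itself is PHP). It assumes the rewritten query string is parsed like a standard form-style query string, which `urllib.parse.parse_qs` does. The helper name `shown_page` is hypothetical.

```python
from urllib.parse import parse_qs

def shown_page(query_string: str) -> str:
    """Return the value of the 'show' parameter, or '' if it is absent."""
    return parse_qs(query_string).get("show", [""])[0]

# Backreference copied into the query string unescaped: '&' cuts the value short.
print(shown_page("show=a&b"))       # a
# Backreference escaped, as the B flag is documented to do: the value stays intact.
print(shown_page("show=a%26b"))     # a&b
# An unescaped '&' can even introduce a second parameter.
print(parse_qs("show=a&show2=b"))   # {'show': ['a'], 'show2': ['b']}
```

The last call is the reason the test page also prints `show2`: a page name such as `a&show2=b` would turn into two separate parameters.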

Try it yourself on my test page, and change teststring in the URL below to whatever string you like:

http://www.update.uu.se/~marten/test/abc/teststring

So that you know what code you are testing against: the test page consists of two files, a .htaccess file and an index.php file:

.htaccess:
----------
RewriteEngine On
RewriteRule ^abc/(.*)$ /~marten/test/index.php?show=$1 [B]

index.php:
----------
<?php
if ($_SERVER['SERVER_SOFTWARE'])
{
echo("_SERVER['SERVER_SOFTWARE']: " . $_SERVER['SERVER_SOFTWARE'] . "<br><br>");
}

if ($_GET['show'])
{
echo("_GET['show']: " . $_GET['show'] . "<br>");
}

if ($_SERVER['QUERY_STRING'])
{
echo("_SERVER['QUERY_STRING']: " . $_SERVER['QUERY_STRING'] . "<br>");
}

if ($_GET['show2'])
{
echo("_GET['show2']: " . $_GET['show2'] . "<br>");
}

$test_var = "<br>Some special characters like &, % and some escaped %2B and %26 et.c. works fine in a normal PHP string variable.";
echo($test_var);
?>
Comment 1 Ruediger Pluem 2008-08-08 07:51:29 UTC
Can you please set RewriteLogLevel to at least 6 and provide the rewrite log for the cases where things fail to work?
Comment 2 Mårten Berglund 2008-08-12 03:50:27 UTC
Sorry, I don't have root access, so I can't set the RewriteLog directives. Maybe somebody else could try to reproduce this on her/his server and provide the logs?

I've had the same problem appear on two completely independent servers (i.e. on two different web hosting services); the only thing they have in common is that they run Apache 2.2.9.
Comment 3 Markus Stockhausen 2008-08-15 15:59:06 UTC
Hello,

here are the results from a test with the following settings:

RewriteEngine on
RewriteRule ^/rewrite_from/(.*) /rewrite_to/$1 [B,L]
RewriteLog /var/log/apache/rewrite.log
RewriteLogLevel 6

First test where everything works correctly. Requested URL is http://www.myserver.com/rewrite_from/a%23b

192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) init rewrite engine with requested uri /rewrite_from/a#b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) applying pattern '^/rewrite_from/(.*)' to uri '/rewrite_from/a#b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (5) escaping backreference 'a#b' to 'a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) rewrite '/rewrite_from/a#b' -> '/rewrite_to/a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) local path result: /rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) prefixed with document_root to /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (1) go-ahead with /mnt/www/html/rewrite_to/a%23b [OK]
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a%23b -> rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '^(.*)$' to uri 'rewrite_to/a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='mosConfig_[a-zA-Z_]{1,21}(=|\%3D)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='base64_encode.*\(.*\)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='(\<|%3C).*script.*(\>|%3E)' [NC] => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='GLOBALS(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='_REQUEST(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a%23b -> rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '(.*)' to uri 'rewrite_to/a%23b'

Second test where everything goes wrong. Requested URL is http://www.myserver.com/rewrite_from/a%26b

192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) init rewrite engine with requested uri /rewrite_from/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) applying pattern '^/rewrite_from/(.*)' to uri '/rewrite_from/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (5) escaping backreference 'a&b' to 'a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) rewrite '/rewrite_from/a&b' -> '/rewrite_to/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) local path result: /rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) prefixed with document_root to /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (1) go-ahead with /mnt/www/html/rewrite_to/a&b [OK]
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a&b -> rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '^(.*)$' to uri 'rewrite_to/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='mosConfig_[a-zA-Z_]{1,21}(=|\%3D)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='base64_encode.*\(.*\)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='(\<|%3C).*script.*(\>|%3E)' [NC] => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='GLOBALS(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='_REQUEST(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a&b -> rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '(.*)' to uri 'rewrite_to/a&b'

I hope I got the correct lines from the debug output as I have further rewrite rules in my .htaccess files.

Best regards.

Markus
Comment 4 Mårten Berglund 2008-08-20 13:43:25 UTC
Thank you Markus for the log! Seems that my bug report is confirmed then. Any comment from any Apache developer - it is really a bug, isn't it? Can it be easily fixed within the next release?
Comment 5 Samuel Williams 2008-09-17 02:42:53 UTC
I seem to be having this problem too:

10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C++/
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) applying pattern '^(.*\.git)\/web\/(.*)$' to uri '/Scripting/XcodeUserScripts.git/web/Menu/Code/C++/'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (5) escaping backreference '/Scripting/XcodeUserScripts.git' to '%2fScripting%2fXcodeUserScripts.git'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (5) escaping backreference 'Menu/Code/C++/' to 'Menu%2fCode%2fC++%2f'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) rewrite '/Scripting/XcodeUserScripts.git/web/Menu/Code/C++/' -> '/git.rhtml?r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) split uri=/git.rhtml?r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f -> uri=/git.rhtml, args=r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) local path result: /git.rhtml
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) prefixed with document_root to /srv/git/www/git.rhtml
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (1) go-ahead with /srv/git/www/git.rhtml [OK]
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) init rewrite engine with requested uri /favicon.ico
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) applying pattern '^(.*\.git)\/web\/(.*)$' to uri '/favicon.ico'
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (1) pass through /favicon.ico




Comment 6 Samuel Williams 2008-09-17 03:29:33 UTC
I've solved this for my very specific case:

	RewriteEngine on
	RewriteCond %{THE_REQUEST} ^.*?\/(.*\.git)\/web\/(.*)\s+HTTP.*$
	RewriteRule ^.*$ /git.rhtml?r=%1&p=%2 [L]

What I expected to be able to write is:

	RewriteEngine on
	RewriteRule ^(.*\.git)\/web\/(.*)$ /git.rhtml?r=$1&p=$2 [B]

But this doesn't seem to work. I've noticed that when requesting 

http://git.oriontransfer.org/Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/20-Unique+Header.rb

It is logged as:

10.0.0.128 - - [17/Sep/2008:22:22:47 +1200] [git.oriontransfer.org/sid#b818eb00][rid#b83cf178/initial] (2) init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C++/10-CPP+Class+Members.rb


"C++" is not escaped as "C%2B%2B" as it is passed from the original query. I'm not sure how Apache is handling this internally, but I would have expected to see:

init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/10-CPP+Class+Members.rb

That is, something is unescaping the URL before it reaches mod_rewrite. This might be why the [B] flag appears to have no effect.

Kind regards,
Samuel
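
The workaround in this comment matches against %{THE_REQUEST}, which holds the request line as the client sent it, so the captured groups are still percent-encoded. A minimal Python sketch of that matching step follows. The request line is an assumed example built from the URL quoted above, and Python's `re` module stands in for Apache's regex engine.

```python
import re

# Pattern from the RewriteCond above, written as a Python raw string.
THE_REQUEST_PATTERN = re.compile(r"^.*?\/(.*\.git)\/web\/(.*)\s+HTTP.*$")

# Assumed raw request line: method, undecoded URI, protocol version.
request_line = (
    "GET /Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/"
    "20-Unique+Header.rb HTTP/1.1"
)

match = THE_REQUEST_PATTERN.match(request_line)
repo, path = match.group(1), match.group(2)
print(repo)  # Scripting/XcodeUserScripts.git
print(path)  # Menu/Code/C%2B%2B/20-Unique+Header.rb
```

Because the groups come from the raw request line, `C%2B%2B` reaches the query string still encoded, which is the result the [B] flag was expected to give.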
Comment 7 Thibauld Favre 2008-10-29 15:08:46 UTC
Hi,

This bug had me scratching my head for quite a while and, in the end, I must say I was (almost) pleased to see that it was a bug :) I can confirm that the 'plus' sign gets unescaped too on Apache 2.2.8 on Ubuntu Hardy. Here's the test code I used to reproduce the bug:

thibs@taj:~/public_html$ cat .htaccess 
RewriteEngine On
RewriteBase /
RewriteRule ^action/(.*)$ index.php?show=$1 [B]

thibs@taj:~/public_html$ cat index.php 
<?php
echo("request uri: ".$_SERVER['REQUEST_URI']."<br/>");
echo("query string: ".$_SERVER['QUERY_STRING']."<br/>");
echo("'show' param value: ".$_GET['show']);
?>

Now accessing http://www.myserver.com/action/test%2Ba gives:
request uri: /action/test%2Ba
query string: show=test+a
'show' param value: test a
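
The output above can be traced step by step. The following Python sketch is only a model of the decoding chain, not Apache's or PHP's actual code: `unquote` stands in for the server unescaping the path, and `parse_qs` for PHP building `$_GET`.

```python
from urllib.parse import parse_qs, quote, unquote

requested_path = "/action/test%2Ba"

# Step 1: the path is unescaped before the rule pattern is applied.
decoded_path = unquote(requested_path)        # "/action/test+a"
backref = decoded_path[len("/action/"):]      # "test+a"

# Step 2a: substitution without re-escaping (the behaviour reported here).
print(parse_qs("show=" + backref)["show"][0])                  # test a
# Step 2b: substitution with the backreference percent-encoded first.
print(parse_qs("show=" + quote(backref, safe=""))["show"][0])  # test+a
```

In step 2a the literal plus sign is read as a form-encoded space, which matches the `test a` shown above.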

My immediate, very quick and dirty workaround was to manually escape and unescape the plus sign by replacing it with a '_plusign_' string. If you need more logs or information about this bug, let me know.
Regards,

Thibauld
Comment 8 Bob Ionescu 2008-12-19 14:54:55 UTC
This is caused by a "regression" in trunk (r589343). Maybe bug 34602 should be reopened, because the committed patch there (r573831) works.

The problem with r589343 seems to be that ap_escape_path_segment() is used now.
Comment 9 Nick Kew 2008-12-20 15:40:39 UTC
Bob, I'm sitting here with a wretched cold and can't think much.  Can you identify what difference(s) in ap_escape_path_segment vs the r573831 code are causing the problem?  Is the test in ap_escape_path_segment_buffer wrong for this, or does something rely on ' ' --> '+' escaping?
Comment 10 Bob Ionescu 2008-12-20 18:16:58 UTC
(In reply to comment #9)
> or does something rely on ' ' --> '+' escaping?

No, it's good that a space is encoded as %20, since with a substitution /$1?query=$2 and flags [B,NE,R], a space in $1 (i.e. the URL-path segment) should be escaped as %20.
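
As a small illustration of why %20 is the safer choice for a path segment, here is a Python sketch using the standard library's encoders as stand-ins (they are not what Apache uses internally). A '+' means "space" only in form-style query strings, not in a URL path.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

segment = "my page"

print(quote(segment, safe=""))    # my%20page  (valid in path and query)
print(quote_plus(segment))        # my+page    (form-style, query only)

# Decoded as a path, '+' stays a literal plus sign.
print(unquote("my+page"))         # my+page
# Decoded as a form-style query value, '+' becomes a space.
print(unquote_plus("my+page"))    # my page
```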

ap_escape_path_segment checks T_ESCAPE_PATH_SEGMENT as far as I understand.

I found in gen_test_char.c

        if (!apr_isalnum(c) && !strchr("$-_.+!*'(),:@&=~", c)) {
            flags |= T_ESCAPE_PATH_SEGMENT;
        }

which reads to me as: if the character is not alphanumeric and not one of $-_.+!*'(),:@&=~, flag it as T_ESCAPE_PATH_SEGMENT. The problem is that we don't want to exempt those special characters here. The B flag works pretty well if you request e.g. %22%23 (-> "# --> %22%23).
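
The effect of that test can be reproduced with a short Python sketch. It models only the condition quoted from gen_test_char.c and is not Apache's implementation. The function names are hypothetical, and lowercase hex digits are chosen to match the log lines earlier in this report.

```python
# Punctuation exempted by the strchr() call in the snippet above.
SAFE_PUNCTUATION = "$-_.+!*'(),:@&=~"

def needs_path_segment_escape(ch: str) -> bool:
    """Mirror the quoted condition: not alphanumeric and not exempted."""
    is_ascii_alnum = ch.isascii() and ch.isalnum()
    return not is_ascii_alnum and ch not in SAFE_PUNCTUATION

def escape_like_path_segment(text: str) -> str:
    """Percent-encode every character flagged by the condition above."""
    out = []
    for ch in text:
        if needs_path_segment_escape(ch):
            out.extend(f"%{byte:02x}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(escape_like_path_segment("a#b"))             # a%23b
print(escape_like_path_segment("a&b"))             # a&b
print(escape_like_path_segment("Menu/Code/C++/"))  # Menu%2fCode%2fC++%2f
print(escape_like_path_segment('"#'))              # %22%23
```

These results line up with the rewrite logs in comments 3 and 5: '#' and '/' are escaped, while '&' and '+' pass through unchanged because they are on the exempt list.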

> I'm sitting here with a wretched cold

I hope you will get well soon!
Comment 11 Nick Kew 2008-12-30 16:09:33 UTC
OK, I've verified that reverting r589343 fixes "B" flag breakage, and reverted it in trunk.  Thanks for the analysis.
Comment 12 Bob Ionescu 2008-12-30 16:40:39 UTC
Just one further question (maybe better asked at dev@?): how should we handle a slash? Currently (AFAICR), the B flag escapes '/' into '%2f', too. That could be a "problem" in per-dir context if the file path is changed by a backreference that contains a slash. Escaping the path itself wouldn't be a problem thanks to the internal redirect processing,* except for the encoded slash and maybe the space, which I think should be changed into %20 again.

Or should we consider the B flag to be unsupported if someone uses a backreference in the path and the query string simultaneously?

(* in per-server context unescape map for the path)
Comment 13 Nick Kew 2009-01-07 17:25:56 UTC
Fix backported to 2.2 in r732578
Comment 14 Mårten Berglund 2009-05-14 04:35:47 UTC
Thanks for all the work! However, is it supposed to work now in e.g. 2.2.11? I tried my test page (see the first post above) and it still doesn't work, even though my server now runs Apache 2.2.11.
Comment 15 Dan Poirier 2009-05-14 05:34:01 UTC
2.2.11 was released before this fix was backported to the 2.2 code stream.  Look for the fix in 2.2.12 when that comes out.