According to the Apache 2.2 documentation for mod_rewrite, the new B flag is supposed to re-escape back references so that they do not end up unescaped, see:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags

When I try this on my test page in Apache 2.2.9, Apache fails to do just that, although the example in mod_rewrite's documentation states that it should. That is, C++ or C%2B%2B becomes C++ in the query string and not C%2B%2B. The same goes for a&b or a%26b, which becomes a&b and not a%26b. For some characters it works: a%23b becomes a%23b.

This is essential when making rewrite rules for MediaWiki, which should be able to have page names containing & and other similar characters. Currently, if an & character appears, it is interpreted as a variable separator in the URL. The B flag should, according to the documentation above, take care of this problem, but it doesn't.

Try it yourself on my test page, and change teststring in the URL below to whatever string you like:

http://www.update.uu.se/~marten/test/abc/teststring

So you know what code you are testing against, the test page consists of the following two files, a .htaccess file and an index.php file:

.htaccess:
----------
RewriteEngine On
RewriteRule ^abc/(.*)$ /~marten/test/index.php?show=$1 [B]

index.php:
----------
<?php
if ($_SERVER['SERVER_SOFTWARE']) {
    echo("_SERVER['SERVER_SOFTWARE']: " . $_SERVER['SERVER_SOFTWARE'] . "<br><br>");
}
if ($_GET['show']) {
    echo("_GET['show']: " . $_GET['show'] . "<br>");
}
if ($_SERVER['QUERY_STRING']) {
    echo("_SERVER['QUERY_STRING']: " . $_SERVER['QUERY_STRING'] . "<br>");
}
if ($_GET['show2']) {
    echo("_GET['show2']: " . $_GET['show2'] . "<br>");
}
$test_var = "<br>Some special characters like &, % and some escaped %2B and %26 et.c. works fine in a normal PHP string variable.";
echo($test_var);
?>
Can you please set RewriteLogLevel to at least 6 and provide the rewrite log for the cases where things fail to work?
Sorry, I don't have root access, so I can't set the RewriteLog directives. Maybe somebody else could try to reproduce this on her/his server, and provide the logs? I've had the same problem appear on two completely independent servers (i.e. on two different web hosting services); the only thing they have in common is that they run Apache 2.2.9.
Hello, here are the results from a test with the following settings:

RewriteEngine on
RewriteRule ^/rewrite_from/(.*) /rewrite_to/$1 [B,L]
RewriteLog /var/log/apache/rewrite.log
RewriteLogLevel 6

First test, where everything works correctly. Requested URL is http://www.myserver.com/rewrite_from/a%23b

192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) init rewrite engine with requested uri /rewrite_from/a#b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) applying pattern '^/rewrite_from/(.*)' to uri '/rewrite_from/a#b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (5) escaping backreference 'a#b' to 'a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) rewrite '/rewrite_from/a#b' -> '/rewrite_to/a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) local path result: /rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) prefixed with document_root to /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (1) go-ahead with /mnt/www/html/rewrite_to/a%23b [OK]
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a%23b -> rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '^(.*)$' to uri 'rewrite_to/a%23b'
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='mosConfig_[a-zA-Z_]{1,21}(=|\%3D)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='base64_encode.*\(.*\)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='(\<|%3C).*script.*(\>|%3E)' [NC] => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='GLOBALS(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='_REQUEST(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a%23b -> rewrite_to/a%23b
192.168.2.203 - - [16/Aug/2008:00:48:17 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '(.*)' to uri 'rewrite_to/a%23b'

Second test, where everything goes wrong. Requested URL is http://www.myserver.com/rewrite_from/a%26b

192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) init rewrite engine with requested uri /rewrite_from/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) applying pattern '^/rewrite_from/(.*)' to uri '/rewrite_from/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (5) escaping backreference 'a&b' to 'a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) rewrite '/rewrite_from/a&b' -> '/rewrite_to/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) local path result: /rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (2) prefixed with document_root to /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (1) go-ahead with /mnt/www/html/rewrite_to/a&b [OK]
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a&b -> rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '^(.*)$' to uri 'rewrite_to/a&b'
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='mosConfig_[a-zA-Z_]{1,21}(=|\%3D)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='base64_encode.*\(.*\)' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='(\<|%3C).*script.*(\>|%3E)' [NC] => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='GLOBALS(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (4) [perdir /mnt/www/html/] RewriteCond: input='' pattern='_REQUEST(=|\[|\%[0-9A-Z]{0,2})' => not-matched
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] add path info postfix: /mnt/www/html/rewrite_to -> /mnt/www/html/rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] strip per-dir prefix: /mnt/www/html/rewrite_to/a&b -> rewrite_to/a&b
192.168.2.203 - - [16/Aug/2008:00:49:26 +0200] [www.myserver.com/sid#80aafa0][rid#828f000/initial] (3) [perdir /mnt/www/html/] applying pattern '(.*)' to uri 'rewrite_to/a&b'

I hope I got the correct lines from the debug output, as I have further rewrite rules in my .htaccess files.

Best regards.
Markus
Thank you Markus for the log! Seems that my bug report is confirmed then. Any comment from any Apache developer - it is really a bug, isn't it? Can it be easily fixed within the next release?
I seem to be having this problem too:

10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C++/
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) applying pattern '^(.*\.git)\/web\/(.*)$' to uri '/Scripting/XcodeUserScripts.git/web/Menu/Code/C++/'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (5) escaping backreference '/Scripting/XcodeUserScripts.git' to '%2fScripting%2fXcodeUserScripts.git'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (5) escaping backreference 'Menu/Code/C++/' to 'Menu%2fCode%2fC++%2f'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) rewrite '/Scripting/XcodeUserScripts.git/web/Menu/Code/C++/' -> '/git.rhtml?r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f'
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) split uri=/git.rhtml?r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f -> uri=/git.rhtml, args=r=%2fScripting%2fXcodeUserScripts.git&p=Menu%2fCode%2fC++%2f
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) local path result: /git.rhtml
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) prefixed with document_root to /srv/git/www/git.rhtml
10.0.0.128 - - [17/Sep/2008:21:36:48 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (1) go-ahead with /srv/git/www/git.rhtml [OK]
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (2) init rewrite engine with requested uri /favicon.ico
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (3) applying pattern '^(.*\.git)\/web\/(.*)$' to uri '/favicon.ico'
10.0.0.128 - - [17/Sep/2008:21:36:49 +1200] [git.oriontransfer.org/sid#824e4f0][rid#8474120/initial] (1) pass through /favicon.ico
I've solved this for my very specific case:

RewriteEngine on
RewriteCond %{THE_REQUEST} ^.*?\/(.*\.git)\/web\/(.*)\s+HTTP.*$
RewriteRule ^.*$ /git.rhtml?r=%1&p=%2 [L]

What I expected to be able to write is:

RewriteEngine on
RewriteRule ^(.*\.git)\/web\/(.*)$ /git.rhtml?r=$1&p=$2 [B]

But this doesn't seem to work. I've noticed that when requesting

http://git.oriontransfer.org/Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/20-Unique+Header.rb

it is logged as:

10.0.0.128 - - [17/Sep/2008:22:22:47 +1200] [git.oriontransfer.org/sid#b818eb00][rid#b83cf178/initial] (2) init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C++/10-CPP+Class+Members.rb

"C++" is not escaped as "C%2B%2B", as it was in the original request. I'm not sure how Apache is handling this internally, but I would have expected to see:

init rewrite engine with requested uri /Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/10-CPP+Class+Members.rb

i.e. something is unescaping the URL before it gets to mod_rewrite. This might be why the [B] flag appears to have no effect.

Kind regards,
Samul
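For readers wondering why the THE_REQUEST workaround keeps the escapes: that variable holds the request line as the client sent it, so the captures are still percent-encoded. The following is only a rough Python sketch of what the RewriteCond pattern above captures. The function name is made up, the request line (method and protocol version) is an assumed example, and Python's re module stands in for Apache's regex engine, which should behave the same for this particular pattern.

```python
import re

# Same pattern as in the RewriteCond above, written as a Python raw string.
THE_REQUEST_PATTERN = re.compile(r"^.*?\/(.*\.git)\/web\/(.*)\s+HTTP.*$")


def split_request_line(request_line):
    """Return the (repository, path) captures from a raw request line, or None.

    Hypothetical helper for illustration; it mirrors %1 and %2 in the rule.
    """
    match = THE_REQUEST_PATTERN.match(request_line)
    return match.groups() if match else None


# Assumed request line for the URL from the comment above.
line = ("GET /Scripting/XcodeUserScripts.git/web/Menu/Code/C%2B%2B/"
        "20-Unique+Header.rb HTTP/1.1")
print(split_request_line(line))
# ('Scripting/XcodeUserScripts.git', 'Menu/Code/C%2B%2B/20-Unique+Header.rb')
```

Note that the first capture has no leading slash here, unlike $1 in the RewriteRule variant, because the lazy prefix consumes everything up to and including the first slash.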
Hi,

This bug had me scratching my head heavily and, in the end, I must say I was (almost) pleased to see it was a bug :) I can confirm that the 'plus' sign gets unescaped too on Apache 2.2.8 on Ubuntu Hardy. Here's the test code I used to reproduce the bug:

thibs@taj:~/public_html$ cat .htaccess
RewriteEngine On
RewriteBase /
RewriteRule ^action/(.*)$ index.php?show=$1 [B]

thibs@taj:~/public_html$ cat index.php
<?php
echo("request uri: ".$_SERVER['REQUEST_URI']."<br/>");
echo("query string: ".$_SERVER['QUERY_STRING']."<br/>");
echo("'show' param value: ".$_GET['show']);
?>

Now accessing http://www.myserver.com/action/test%2Ba gives:

request uri: /action/test%2Ba
query string: show=test+a
'show' param value: test a

My immediate (very) quick and dirty workaround was to manually escape and unescape the plus sign by replacing it with a '_plusign_' string. If you need more logs or info about this bug, let me know...

Regards,
Thibauld
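To make the symptom above concrete, here is a small Python sketch of how a form-style query-string parser treats a backreference that was substituted without being re-escaped. It uses the standard urllib.parse module as a stand-in for PHP's $_GET handling, which is an assumption about equivalence made only for illustration; the helper name show_param is made up.

```python
from urllib.parse import parse_qs, quote


def show_param(query_string):
    """Return the decoded values of 'show', as a form-style parser sees them."""
    return parse_qs(query_string, keep_blank_values=True).get("show", [])


# Backreference inserted unescaped: '+' is decoded as a space.
print(show_param("show=test+a"))                        # ['test a']

# Backreference inserted unescaped: '&' ends the parameter early.
print(show_param("show=a&b"))                           # ['a']

# Backreference percent-encoded before substitution: the value survives.
print(show_param("show=" + quote("test+a", safe="")))   # ['test+a']
print(show_param("show=" + quote("a&b", safe="")))      # ['a&b']
```

The first call reproduces the "test a" result reported above; the last two show what a working B flag would be expected to deliver to the script.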
This is caused by a "regression" in trunk (r589343). Maybe bug 34602 should be reopened, because the committed patch there (r573831) works. The problem with r589343 seems to be that ap_escape_path_segment() is used now.
Bob, I'm sitting here with a wretched cold and can't think much. Can you identify what difference(s) in ap_escape_path_segment vs the r573831 code are causing the problem? Is the test in ap_escape_path_segment_buffer wrong for this, or does something rely on ' ' --> '+' escaping?
(In reply to comment #9)
> or does something rely on ' ' --> '+' escaping?

No, it's good that a space is encoded as %20, since a substitution /$1?query=$2 with flags [B,NE,R] should escape a space in $1 (i.e. the URL-path segment) as %20.

As far as I understand, ap_escape_path_segment checks T_ESCAPE_PATH_SEGMENT. I found this in gen_test_char.c:

    if (!apr_isalnum(c) && !strchr("$-_.+!*'(),:@&=~", c)) {
        flags |= T_ESCAPE_PATH_SEGMENT;
    }

which I read as: if the char is not alphanumeric and not one of $-_.+!*'(),:@&=~, flag it as T_ESCAPE_PATH_SEGMENT. The problem is that, for the B flag, we don't want to exempt those special characters from escaping. The B flag works pretty well if you request e.g. %22%23 (-> "# --> %22%23).

> I'm sitting here with a wretched cold

I hope you will get well soon!
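The effect of that character test can be sketched in a few lines of Python. This is not Apache's code: the safe-character string is copied from the check quoted above, the function names are made up, and the second predicate is a simplification of the documented B behaviour (escape everything non-alphanumeric) that deliberately ignores the space handling discussed in this thread. It handles ASCII input only.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)
# Characters exempted by the check quoted from gen_test_char.c.
PATH_SEGMENT_SAFE = set("$-_.+!*'(),:@&=~")


def needs_escape_path_segment(c):
    """Mirror of the quoted test: escape unless alphanumeric or exempted."""
    return c not in ALNUM and c not in PATH_SEGMENT_SAFE


def needs_escape_non_alnum(c):
    """Simplified reading of the documented B flag: escape all non-alphanumerics."""
    return c not in ALNUM


def escape(text, needs_escape):
    """Percent-encode every ASCII character for which needs_escape is true."""
    return "".join("%%%02x" % ord(c) if needs_escape(c) else c for c in text)


for sample in ("a#b", "a&b", "Menu/Code/C++/"):
    print(sample, "->",
          escape(sample, needs_escape_path_segment), "|",
          escape(sample, needs_escape_non_alnum))
# a#b -> a%23b | a%23b
# a&b -> a&b | a%26b
# Menu/Code/C++/ -> Menu%2fCode%2fC++%2f | Menu%2fCode%2fC%2b%2b%2f
```

The left-hand column matches the "escaping backreference" lines in the rewrite logs posted earlier: '#' and '/' are encoded, while '&' and '+' pass through untouched.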
OK, I've verified that reverting r589343 fixes "B" flag breakage, and reverted it in trunk. Thanks for the analysis.
Just one further question (maybe better asked at dev@?): How should we handle a slash? Currently (AFAICR), the B flag escapes '/' into '%2f', too. That could be a "problem" in per-dir context if the file path is also being changed by a backreference containing a slash. The escaping of the path itself wouldn't be a problem due to the internal redirect processing,* except for the encoded slash and maybe the space, which should be changed into %20 again, I think. Or should we consider the B flag to be unsupported if someone uses a backreference in the path and the query string simultaneously? (* in per-server context unescape map for the path)
Fix backported to 2.2 in r732578
Thanks for all the work! However, is it supposed to work now in e.g. 2.2.11? I tried my testpage above (see first post above) and it still doesn't work, even though my server now has Apache 2.2.11.
2.2.11 was released before this fix was backported to the 2.2 code stream. Look for the fix in 2.2.12 when that comes out.