A resource with an encoded question mark (%3F) in its name is split by mod_rewrite into name and args. The expected result is for mod_rewrite to leave the name intact.

# From rewrite.log
(2) init rewrite engine with requested uri /trigrewrite_filename?args
(3) applying pattern '^/trigrewrite_(.*)' to uri '/trigrewrite_filename?args'
(2) rewrite '/trigrewrite_filename?args' -> '/filename?args'
(3) split uri=/filename?args -> uri=/filename, args=args
(2) local path result: /filename

# Vhost conf
Listen 9004
<VirtualHost *:9004>
    Servername rewritebug
    DocumentRoot /home/apache/www/rewrite
    # AllowEncodedSlashes On
    RewriteEngine On
    RewriteLog "logs/rewrite.log"
    RewriteLogLevel 9
    RewriteRule ^/trigrewrite_(.*) /$1 [L]
    <Directory "/home/apache/www/rewrite">
        Options None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

# From access_log
"GET /trigrewrite_filename%3Fargs HTTP/1.1" 404 223

# index.html
test rewrite
<script type="text/javascript">
var uri = '/trigrewrite_filename%3Fargs';
document.write('<iframe src="'+uri +'"></iframe>');
</script>
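The rewrite.log excerpt suggests a three-step sequence: the request path is already percent-decoded when the pattern is applied, the substitution is built from the decoded capture, and the result is then split at the first '?'. The following is a minimal Python sketch of that sequence. It is only a model of the logged behavior, not mod_rewrite's actual code, and the function name `model_rewrite` is made up for illustration.

```python
import re
from urllib.parse import unquote

def model_rewrite(raw_path, pattern, substitution):
    """Model of the logged steps: decode, match, substitute, split at first '?'.

    Hypothetical helper; it mirrors the rewrite.log lines, not Apache internals.
    """
    decoded = unquote(raw_path)                 # %3F becomes a literal '?'
    match = re.match(pattern, decoded)
    if match is None:
        return decoded, None                    # rule does not apply
    rewritten = match.expand(substitution)      # backreference holds the decoded text
    uri, sep, args = rewritten.partition("?")   # the captured '?' now looks like a query
    return uri, (args if sep else None)

print(model_rewrite("/trigrewrite_filename%3Fargs", r"^/trigrewrite_(.*)", r"/\1"))
```

Under this model the file name loses everything after the decoded '?', which matches the "local path result: /filename" line in the log.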
The rewrite flags documentation clearly states that a literal ? in a backreference will be escaped by default, and even uses it as the example:

"NE|noescape By default, special characters, such as & and ?, for example, will be converted to their hexcode equivalent. Using the [NE] flag prevents that from happening."
I think I have just stumbled upon the same problem. I have a file with a '?' in its name that I want to access via rewrite rules, because it used to be in /old, but is now in /new. I can work around this in this simple case by just using an equivalent Alias statement.

The RewriteRule is

RewriteRule ^\/\/?old\/(.*)$ /tmp/new/$1 [T=text/plain,L,NE]

Just checking that the rule works with a "normal" file "foo" with contents "foo foo":

curl http://172.28.51.128:11003/old/foo

172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) init rewrite engine with requested uri /old/foo
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/foo'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) rewrite '/old/foo' -> '/tmp/new/foo'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) remember /tmp/new/foo to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) local path result: /tmp/new/foo
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (1) go-ahead with /tmp/new/foo [OK]
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (1) force filename /tmp/new/foo to have MIME-type 'text/plain'

foo foo

The file named "bar?bar" (results with or without QSA or NE are identical):

curl http://172.28.51.128:11003/old/bar%3Fbar

172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) init rewrite engine with requested uri /old/bar?bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) rewrite '/old/bar?bar' -> '/tmp/new/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (3) split uri=/tmp/new/bar?bar -> uri=/tmp/new/bar, args=bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) remember /tmp/new/bar to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) local path result: /tmp/new/bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (1) go-ahead with /tmp/new/bar [OK]
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (1) force filename /tmp/new/bar to have MIME-type 'text/plain'

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /old/bar?bar was not found on this server.</p>
</body></html>

A file named "baz?baz" in directory /new2 in the document root works as expected:

curl http://172.28.51.128:11003/new2/baz%3Fbaz

172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (2) init rewrite engine with requested uri /new2/baz?baz
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/new2/baz?baz'
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (3) applying pattern '.' to uri '/new2/baz?baz'
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (4) RewriteCond: input='curl/7.24.0 (i686-pc-linux-gnu) libcurl/7.24.0 OpenSSL/0.9.8u zlib/1.2.5' pattern='Apache \(internal dummy connection\)' => not-matched
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (1) pass through /new2/baz?baz

baz baz

Adding [B] to the flags doesn't have the desired result either. It is actually worse, because '/' would get encoded to %2F, making directories inaccessible:

curl http://172.28.51.128:11003/old/bar%3Fbar

172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) init rewrite engine with requested uri /old/bar?bar
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (5) escaping backreference 'bar?bar' to 'bar%3fbar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) rewrite '/old/bar?bar' -> '/tmp/new/bar%3fbar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) remember /tmp/new/bar%3fbar to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) local path result: /tmp/new/bar%3fbar
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (1) go-ahead with /tmp/new/bar%3fbar [OK]
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (1) force filename /tmp/new/bar%3fbar to have MIME-type 'text/plain'

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /old/bar?bar was not found on this server.</p>
</body></html>
Same problem on Apache 2.4.7. And it isn’t unique to encoded question marks (%3f); encoded hash (%23) has a similarly catastrophic effect.
(In reply to Anders Kaseorg from comment #3)
> Same problem on Apache 2.4.7. And it isn’t unique to encoded question marks
> (%3f); encoded hash (%23) has a similarly catastrophic effect.

Can you elaborate on the problem with %23? I was able to capture and substitute it just fine and find a file with a literal hash mark in it.
With this .htaccess file:

RewriteEngine on
RewriteRule ^page/(.*)$ /cgi-bin/page.cgi/$1

a request for page/foo%23bar is rewritten to /cgi-bin/page.cgi/foo (the CGI script sees PATH_INFO=/foo), and a request for page/foo%3fbar is rewritten to /cgi-bin/page.cgi/foo?bar (the CGI script sees PATH_INFO=/foo and QUERY_STRING=bar).
I tried passing every unescaped and escaped character through mod_rewrite in a loop, and came up with this list of escaping problems. In each pair, the first line is the direct request and the second is the same request through the rewrite rule.

RewriteRule ^page/(.*)$ page.cgi/$1

    page.cgi/foo%0abar: PATH_INFO="/foo\nbar"
    page/foo%0abar:     404

    page.cgi/foo%23bar: PATH_INFO="/foo#bar"
    page/foo%23bar:     PATH_INFO="/foo"

    page.cgi/foo%25bar: PATH_INFO="/foo%bar"
    page/foo%25bar:     PATH_INFO="/foo\x{BA}r"

    page.cgi/foo%3fbar: PATH_INFO="/foo?bar"
    page/foo%3fbar:     PATH_INFO="/foo" QUERY_STRING="bar"

(%0a is the regex’s fault in this case, since . doesn’t match newline by default, but the problem doesn’t go away with a more careful regex like (?s)\Apage/(.*)\z .)

I’ve seen some people recommend the [B] flag as a solution, so I tried that too. That comes with its own set of problems:

RewriteRule ^page/(.*)$ page.cgi/$1 [B]

    page.cgi/foo%0abar: PATH_INFO="/foo\nbar"
    page/foo%0abar:     404

    page.cgi/foo%20bar: PATH_INFO="/foo bar"
    page/foo%20bar:     PATH_INFO="/foo+bar"

    page.cgi/foo/bar:   PATH_INFO="/foo/bar"
    page/foo/bar:       404

I’m running apache2 2.4.7-1ubuntu1 on Ubuntu trusty amd64.
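The %23, %25 and %3f rows without [B] are all consistent with the captured text being percent-decoded once before matching and then treated as a fresh URL after substitution (split at '?', cut at '#', decoded again). The Python sketch below encodes that guess so the rows can be checked against it. It is an assumption drawn from the observed results, not a description of Apache's code, and `model_double_decode` is a made-up name.

```python
from urllib.parse import unquote

def model_double_decode(raw_tail):
    """Predict (PATH_INFO, QUERY_STRING) for a request to page/<raw_tail>.

    Hypothetical model: decode, split at '?', cut at '#', decode again.
    """
    once = unquote(raw_tail, encoding="latin-1")   # decoded before the pattern is applied
    path, sep, query = once.partition("?")         # a captured '?' starts a query string
    path = path.split("#", 1)[0]                   # a captured '#' truncates the path
    twice = unquote(path, encoding="latin-1")      # a leftover '%ba' is decoded a second time
    return "/" + twice, (query if sep else None)

for tail in ("foo%23bar", "foo%25bar", "foo%3fbar"):
    print(tail, model_double_decode(tail))
```

In this model the %25 case yields the byte 0xBA followed by "r", because the decoded "%ba" is itself a valid escape sequence on the second pass.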
(In reply to Anders Kaseorg from comment #5)
> With this .htaccess file:
>
> RewriteEngine on
> RewriteRule ^page/(.*)$ /cgi-bin/page.cgi/$1
>
> a request for page/foo%23bar is rewritten to /cgi-bin/page.cgi/foo (the CGI
> script sees PATH_INFO=/foo), and a request for page/foo%3fbar is rewritten
> to /cgi-bin/page.cgi/foo?bar (the CGI script sees PATH_INFO=/foo and
> QUERY_STRING=bar).

I guess the cause is the same here: the decoded characters are captured and substituted, and 'B' is unsafe to use in a general-purpose substitution (it also requires a per-directory rewrite so that the re-encoded strings are decoded again by the core).

I haven't looked in detail, but I think in the case of %23 it is the core splitting the URL later, as opposed to mod_rewrite splitting the URL shortly after the substitution, so the solution might be different.

From a workaround perspective:

1) What I would normally suggest here is capturing against %{THE_REQUEST} to deal exclusively with the client-encoded form of the request.

2) For the #, it would seem feasible to use [N] and replace # with %23, so as not to pollute every capturing rule. This doesn't work for ? because the split happens right away.

From a partial-fix perspective (unfortunately, I do not really see a full, acceptable, or non-opt-in fix at this time):

1) Something like [B=#?] would allow a rule to be fine-tuned a little better _after_ finding a problem. I actually like this one as a general tool.

2) An option to split the query string on the right-most question mark (by default this would break a URL with a query passed in the query).

3) Some option to remember that the ? was captured at the time we try to split it. Even this breaks captures against %{THE_REQUEST} in a RewriteCond.
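To make partial-fix idea 1 concrete, here is a small Python sketch of what escaping only a chosen set of characters in a backreference could look like. It illustrates the idea only; the function name `escape_selected` and its default character set are made up for this example, and it says nothing about how such a flag would be implemented in mod_rewrite.

```python
def escape_selected(backref, chars="#?"):
    """Percent-encode only the characters listed in `chars`.

    Hypothetical illustration of a [B=#?]-style opt-in: everything else,
    including '/', is left untouched so directories stay reachable.
    """
    return "".join(
        "%{:02x}".format(ord(c)) if c in chars else c
        for c in backref
    )

print(escape_selected("bar?bar"))      # the captured '?' is re-encoded
print(escape_selected("dir/file#1"))   # '/' survives, '#' is re-encoded
```

The point of the opt-in list is that a rule author can add a character only after finding that it causes trouble, rather than escaping every non-alphanumeric character as plain [B] does.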
> I’ve seen some people recommend the [B] flag as a solution, so I tried that
> too. That comes with its own set of problems:

In general [B] is very scary. It is not context aware at all.
> 2) an option to split the query string on the right-most question-mark (this
> by default would break a URL w/ a query passed in the query)

Even with this, you would of course have to manage the query string manually, to avoid a ? from a substitution sneaking in as the sole question mark.
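A short Python sketch of the right-most split shows both the benefit and the two caveats mentioned in this thread. It is a toy model of the proposal, not existing mod_rewrite behavior, and `split_rightmost` is a made-up name.

```python
def split_rightmost(rewritten):
    """Split a rewritten URL into (path, query) at the LAST '?'.

    Hypothetical model of the proposed option.
    """
    path, sep, query = rewritten.rpartition("?")
    if not sep:
        return rewritten, None   # no '?' at all: nothing to split
    return path, query

# Helps when the substitution appends its own query string:
print(split_rightmost("/tmp/new/bar?bar?x=1"))
# Still wrong when the captured '?' is the only one:
print(split_rightmost("/tmp/new/bar?bar"))
# Breaks a URL that carries a query inside its query:
print(split_rightmost("/search?next=/a?b=1"))
```

The second call is the case described above: unless the rule always appends an explicit query string, the captured ? is still taken as the separator.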
I have the same issue with this config:

RewriteEngine On
RewriteBase "/"
RewriteCond %{HTTP:Connection} Upgrade [NC]
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteRule .* ws://backend_ip:backend_port%{REQUEST_URI} [P,L]
RewriteRule .* http://backend_ip:backend_port%{REQUEST_URI} [P,L]
ProxyPassReverse ws://backend_ip:backend_port/
ProxyPassReverse http://backend_ip:backend_port/
</Location>
<Proxy "ws://backend_ip:backend_port">
    ProxySet keepalive=On
</Proxy>
<Proxy "http://backend_ip:backend_port">
    ProxySet keepalive=On
</Proxy>

I am using this ugly workaround because I would need to upgrade Apache to get mod_proxy's automatic WebSocket upgrade support.