Bug 49642

Summary: mod_rewrite mistakes encoded question mark as path/query string separator
Product: Apache httpd-2 Reporter: Petter Berntsen <petterb>
Component: mod_rewrite Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: major CC: andersk, jonas.maanmies
Priority: P2    
Version: 2.4.7   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Petter Berntsen 2010-07-23 09:24:56 UTC
A resource whose name contains an encoded question mark (%3F) is split by mod_rewrite into a name and args. The expected result is for mod_rewrite to leave the name intact.
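
A minimal Python sketch of the reported behavior, assuming the sequence visible in the rewrite log (the URI is decoded, the rule is applied, and the result is then split on '?'). This is only an illustration of the ordering problem, not httpd's actual code; the function names are hypothetical.

```python
import re
from urllib.parse import unquote

RULE = r"^/trigrewrite_(.*)"  # pattern from the vhost conf in this report


def observed(raw_target):
    """Decode, substitute, then split on '?': what the log suggests happens."""
    decoded = unquote(raw_target)              # %3F becomes a literal '?'
    substituted = re.sub(RULE, r"/\1", decoded)
    path, sep, args = substituted.partition("?")
    return path, (args if sep else None)


def expected(raw_target):
    """Decode and substitute, but keep the whole result as the file name."""
    decoded = unquote(raw_target)
    return re.sub(RULE, r"/\1", decoded), None


raw = "/trigrewrite_filename%3Fargs"
print(observed(raw))   # name and args are separated
print(expected(raw))   # name stays intact
```

The difference between the two functions is only whether a '?' that arrived encoded is allowed to act as a separator after decoding.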


# From rewrite.log
(2) init rewrite engine with requested uri /trigrewrite_filename?args
(3) applying pattern '^/trigrewrite_(.*)' to uri '/trigrewrite_filename?args'
(2) rewrite '/trigrewrite_filename?args' -> '/filename?args'
(3) split uri=/filename?args -> uri=/filename, args=args
(2) local path result: /filename


# Vhost conf
Listen 9004
<VirtualHost *:9004>
    Servername rewritebug
    DocumentRoot /home/apache/www/rewrite
#   AllowEncodedSlashes On
    RewriteEngine On
    RewriteLog "logs/rewrite.log"
    RewriteLogLevel 9

    RewriteRule ^/trigrewrite_(.*) /$1 [L]

    <Directory "/home/apache/www/rewrite">
        Options None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>


# From access_log
"GET /trigrewrite_filename%3Fargs HTTP/1.1" 404 223


# index.html
test rewrite
<script type="text/javascript">
var uri = '/trigrewrite_filename%3Fargs';
document.write('<iframe src="'+uri +'"></iframe>');
</script>
Comment 1 Eric Covener 2010-07-23 09:44:34 UTC
rewrite flags doc sure thinks a literal ? in a backreference will be escaped by default, and even uses it in the example!

"NE|noescape
By default, special characters, such as & and ?, for example, will be converted to their hexcode equivalent. Using the [NE] flag prevents that from happening."
Comment 2 Rainer Canavan 2012-03-29 15:32:04 UTC
I think I have just stumbled upon the same problem. I have a file with a '?' in its name that I want to access via rewrite rules, because it used to be in /old but is now in /new. In this simple case I can work around it by using an equivalent Alias statement.

The RewriteRule is

RewriteRule ^\/\/?old\/(.*)$ /tmp/new/$1 [T=text/plain,L,NE]

Just checking that the rule works with a "normal" file "foo" with contents "foo foo":

curl http://172.28.51.128:11003/old/foo
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) init rewrite engine with requested uri /old/foo
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/foo'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) rewrite '/old/foo' -> '/tmp/new/foo'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) remember /tmp/new/foo to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (2) local path result: /tmp/new/foo
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (1) go-ahead with /tmp/new/foo [OK]
172.28.51.128 - - [29/Mar/2012:15:36:14 +0200] [172.28.51.128/sid#8611b58][rid#89faec8/initial] (1) force filename /tmp/new/foo to have MIME-type 'text/plain'
foo foo

The file named "bar?bar" (results with or without QSA  or NE are identical):
curl http://172.28.51.128:11003/old/bar%3Fbar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) init rewrite engine with requested uri /old/bar?bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) rewrite '/old/bar?bar' -> '/tmp/new/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (3) split uri=/tmp/new/bar?bar -> uri=/tmp/new/bar, args=bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) remember /tmp/new/bar to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (2) local path result: /tmp/new/bar
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (1) go-ahead with /tmp/new/bar [OK]
172.28.51.128 - - [29/Mar/2012:15:36:23 +0200] [172.28.51.128/sid#8611b58][rid#89fced0/initial] (1) force filename /tmp/new/bar to have MIME-type 'text/plain'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /old/bar?bar was not found on this server.</p>
</body></html>

A file named "baz?baz" in directory /new2 in the document root works as expected:
curl http://172.28.51.128:11003/new2/baz%3Fbaz
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (2) init rewrite engine with requested uri /new2/baz?baz
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/new2/baz?baz'
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (3) applying pattern '.' to uri '/new2/baz?baz'
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (4) RewriteCond: input='curl/7.24.0 (i686-pc-linux-gnu) libcurl/7.24.0 OpenSSL/0.9.8u zlib/1.2.5' pattern='Apache \(internal dummy connection\)' => not-matched
172.28.51.128 - - [29/Mar/2012:15:37:23 +0200] [172.28.51.128/sid#8611b58][rid#8a04ef0/initial] (1) pass through /new2/baz?baz
baz baz

Adding [B] to the flags doesn't have the desired results either - it's actually worse, because '/' would get encoded to %2F, making directories inaccessible:
curl http://172.28.51.128:11003/old/bar%3Fbar
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) init rewrite engine with requested uri /old/bar?bar
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (3) applying pattern '^\/\/?old\/(.*)$' to uri '/old/bar?bar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (5) escaping backreference 'bar?bar' to 'bar%3fbar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) rewrite '/old/bar?bar' -> '/tmp/new/bar%3fbar'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) remember /tmp/new/bar%3fbar to have MIME-type 'text/plain'
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (2) local path result: /tmp/new/bar%3fbar
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (1) go-ahead with /tmp/new/bar%3fbar [OK]
172.28.51.128 - - [29/Mar/2012:15:39:39 +0200] [172.28.51.128/sid#847f6b0][rid#8a04eb0/initial] (1) force filename /tmp/new/bar%3fbar to have MIME-type 'text/plain'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /old/bar?bar was not found on this server.</p>
</body></html>
Comment 3 Anders Kaseorg 2014-03-09 12:07:14 UTC
Same problem on Apache 2.4.7.  And it isn’t unique to encoded question marks (%3f); encoded hash (%23) has a similarly catastrophic effect.
Comment 4 Eric Covener 2014-03-09 12:20:20 UTC
(In reply to Anders Kaseorg from comment #3)
> Same problem on Apache 2.4.7.  And it isn’t unique to encoded question marks
> (%3f); encoded hash (%23) has a similarly catastrophic effect.

Can you elaborate on the problem with %23?  I was able to capture and substitute it just fine and find a file with a literal hash mark in it.
Comment 5 Anders Kaseorg 2014-03-09 12:45:57 UTC
With this .htaccess file:

RewriteEngine on
RewriteRule ^page/(.*)$ /cgi-bin/page.cgi/$1

a request for page/foo%23bar is rewritten to /cgi-bin/page.cgi/foo (the CGI script sees PATH_INFO=/foo), and a request for page/foo%3fbar is rewritten to /cgi-bin/page.cgi/foo?bar (the CGI script sees PATH_INFO=/foo and QUERY_STRING=bar).
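
A rough Python analogy for the two results above, assuming the substituted string is parsed like a URL reference after decoding. It uses `urllib.parse.urlsplit` as a stand-in for whatever httpd does internally, so it shows the effect rather than the real code path; `cgi_view` is a hypothetical helper.

```python
import re
from urllib.parse import unquote, urlsplit

SCRIPT = "/cgi-bin/page.cgi"  # target script from the RewriteRule above


def cgi_view(raw_request_path):
    """Return (PATH_INFO, QUERY_STRING) as the CGI script would see them."""
    decoded = unquote(raw_request_path)        # %23 -> '#', %3f -> '?'
    substituted = re.sub(r"^page/(.*)$", SCRIPT + r"/\1", decoded)
    parts = urlsplit(substituted)              # '#' starts a fragment, '?' a query
    return parts.path[len(SCRIPT):], parts.query


print(cgi_view("page/foo%23bar"))  # fragment is dropped from the path
print(cgi_view("page/foo%3fbar"))  # 'bar' ends up in the query string
```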
Comment 6 Anders Kaseorg 2014-03-09 13:32:38 UTC
I tried passing every unescaped and escaped character through mod_rewrite in a loop, and came up with this list of escaping problems.

RewriteRule ^page/(.*)$ page.cgi/$1

page.cgi/foo%0abar: PATH_INFO="/foo\nbar"
page/foo%0abar: 404

page.cgi/foo%23bar: PATH_INFO="/foo#bar"
page/foo%23bar: PATH_INFO="/foo"

page.cgi/foo%25bar: PATH_INFO="/foo%bar"
page/foo%25bar: PATH_INFO="/foo\x{BA}r"

page.cgi/foo%3fbar: PATH_INFO="/foo?bar"
page/foo%3fbar: PATH_INFO="/foo" QUERY_STRING="bar"

(%0a is the regex’s fault in this case, since . doesn’t match newline by default, but the problem doesn’t go away with a more careful regex like (?s)\Apage/(.*)\z .)
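
The %25 result is consistent with the path being percent-decoded a second time after the substitution. That is an interpretation of the output above, not something confirmed in this report. A short Python sketch of the effect:

```python
from urllib.parse import unquote_to_bytes

raw = "/foo%25bar"

# First decode: %25 becomes a literal '%'.
once = unquote_to_bytes(raw)

# Second decode: the new "%ba" is read as an escape for byte 0xBA,
# leaving only the trailing "r" from "bar".
twice = unquote_to_bytes(once)

print(once)
print(twice)
```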

I’ve seen some people recommend the [B] flag as a solution, so I tried that too.  That comes with its own set of problems:

RewriteRule ^page/(.*)$ page.cgi/$1 [B]

page.cgi/foo%0abar: PATH_INFO="/foo\nbar"
page/foo%0abar: 404

page.cgi/foo%20bar: PATH_INFO="/foo bar"
page/foo%20bar: PATH_INFO="/foo+bar"

page.cgi/foo/bar: PATH_INFO="/foo/bar"
page/foo/bar: 404
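
The [B] results above look like form-style escaping applied to the whole backreference, with no awareness of which characters are structural. As an approximation only, Python's `quote_plus` behaves similarly (it emits uppercase hex digits, whereas the rewrite log earlier in this report shows lowercase); `escape_backreference` is a hypothetical name.

```python
from urllib.parse import quote_plus


def escape_backreference(captured):
    """Escape every reserved character in the captured text, including '/'."""
    return quote_plus(captured)


print(escape_backreference("foo bar"))  # space becomes '+'
print(escape_backreference("foo/bar"))  # path separator is escaped too
print(escape_backreference("bar?bar"))  # '?' is escaped, which was the goal
```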

I’m running apache2 2.4.7-1ubuntu1 on Ubuntu trusty amd64.
Comment 7 Eric Covener 2014-03-09 13:34:35 UTC
(In reply to Anders Kaseorg from comment #5)
> With this .htaccess file:
> 
> RewriteEngine on
> RewriteRule ^page/(.*)$ /cgi-bin/page.cgi/$1
> 
> a request for page/foo%23bar is rewritten to /cgi-bin/page.cgi/foo (the CGI
> script sees PATH_INFO=/foo), and a request for page/foo%3fbar is rewritten
> to /cgi-bin/page.cgi/foo?bar (the CGI script sees PATH_INFO=/foo and
> QUERY_STRING=bar).

I guess the cause is the same here: the decoded characters are captured and substituted, and 'B' is unsafe to use in a general-purpose substitution (and requires per-directory rewrite to allow the re-encoded strings to be decoded again by the core).

I haven't looked in detail, but I think in the case of %23 it's the core splitting the URL later, as opposed to mod_rewrite splitting the URL shortly after the substitution, so the solution might be different.

From a workaround perspective,

1)
What I would normally suggest here is capturing against %{THE_REQUEST} to deal exclusively with the client-encoded form of the request.

2)
For the #, it would seem feasible to use [N] and replace # -> %23 without polluting every capturing rule.  This doesn't work for ? because the split happens right away.
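
The idea behind workaround 1 can be sketched in Python: %{THE_REQUEST} holds the full request line as the client sent it, so a '?' found there is a real separator while %3f is still encoded. The helper name below is hypothetical and the parsing is deliberately simplified.

```python
def raw_path_and_query(request_line):
    """Split the client-encoded request target on its first literal '?'."""
    method, target, version = request_line.split(" ", 2)
    path, sep, query = target.partition("?")
    return path, (query if sep else None)


print(raw_path_and_query("GET /page/foo%3fbar HTTP/1.1"))
print(raw_path_and_query("GET /page/foo%3fbar?x=1 HTTP/1.1"))
```

Because nothing has been decoded yet, the encoded question mark stays in the path in both cases.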

From a partial fix perspective: unfortunately, I do not really see a full/acceptable or non-opt-in fix at this time.


1) something like [B=#?] would allow a rule to be fine tuned a little better _after_ finding a problem.  I actually like this one as a general tool.

2) an option to split the query string on the right-most question-mark (this by default would break a URL w/ a query passed in the query)

3) some option to remember that the ? was captured at the time we try to split it. Even this breaks captures against %{THE_REQUEST} in a rewritecond.
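
Option 2 can be sketched in Python as follows. This is a hypothetical helper illustrating the trade-off, not a proposed patch.

```python
def split_rightmost(uri):
    """Split path and query on the right-most '?' instead of the first."""
    path, sep, query = uri.rpartition("?")
    if not sep:
        return uri, None
    return path, query


# A decoded '?' in the file name survives when a real query follows it.
print(split_rightmost("/tmp/new/bar?bar?x=1"))

# A query that itself contains '?' is now split in the wrong place.
print(split_rightmost("/search?next=/a?b=1"))

# With no real query, the '?' from the file name is still taken as the separator.
print(split_rightmost("/tmp/new/bar?bar"))
```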
Comment 8 Eric Covener 2014-03-09 13:35:46 UTC
> I’ve seen some people recommend the [B] flag as a solution, so I tried that
> too.  That comes with its own set of problems:

In general [B] is very scary. It is not context aware at all.
Comment 9 Eric Covener 2014-03-09 14:15:59 UTC
> 2) an option to split the query string on the right-most question-mark (this
> by default would break a URL w/ a query passed in the query)

Even with this you'd of course have to manually manage the query string to avoid a ? sneaking in from a substitution as the sole question mark.
Comment 10 Jonas Maanmies 2018-10-25 14:50:07 UTC
I have the same issue with this config:

  RewriteEngine On
  RewriteBase "/"
  RewriteCond %{HTTP:Connection} Upgrade [NC]
  RewriteCond %{HTTP:Upgrade} websocket [NC]
  RewriteRule .* ws://backend_ip:backend_port%{REQUEST_URI} [P,L]
  RewriteRule .* http://backend_ip:backend_port%{REQUEST_URI} [P,L]
  ProxyPassReverse ws://backend_ip:backend_port/
  ProxyPassReverse http://backend_ip:backend_port/
  </Location>
  <Proxy "ws://backend_ip:backend_port">
    ProxySet keepalive=On
  </Proxy>
  <Proxy "http://backend_ip:backend_port">
    ProxySet keepalive=On
  </Proxy>


I'm using this ugly workaround because I'd need to upgrade Apache to get proper mod_proxy support for automatic WebSocket upgrades.