Bug 56718

Summary: Cleanup request Host headers when absolute URI are used
Product: Apache httpd-2 Reporter: regilero <regis.leroy>
Component: Core Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal CC: apache-bugzilla, bjoern, tfrancis, ylavic.dev
Priority: P2 Keywords: PatchAvailable
Version: 2.4.9   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: patch for server/protocol.c
Alternative fix for protocol.c, includes port number etc
Fix on protocol.c and vhost.c

Description regilero 2014-07-14 12:49:24 UTC
Created attachment 31812
patch for server/protocol.c

Some programs rely on the HTTP request Host header to guess the absolute-URI hostname they should use in an application. This is sometimes a fallback, sometimes the default behavior. We all know this is insecure, but it is used in real-life applications (big Python or PHP CMSes, for example).

People usually believe that a default catch-all-bad-names VirtualHost prevents bad Host headers from reaching the application, but this is not true, especially because of the way Apache handles absolute URIs.

This has been described publicly in this document:
 * [Practical HTTP Host header attacks] http://www.skeletonscribe.net/2013/05/practical-http-host-header-attacks.html

So, the problem is that Apache respects this RFC 2616 rule:

> 1. If Request-URI is an absoluteURI, the host is part of the
>      Request-URI. Any Host header field value in the request MUST be
>      ignored.

And by the way, with an HTTP/1.1 request containing both a Host header and a hostname in the request URI, the hostname from the URI is used to choose the right name-based VirtualHost, not the hostname from the Host header.

But the Host headers are kept untouched and forwarded to any program running behind Apache httpd. Any naive Python or PHP program may assume that the Host header of the request was used to reach the application, when it was not.

I think this RFC instruction "MUST be ignored" means that the HTTP server should make everything, even external programs, unaware of these headers.

So I think the Host header should be reset to the hostname value extracted from the URI if any Host header is present.
That's a nicer way to ignore these headers.
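
A minimal Python sketch of the behavior proposed here (this is not the attached C patch): if the request-target is an absolute URI and a Host header is present, the header is overwritten with the authority taken from the URI. The function name `reset_host_from_target` and the dict-based header model are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def reset_host_from_target(request_target, headers):
    """Return a copy of headers with Host reset from an absolute-form target.

    Hypothetical helper for illustration; httpd does this work in C.
    """
    parts = urlsplit(request_target)
    if not (parts.scheme and parts.netloc):
        # Origin-form target such as "/index.html": nothing to reset.
        return dict(headers)

    # Drop any userinfo ("user@") so only host[:port] remains.
    authority = parts.netloc.rpartition("@")[2]

    cleaned = {k: v for k, v in headers.items() if k.lower() != "host"}
    if len(cleaned) != len(headers):
        # A Host header was present: replace it with the URI authority.
        cleaned["Host"] = authority
    return cleaned
```

Calling it with the request-target `http://dummy-host2.example.com/` and a hostile Host value would hand the application only the hostname from the URI, while a request without an absolute URI is left unchanged.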

AFAIK, there is no legitimate usage of Host headers when using an absolute URI. But this allows for a lot of nasty usages (I could give examples, but this is maybe not the right place).
Some hostname checks are not performed against these untouched Host headers.

What I'd like is for the general security policy rule that states "use a catch-all default VirtualHost" to always work, even with absolute-URI usage.

How to test:
==========
Add a dummy-host2.example.com non-default VirtualHost with "ServerName dummy-host2.example.com".

If you want to inspect the received HTTP headers you can add these mod_rewrite rules (the Header directives also need mod_headers):
<pre>
<VirtualHost *:80>
    ServerAdmin webmaster@dummy-host2.example.com
    DocumentRoot "/opt/apache2/docs/dummy-host2.example.com"
    ServerName dummy-host2.example.com

    <Directory /opt/apache2/docs/dummy-host2.example.com>
        Require all granted
        RewriteEngine On
        RewriteBase /
        RewriteRule .* - [E=INFO_HTTP_HOST:%{HTTP_HOST},NE]
        RewriteRule .* - [E=INFO_SERVER_NAME:%{SERVER_NAME},NE]
        RewriteRule .* - [E=INFO_THE_REQUEST:%{THE_REQUEST},NE]

        Header set INFO_HTTP_HOST "%{INFO_HTTP_HOST}e"
        Header set INFO_SERVER_NAME "%{INFO_SERVER_NAME}e"
        Header set INFO_THE_REQUEST "%{INFO_THE_REQUEST}e"
    </Directory>
</VirtualHost>
</pre>
I then made a very simple /opt/apache2/docs/dummy-host2.example.com/index.html:
<pre>
    <h1>Dummy Host 2</h1>
</pre>

I can test this host and reach it with
<pre>
    printf 'GET http://dummy-host2.example.com/ HTTP/1.1\nHost: ../../../etc/passwd%%0awww.fooobar.!ple.com\n\n' | nc -w 10 -q 10 127.0.0.1 80
</pre>

Before the patch the answer is:
================================
<pre>
HTTP/1.1 200 OK
Date: Mon, 14 Jul 2014 12:37:55 GMT
Server: Apache/2.4.9 (Unix)
Last-Modified: Thu, 05 Jun 2014 17:06:11 GMT
ETag: "16-4fb19c2fed43d"
Accept-Ranges: bytes
Content-Length: 22
INFO_HTTP_HOST: ../../../etc/passwd%0awww.fooobar.!ple.com
INFO_SERVER_NAME: dummy-host2.example.com
INFO_THE_REQUEST: GET http://dummy-host2.example.com/ HTTP/1.1
Content-Type: text/html

<h1>Dummy Host 2</h1>
</pre>

Without the absolute URI I get no response from Apache (expected).

After the patch the answer is:
==============================
<pre>
HTTP/1.1 200 OK
Date: Mon, 14 Jul 2014 12:47:30 GMT
Server: Apache/2.4.9 (Unix)
Last-Modified: Thu, 05 Jun 2014 17:06:11 GMT
ETag: "16-4fb19c2fed43d"
Accept-Ranges: bytes
Content-Length: 22
INFO_HTTP_HOST: dummy-host2.example.com
INFO_SERVER_NAME: dummy-host2.example.com
INFO_THE_REQUEST: GET http://dummy-host2.example.com/ HTTP/1.1
Content-Type: text/html

<h1>Dummy Host 2</h1>
</pre>

About the patch
===============
I made a patch (attached) for Apache 2.4.9. The fix is done in ap_read_request, after the first hostname extraction from absolute URI or CONNECT requests.
It is done before ap_update_vhost_from_headers, which may read the hostname from the headers if no hostname is available yet, and which also applies some cleanup to the hostname.

Of course I'm not sure that nothing could break with this patch; I do not have deep knowledge of Apache httpd internals.
* for empty Host headers with an absolute URI it seems OK,
* for Host headers with an absolute URI, they are replaced by the unfiltered hostname from the URI,
* for CONNECT, I do not know this proxy protocol well enough to ensure everything is OK, but I think the Host header MUST match the one used in the absolute URI for a CONNECT to be valid, so an invalid response is maybe already sent somewhere if it does not match,
* I do not think this could be used for cache poisoning. A reverse proxy cache using the first Host header and not the hostname extracted from the URI would be a problem, but this would be against RFC 2616, so it would be an issue for the reverse proxy cache and not for Apache,
* I did not check whether Apache used as a reverse proxy would use these Host headers in the cache key. That would be wrong, so I'm pretty sure that's not the case.
Comment 1 Tom Francis 2015-02-11 03:12:30 UTC
Hi,

I discovered this bug myself recently, although I came to it through the wording of the newer RFC 7230, Section 5.4, and in relation to mod_proxy - see https://issues.apache.org/bugzilla/show_bug.cgi?id=57563

However I think that fixing this in core would be preferable as it fixes the problem for more scenarios. In a way, you can consider the PHP or other scripting back-end to be the 'true' origin server and that httpd passing requests to PHP is acting as a proxy. If this viewpoint is taken, then httpd MUST rewrite the Host: header to match the authority info of any request-target that is in the absolute-form.

As well as trying to fix this in mod_proxy (see the patch in my linked bug report), I also patched it in vhost.c, although I came up with a different implementation from that of the original poster of this bug. My version directly assigns the authority component of a request-target in absolute-form, using the parsed_uri struct that belongs to the request. I tasked myself with learning gdb and stepping through the code to debug the behaviour (was fun!), and it seems to work in all cases, including appending the port to the Host: header if it was part of the request-target.

Thanks,

Tom...
Comment 2 Tom Francis 2015-02-11 03:15:59 UTC
Created attachment 32450
Alternative fix for protocol.c, includes port number etc

This patch updates the Host: header only if the request-target is in absolute-form and the Host: header has already been set.
Comment 3 Yann Ylavic 2015-02-11 15:13:45 UTC
Created attachment 32455
Fix on protocol.c and vhost.c

Thanks Regis and Tom for the report and patches.

The case is already handled by ap_update_vhost_from_headers() when "HttpProtocol strict" is configured (though without the URI's :port being appended).

If we were to conform to the RFC by default, which I agree with (though I would like to hear from others about possible compatibility caveats), I think the proposed patches are not complete.

This new (attached) patch avoids the double work in ap_read_request() and ap_update_vhost_from_headers(), and also takes care of IPv6-literal hostnames (which need to be surrounded by square brackets in the Host header).
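
To illustrate the two details mentioned above (appending the URI's port and bracketing IPv6 literals), here is a small Python sketch. It is not the code from the attached patch; the function name `host_header_from_target` is hypothetical and the parsing relies on Python's `urllib.parse.urlsplit` rather than httpd's parsed_uri.

```python
from urllib.parse import urlsplit


def host_header_from_target(request_target):
    """Build a Host header value from an absolute-form request-target.

    Returns None when the target carries no authority (origin-form).
    """
    parts = urlsplit(request_target)
    host = parts.hostname  # lowercased, IPv6 brackets stripped; None if absent
    if host is None:
        return None
    if ":" in host:
        # IPv6 literals must be wrapped in square brackets in Host.
        host = "[" + host + "]"
    if parts.port is not None:
        # Keep an explicit port from the URI.
        host = f"{host}:{parts.port}"
    return host
```

For a target such as `http://[2001:db8::1]:8080/` the brackets are restored and the port kept, while a plain `/index.html` yields None so the caller can fall back to the existing Host header.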
Comment 4 Yann Ylavic 2015-02-11 16:08:28 UTC
*** Bug 57563 has been marked as a duplicate of this bug. ***
Comment 5 Michael Kaufmann 2015-07-02 15:35:28 UTC
I think that this is an important issue. The patch/bugfix will prevent many attacks.

The latest patch could be improved: if the request does not have a "Host" header (HTTP/1.0) and an absolute URI is used, then a "Host" header should be added to the request.
Comment 6 Patryk Szalanski 2016-05-31 09:01:44 UTC
Are there any plans to integrate the patch in the near future? We would like to improve the security of our WSGI applications without having to implement a middleware to handle the Host header.
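
For context, the kind of middleware meant here could look roughly like the following sketch. It assumes a fixed allowlist of hostnames; the name `host_allowlist_middleware` and the 400 response are illustrative choices, not something taken from this bug or from any particular framework.

```python
def host_allowlist_middleware(app, allowed_hosts):
    """Wrap a WSGI app and reject requests whose Host is not allowlisted.

    Hypothetical workaround sketch; exact-match comparison, case-insensitive.
    """
    allowed = {h.lower() for h in allowed_hosts}

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        if host not in allowed:
            body = b"Bad Request: unexpected Host header\n"
            start_response(
                "400 Bad Request",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Host is trusted: hand the request to the wrapped application.
        return app(environ, start_response)

    return wrapper
```

With the server-side fix in place such a wrapper would be redundant, since the application would only ever see the hostname that selected the VirtualHost.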