Bug 61820

Summary: 304 headers stripped
Product: Apache httpd-2 Reporter: Mark Nottingham <mnot>
Component: CoreAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: NEW ---    
Severity: normal CC: andreas.luik, ben.rubson, covener, fielding, herr.ernst, matt, sebastian, simon.oberhammer
Priority: P2 Keywords: FixedInTrunk, PatchAvailable
Version: 2.5-HEAD   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: Strip only unwanted headers
Strip unwanted headers
Strip unwanted headers without subprocess_env envvar

Description Mark Nottingham 2017-11-26 23:19:24 UTC
In http_filters:ap_http_header_filter, 304 responses get special treatment, in that they're sent with only a fixed set of headers (if present):

1414	    if (r->status == HTTP_NOT_MODIFIED) {
1415	        apr_table_do((int (*)(void *, const char *, const char *)) form_header_field,
1416	                     (void *) &h, r->headers_out,
1417	                     "Connection",
1418	                     "Keep-Alive",
1419	                     "ETag",
1420	                     "Content-Location",
1421	                     "Expires",
1422	                     "Cache-Control",
1423	                     "Vary",
1424	                     "Warning",
1425	                     "WWW-Authenticate",
1426	                     "Proxy-Authenticate",
1427	                     "Set-Cookie",
1428	                     "Set-Cookie2",
1429	                     NULL);
1430	    }

<https://svn.apache.org/viewvc/httpd/httpd/trunk/modules/http/http_filters.c?revision=1777672&view=markup#l1414>

This means that any header value that a generator (whether CGI, an upstream origin via mod_proxy, etc.) updates in a 304 will be lost.

RFC7234 specifies how headers on a 304 are supposed to be handled:
  http://httpwg.org/specs/rfc7234.html#freshening.responses

This has caused interoperability problems in the wild, e.g.,:
  https://github.com/hueniverse/hawk/issues/224
Comment 1 Eric Covener 2017-11-28 03:35:57 UTC
Hi Mark, I think some of the lack of action on this is related to this text:

https://tools.ietf.org/html/rfc7232#section-4.1

The server generating a 304 response MUST generate any of the
   following header fields that would have been sent in a 200 (OK)
   response to the same request: Cache-Control, Content-Location, Date,
   ETag, Expires, and Vary.
...

Since the goal of a 304 response is to minimize information transfer
   when the recipient already has one or more cached representations, a
   sender SHOULD NOT generate representation metadata other than the
   above listed fields unless said metadata exists for the purpose of
   guiding cache updates (e.g., Last-Modified might be useful if the
   response does not have an ETag field).


7230 lists a few "representation metadata" but I'm not sure it's meant to be exhaustive, so it's difficult to whitelist or blacklist it.  Any advice on how we differentiate here?

3.1.  Representation Metadata

   Representation header fields provide metadata about the
   representation.  When a message includes a payload body, the
   representation header fields describe how to interpret the
   representation data enclosed in the payload body.  In a response to a
   HEAD request, the representation header fields describe the
   representation data that would have been enclosed in the payload body
   if the same request had been a GET.

   The following header fields convey representation metadata:

   +-------------------+-----------------+
   | Header Field Name | Defined in...   |
   +-------------------+-----------------+
   | Content-Type      | Section 3.1.1.5 |
   | Content-Encoding  | Section 3.1.2.2 |
   | Content-Language  | Section 3.1.3.2 |
   | Content-Location  | Section 3.1.4.2 |
   +-------------------+-----------------+
Comment 2 Mark Nottingham 2017-11-28 07:02:18 UTC
The key word here is "generate." The problem is that when the 304 is being generated elsewhere -- whether by a CGI handler or by an upstream server -- it has authority to decide what headers to include.

I think this could be addressed by removing the code here and imposing this kind of filtering (if necessary) where Apache actually generates a 304 as part of handling a conditional request itself (e.g., from cache, from disk, but not from CGI, etc.).
Comment 3 Roy T. Fielding 2018-10-26 20:42:42 UTC
I can confirm that this is a bug.

The original code that I wrote (in 1997) for the canned response generator (ap_send_error_response) was moved in 2000 to the generic http filter in a mistaken attempt to solve an unrelated bug in 2.0a streams.

RFC2616 did not distinguish between sending and generating, so this used to be a reasonable attempt to adhere to a SHOULD requirement only on responses that the core generated itself. It has no business being applied to all 304 responses.

I think the best solution right now is to simply remove the entire conditional that attempts to reduce 304 responses. If that results in too much data being sent, we should have a configurable list of fields to exclude on 304 responses rather than a fixed table in the source code.
Comment 4 William A. Rowe Jr. 2018-11-07 18:15:01 UTC
Eliminating the filter may leave us exposed...

for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading
rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding
if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent
benefit to 304 requests. Content-Length can be preserved.
Comment 5 William A. Rowe Jr. 2018-11-07 19:36:12 UTC
(In reply to William A. Rowe Jr. from comment #4)
> Eliminating the filter may leave us exposed...
> 
> for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading
> rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding
> if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent
> benefit to 304 requests. Content-Length can be preserved.

Specifically, I think we want to apply this test to drop Transfer-Encoding when it is encountered, either here where Roy identified the foolish filtering, or at some later point prior to transmission;

  if (r->header_only || AP_STATUS_IS_HEADER_ONLY(r->status))
     drop T-E;
Comment 6 Mark Nottingham 2018-11-08 01:09:39 UTC
The HTTP WG is discussing the exact list of headers to strip here:
  https://github.com/httpwg/http-core/issues/165

Please join in.
Comment 7 andreas.luik 2020-03-03 15:31:32 UTC
Any progress on this?  We have the problem that Apache httpd (version 2.4.x) used as a reverse proxy removes CORS headers (with a 304 repsonse), resulting in browser access errors.  This makes Apache https useless as a reverse proxy with modern web applications.  Am I missing something?  (We are switching to nginx, which does pass the headers as expected ...)
Comment 8 Giovanni Bechis 2020-03-12 18:40:47 UTC
Created attachment 37097 [details]
Strip only unwanted headers

Following the list of headers Gecko doesn't update on a 304,
the following patch strips only unwanted headers.
Comment 9 Eric Covener 2020-03-12 18:46:19 UTC
Comment on attachment 37097 [details]
Strip only unwanted headers

wdyt about an subprocess_env envvar that allows the old behavior? Even if default behavior changes this allows a downstream user to back out just this bit if it causes some unexpected problem.
Comment 10 Mark Nottingham 2020-03-12 22:50:50 UTC
> Following the list of headers Gecko doesn't update on a 304

Please don't do this; it follows and reinforces demonstrably bad practice based upon misinterpretation of the specification.

Please participate in the standards discussion and implement the resolution of that.
Comment 11 Ruediger Pluem 2020-03-13 07:37:42 UTC
(In reply to Mark Nottingham from comment #10)
> > Following the list of headers Gecko doesn't update on a 304
> 
> Please don't do this; it follows and reinforces demonstrably bad practice
> based upon misinterpretation of the specification.
> 
> Please participate in the standards discussion and implement the resolution
> of that.

So your advice is that we shouldn't change any code until the standards discussion, where we can / should take part, comes to a conclusion what is the correct thing to do?
Comment 12 Mark Nottingham 2020-03-16 23:00:31 UTC
Yes. Hopefully, that will be soon.
Comment 13 Sebastian Kippe 2020-05-26 07:36:53 UTC
Looks like the discussion has been resolved:

https://github.com/httpwg/http-core/issues/165
Comment 14 Ruediger Pluem 2020-05-26 09:13:12 UTC
So removing

TE
Trailer
Transfer-Encoding
Upgrade
Content-Encoding
Content-Length
Content-MD5
Content-Range

should be enough?
mod_proxy_http already removes all headers that are in the connection header of the backend response and we set our own connection and keep-alive header.
Comment 15 Giovanni Bechis 2020-05-26 09:27:29 UTC
Shouldn't ETag be removed as well ?
Comment 16 Ruediger Pluem 2020-05-26 09:41:58 UTC
(In reply to Giovanni Bechis from comment #15)
> Shouldn't ETag be removed as well ?

ETag is not listed here:

https://github.com/httpwg/http-core/pull/337/files
Comment 17 Giovanni Bechis 2020-05-26 11:56:15 UTC
Maybe I am missing something but this is part of the code that has been committed:

Caches MUST-NOT update the following header fields:
Content-Encoding,
Content-Length,
Content-MD5 ("RFC2616#14.15),
Content-Range,
ETag.
Comment 18 Ruediger Pluem 2020-05-26 13:14:47 UTC
(In reply to Giovanni Bechis from comment #17)
> Maybe I am missing something but this is part of the code that has been
> committed:
> 
> Caches MUST-NOT update the following header fields:
> Content-Encoding,
> Content-Length,
> Content-MD5 ("RFC2616#14.15),
> Content-Range,
> ETag.

My bad :-(. I missed ETag. You are correct.
So remove

TE
Trailer
Transfer-Encoding
Upgrade
Content-Encoding
Content-Length
Content-MD5
Content-Range
ETag
Comment 19 Giovanni Bechis 2020-05-27 16:38:58 UTC
Created attachment 37269 [details]
Strip unwanted headers

Add a subprocess_env to restore old behaviour, uncertain about correct naming and about where this should be documented.
Comment 20 Ruediger Pluem 2020-05-28 07:06:52 UTC
I am not sure if we need to be able to provide the old behavior and if yes I think the name of the environment variable should be "better" (whatever that means, yes sweet naming discussions :-)).
Comment 21 Giovanni Bechis 2020-05-28 08:42:27 UTC
Honestly I do not think it makes sense to provide backward compatibility in this case because it would deviate from the standard.
I am all for committing just the portion of the code that strips the correct headers.
Comment 22 Giovanni Bechis 2020-07-01 13:22:35 UTC
Created attachment 37345 [details]
Strip unwanted headers without subprocess_env envvar

Strip unwanted headers without using a subprocess_env var to avoid this behaviour
Comment 23 Matt McCutchen 2020-09-07 15:22:06 UTC
Thanks to everyone who worked or is working on this issue!  While we wait for the fix to be committed and deployed to our web hosts (which I'm guessing may take years), here's a workaround I plan to use on my site:

For each request, compute a "salt" string that uniquely identifies the relevant response header values, and then emulate a resource whose ETag incorporates the salt.  Thus, if a request returns a salted ETag that is found in the client's cache, then the response headers in the cache are guaranteed to equal those that would have appeared in the actual response if Apache didn't strip them.  Since a last-modified time cannot accommodate salt, the If-Modified-Since request header must be dropped so that a 304 is not returned on that basis.

In general, the response headers and thus the salt could depend on both the server configuration and the request headers.  In most scenarios, it's probably easiest to represent the server configuration by an integer that is manually incremented for each relevant change.  Request headers can just be concatenated into the salt with any problematic characters suitably encoded.

Of course, we must ensure the client sends the request to the server in the first place rather than fulfilling it from cache.  This is independent of the current issue but probably worth reminding people about.  For a dependency on a request header, we just merge its name into the Vary response header.  For a change in server configuration, the best we can do is set the Cache-Control max-age to ensure clients eventually find out about the change.

Here's a code example for a .htaccess file where, if the Origin request header specifies an allowed origin, we want to set an equal Access-Control-Allow-Origin response header.  The server configuration revision is 42; we would increment the number if we changed the logic for generating response headers.  I'm assuming no browser sends an Origin header with nasty characters; this might need more research.  Please let me know if anything is wrong with this example!

~~~~~~~~
SetEnv HEADER_CONFIG_REV 42
RewriteRule ^ - [E=ETAG_SALT:%{ENV:HEADER_CONFIG_REV}-%{HTTP:Origin}]
Header merge Vary Origin
Header set Cache-Control max-age=1200

# Our goal is to strip the salt from the If-None-Match and If-Match headers if
# it equals the current salt.  However, %{ETAG_SALT}e expansion in the regular
# expression is not supported, so we use the following hack: append the current
# salt to all ETags, and then if an ETag ends in two equal salts (as determined
# by a regular expression with a \1 backreference), strip both of them.  ETags
# that did not originally end with the current salt will be left with at least
# one salt and won't be able to match the real ETag, so their presence does not
# matter.
RequestHeader edit* If-None-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\""
RequestHeader edit* If-None-Match "(\+[^+\" ]*)\1\"" "\""
RequestHeader edit* If-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\""
RequestHeader edit* If-Match "(\+[^+\" ]*)\1\"" "\""
RequestHeader unset If-Modified-Since

# Always add the salt to the ETag response header.
Header edit ETag "\"$" "+%{ETAG_SALT}e\""

RewriteCond %{HTTP:Origin} ^https://(.+\.)?example\.com$
RewriteRule ^ - [E=ACAO:%{HTTP:Origin}]
Header set Access-Control-Allow-Origin %{ACAO}e env=ACAO
~~~~~~~~

When calling a CGI, some of this processing could probably be done in the CGI,
but I haven't tested that.
Comment 24 Ruediger Pluem 2020-09-08 06:23:25 UTC
(In reply to Giovanni Bechis from comment #22)
> Created attachment 37345 [details]
> Strip unwanted headers without subprocess_env envvar
> 
> Strip unwanted headers without using a subprocess_env var to avoid this
> behaviour

Looks good. Can you commit?
Comment 25 Michael Kaufmann 2020-10-19 14:35:17 UTC
This has been fixed in http://svn.apache.org/r1881590 and http://svn.apache.org/r1881624 .

Please backport to 2.4.x