Bug 61820 - 304 headers stripped
Summary: 304 headers stripped
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core (show other bugs)
Version: 2.5-HEAD
Hardware: All All
: P2 normal (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk, PatchAvailable
Depends on:
Blocks:
 
Reported: 2017-11-26 23:19 UTC by Mark Nottingham
Modified: 2021-06-09 13:38 UTC (History)
9 users (show)



Attachments
Strip only unwanted headers (2.11 KB, patch)
2020-03-12 18:40 UTC, Giovanni Bechis
Details | Diff
Strip unwanted headers (1.89 KB, patch)
2020-05-27 16:38 UTC, Giovanni Bechis
Details | Diff
Strip unwanted headers without subprocess_env envvar (1.65 KB, patch)
2020-07-01 13:22 UTC, Giovanni Bechis
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Nottingham 2017-11-26 23:19:24 UTC
In http_filters:ap_http_header_filter, 304 responses get special treatment, in that they're sent with only a fixed set of headers (if present):

1414	    if (r->status == HTTP_NOT_MODIFIED) {
1415	        apr_table_do((int (*)(void *, const char *, const char *)) form_header_field,
1416	                     (void *) &h, r->headers_out,
1417	                     "Connection",
1418	                     "Keep-Alive",
1419	                     "ETag",
1420	                     "Content-Location",
1421	                     "Expires",
1422	                     "Cache-Control",
1423	                     "Vary",
1424	                     "Warning",
1425	                     "WWW-Authenticate",
1426	                     "Proxy-Authenticate",
1427	                     "Set-Cookie",
1428	                     "Set-Cookie2",
1429	                     NULL);
1430	    }

<https://svn.apache.org/viewvc/httpd/httpd/trunk/modules/http/http_filters.c?revision=1777672&view=markup#l1414>

This means that any header value that a generator (whether CGI, an upstream origin via mod_proxy, etc.) updates in a 304 will be lost.

RFC7234 specifies how headers on a 304 are supposed to be handled:
  http://httpwg.org/specs/rfc7234.html#freshening.responses

This has caused interoperability problems in the wild, e.g.,:
  https://github.com/hueniverse/hawk/issues/224
Comment 1 Eric Covener 2017-11-28 03:35:57 UTC
Hi Mark, I think some of the lack of action on this is related to this text:

https://tools.ietf.org/html/rfc7232#section-4.1

The server generating a 304 response MUST generate any of the
   following header fields that would have been sent in a 200 (OK)
   response to the same request: Cache-Control, Content-Location, Date,
   ETag, Expires, and Vary.
...

Since the goal of a 304 response is to minimize information transfer
   when the recipient already has one or more cached representations, a
   sender SHOULD NOT generate representation metadata other than the
   above listed fields unless said metadata exists for the purpose of
   guiding cache updates (e.g., Last-Modified might be useful if the
   response does not have an ETag field).


7230 lists a few "representation metadata" but I'm not sure it's meant to be exhaustive, so it's difficult to whitelist or blacklist it.  Any advice on how we differentiate here?

3.1.  Representation Metadata

   Representation header fields provide metadata about the
   representation.  When a message includes a payload body, the
   representation header fields describe how to interpret the
   representation data enclosed in the payload body.  In a response to a
   HEAD request, the representation header fields describe the
   representation data that would have been enclosed in the payload body
   if the same request had been a GET.

   The following header fields convey representation metadata:

   +-------------------+-----------------+
   | Header Field Name | Defined in...   |
   +-------------------+-----------------+
   | Content-Type      | Section 3.1.1.5 |
   | Content-Encoding  | Section 3.1.2.2 |
   | Content-Language  | Section 3.1.3.2 |
   | Content-Location  | Section 3.1.4.2 |
   +-------------------+-----------------+
Comment 2 Mark Nottingham 2017-11-28 07:02:18 UTC
The key word here is "generate." The problem is that when the 304 is being generated elsewhere -- whether by a CGI handler or by an upstream server -- it has authority to decide what headers to include.

I think this could be addressed by removing the code here and imposing this kind of filtering (if necessary) where Apache actually generates a 304 as part of handling a conditional request itself (e.g., from cache, from disk, but not from CGI, etc.).
Comment 3 Roy T. Fielding 2018-10-26 20:42:42 UTC
I can confirm that this is a bug.

The original code that I wrote (in 1997) for the canned response generator (ap_send_error_response) was moved in 2000 to the generic http filter in a mistaken attempt to solve an unrelated bug in 2.0a streams.

RFC2616 did not distinguish between sending and generating, so this used to be a reasonable attempt to adhere to a SHOULD requirement only on responses that the core generated itself. It has no business being applied to all 304 responses.

I think the best solution right now is to simply remove the entire conditional that attempts to reduce 304 responses. If that results in too much data being sent, we should have a configurable list of fields to exclude on 304 responses rather than a fixed table in the source code.
Comment 4 William A. Rowe Jr. 2018-11-07 18:15:01 UTC
Eliminating the filter may leave us exposed...

for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading
rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding
if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent
benefit to 304 requests. Content-Length can be preserved.
Comment 5 William A. Rowe Jr. 2018-11-07 19:36:12 UTC
(In reply to William A. Rowe Jr. from comment #4)
> Eliminating the filter may leave us exposed...
> 
> for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading
> rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding
> if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent
> benefit to 304 requests. Content-Length can be preserved.

Specifically, I think we want to apply this test to drop Transfer-Encoding when it is encountered, either here where Roy identified the foolish filtering, or at some later point prior to transmission;

  if (r->header_only || AP_STATUS_IS_HEADER_ONLY(r->status))
     drop T-E;
Comment 6 Mark Nottingham 2018-11-08 01:09:39 UTC
The HTTP WG is discussing the exact list of headers to strip here:
  https://github.com/httpwg/http-core/issues/165

Please join in.
Comment 7 andreas.luik 2020-03-03 15:31:32 UTC
Any progress on this?  We have the problem that Apache httpd (version 2.4.x) used as a reverse proxy removes CORS headers (with a 304 repsonse), resulting in browser access errors.  This makes Apache https useless as a reverse proxy with modern web applications.  Am I missing something?  (We are switching to nginx, which does pass the headers as expected ...)
Comment 8 Giovanni Bechis 2020-03-12 18:40:47 UTC
Created attachment 37097 [details]
Strip only unwanted headers

Following the list of headers Gecko doesn't update on a 304,
the following patch strips only unwanted headers.
Comment 9 Eric Covener 2020-03-12 18:46:19 UTC
Comment on attachment 37097 [details]
Strip only unwanted headers

wdyt about an subprocess_env envvar that allows the old behavior? Even if default behavior changes this allows a downstream user to back out just this bit if it causes some unexpected problem.
Comment 10 Mark Nottingham 2020-03-12 22:50:50 UTC
> Following the list of headers Gecko doesn't update on a 304

Please don't do this; it follows and reinforces demonstrably bad practice based upon misinterpretation of the specification.

Please participate in the standards discussion and implement the resolution of that.
Comment 11 Ruediger Pluem 2020-03-13 07:37:42 UTC
(In reply to Mark Nottingham from comment #10)
> > Following the list of headers Gecko doesn't update on a 304
> 
> Please don't do this; it follows and reinforces demonstrably bad practice
> based upon misinterpretation of the specification.
> 
> Please participate in the standards discussion and implement the resolution
> of that.

So your advice is that we shouldn't change any code until the standards discussion, where we can / should take part, comes to a conclusion what is the correct thing to do?
Comment 12 Mark Nottingham 2020-03-16 23:00:31 UTC
Yes. Hopefully, that will be soon.
Comment 13 Sebastian Kippe 2020-05-26 07:36:53 UTC
Looks like the discussion has been resolved:

https://github.com/httpwg/http-core/issues/165
Comment 14 Ruediger Pluem 2020-05-26 09:13:12 UTC
So removing

TE
Trailer
Transfer-Encoding
Upgrade
Content-Encoding
Content-Length
Content-MD5
Content-Range

should be enough?
mod_proxy_http already removes all headers that are in the connection header of the backend response and we set our own connection and keep-alive header.
Comment 15 Giovanni Bechis 2020-05-26 09:27:29 UTC
Shouldn't ETag be removed as well ?
Comment 16 Ruediger Pluem 2020-05-26 09:41:58 UTC
(In reply to Giovanni Bechis from comment #15)
> Shouldn't ETag be removed as well ?

ETag is not listed here:

https://github.com/httpwg/http-core/pull/337/files
Comment 17 Giovanni Bechis 2020-05-26 11:56:15 UTC
Maybe I am missing something but this is part of the code that has been committed:

Caches MUST-NOT update the following header fields:
Content-Encoding,
Content-Length,
Content-MD5 ("RFC2616#14.15),
Content-Range,
ETag.
Comment 18 Ruediger Pluem 2020-05-26 13:14:47 UTC
(In reply to Giovanni Bechis from comment #17)
> Maybe I am missing something but this is part of the code that has been
> committed:
> 
> Caches MUST-NOT update the following header fields:
> Content-Encoding,
> Content-Length,
> Content-MD5 ("RFC2616#14.15),
> Content-Range,
> ETag.

My bad :-(. I missed ETag. You are correct.
So remove

TE
Trailer
Transfer-Encoding
Upgrade
Content-Encoding
Content-Length
Content-MD5
Content-Range
ETag
Comment 19 Giovanni Bechis 2020-05-27 16:38:58 UTC
Created attachment 37269 [details]
Strip unwanted headers

Add a subprocess_env to restore old behaviour, uncertain about correct naming and about where this should be documented.
Comment 20 Ruediger Pluem 2020-05-28 07:06:52 UTC
I am not sure if we need to be able to provide the old behavior and if yes I think the name of the environment variable should be "better" (whatever that means, yes sweet naming discussions :-)).
Comment 21 Giovanni Bechis 2020-05-28 08:42:27 UTC
Honestly I do not think it makes sense to provide backward compatibility in this case because it would deviate from the standard.
I am all for committing just the portion of the code that strips the correct headers.
Comment 22 Giovanni Bechis 2020-07-01 13:22:35 UTC
Created attachment 37345 [details]
Strip unwanted headers without subprocess_env envvar

Strip unwanted headers without using a subprocess_env var to avoid this behaviour
Comment 23 Matt McCutchen 2020-09-07 15:22:06 UTC
Thanks to everyone who worked or is working on this issue!  While we wait for the fix to be committed and deployed to our web hosts (which I'm guessing may take years), here's a workaround I plan to use on my site:

For each request, compute a "salt" string that uniquely identifies the relevant response header values, and then emulate a resource whose ETag incorporates the salt.  Thus, if a request returns a salted ETag that is found in the client's cache, then the response headers in the cache are guaranteed to equal those that would have appeared in the actual response if Apache didn't strip them.  Since a last-modified time cannot accommodate salt, the If-Modified-Since request header must be dropped so that a 304 is not returned on that basis.

In general, the response headers and thus the salt could depend on both the server configuration and the request headers.  In most scenarios, it's probably easiest to represent the server configuration by an integer that is manually incremented for each relevant change.  Request headers can just be concatenated into the salt with any problematic characters suitably encoded.

Of course, we must ensure the client sends the request to the server in the first place rather than fulfilling it from cache.  This is independent of the current issue but probably worth reminding people about.  For a dependency on a request header, we just merge its name into the Vary response header.  For a change in server configuration, the best we can do is set the Cache-Control max-age to ensure clients eventually find out about the change.

Here's a code example for a .htaccess file where, if the Origin request header specifies an allowed origin, we want to set an equal Access-Control-Allow-Origin response header.  The server configuration revision is 42; we would increment the number if we changed the logic for generating response headers.  I'm assuming no browser sends an Origin header with nasty characters; this might need more research.  Please let me know if anything is wrong with this example!

~~~~~~~~
SetEnv HEADER_CONFIG_REV 42
RewriteRule ^ - [E=ETAG_SALT:%{ENV:HEADER_CONFIG_REV}-%{HTTP:Origin}]
Header merge Vary Origin
Header set Cache-Control max-age=1200

# Our goal is to strip the salt from the If-None-Match and If-Match headers if
# it equals the current salt.  However, %{ETAG_SALT}e expansion in the regular
# expression is not supported, so we use the following hack: append the current
# salt to all ETags, and then if an ETag ends in two equal salts (as determined
# by a regular expression with a \1 backreference), strip both of them.  ETags
# that did not originally end with the current salt will be left with at least
# one salt and won't be able to match the real ETag, so their presence does not
# matter.
RequestHeader edit* If-None-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\""
RequestHeader edit* If-None-Match "(\+[^+\" ]*)\1\"" "\""
RequestHeader edit* If-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\""
RequestHeader edit* If-Match "(\+[^+\" ]*)\1\"" "\""
RequestHeader unset If-Modified-Since

# Always add the salt to the ETag response header.
Header edit ETag "\"$" "+%{ETAG_SALT}e\""

RewriteCond %{HTTP:Origin} ^https://(.+\.)?example\.com$
RewriteRule ^ - [E=ACAO:%{HTTP:Origin}]
Header set Access-Control-Allow-Origin %{ACAO}e env=ACAO
~~~~~~~~

When calling a CGI, some of this processing could probably be done in the CGI,
but I haven't tested that.
Comment 24 Ruediger Pluem 2020-09-08 06:23:25 UTC
(In reply to Giovanni Bechis from comment #22)
> Created attachment 37345 [details]
> Strip unwanted headers without subprocess_env envvar
> 
> Strip unwanted headers without using a subprocess_env var to avoid this
> behaviour

Looks good. Can you commit?
Comment 25 Michael Kaufmann 2020-10-19 14:35:17 UTC
This has been fixed in http://svn.apache.org/r1881590 and http://svn.apache.org/r1881624 .

Please backport to 2.4.x
Comment 26 Mark Nottingham 2021-01-08 04:02:14 UTC
Note that the specification text has been updated to reflect subsequent discussion: see
  https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html#update

I believe that Apache behaviour with this patch is conferment, but I haven't yet updated my tests.
Comment 27 Graham Leggett 2021-01-16 13:28:23 UTC
Backported to v2.4.47:

http://svn.apache.org/viewvc?rev=1885568&view=rev
Comment 28 Eric Covener 2021-04-29 11:29:40 UTC
Ivan has reported that we likely got this wrong.

The context of https://github.com/httpwg/http-core/pull/337/files is:
- titled "Freshening Stored Responses upon Validation"
- is about what caches do "When a cache receives a 304" and how they change what's stored.

But we aren't freshening our stored response, we are sending the client a response -- and if a cache can receive a 304 w/ ETAG in the first place then we are obviously meant to be able to send such a response.
Comment 29 Ruediger Pluem 2021-04-29 11:39:36 UTC
(In reply to Eric Covener from comment #28)
> Ivan has reported that we likely got this wrong.
> 
> The context of https://github.com/httpwg/http-core/pull/337/files is:
> - titled "Freshening Stored Responses upon Validation"
> - is about what caches do "When a cache receives a 304" and how they change
> what's stored.
> 
> But we aren't freshening our stored response, we are sending the client a
> response -- and if a cache can receive a 304 w/ ETAG in the first place then
> we are obviously meant to be able to send such a response.

True, but can Mark or Roy please comment if and how we read the specs wrong here?
Comment 30 Eric Covener 2021-04-29 12:07:22 UTC
> obviously meant to be able to send such a response.

pre-emptive apologies for the "obviously" here, bad choice of wording -- i meant only then "it follows that..."
Comment 31 Eric Covener 2021-04-29 12:10:39 UTC
(In reply to Eric Covener from comment #28)
> Ivan has reported that we likely got this wrong.
> 
> The context of https://github.com/httpwg/http-core/pull/337/files is:

Another red flag, we are working from [a draft of) the cache spec (7234) instead of the conditional request spec (7232)
Comment 32 Roy T. Fielding 2021-04-29 16:41:28 UTC
Please see my note on the dev list. As I said earlier, the http_filter should not be filtering out header fields from 304 responses. That's code from the canned error response *generator* and should not be in the filter. That's what this issue is about.

The cache should adhere to cache requirements on updates when receiving a 304, which has nothing to do with this part of the code.
Comment 33 Ruediger Pluem 2021-04-30 20:09:43 UTC
r1889341 should address this in trunk and fix the regression.
Comment 34 Christophe JAILLET 2021-05-03 20:05:30 UTC
Hi,

I don't know if feasible, but integrating the tool run on https://cache-tests.fyi/ could be great.

At least, there is an explicit "HTTP cache must include ETag in a 304 Not Modified" which should have trigger the issue during testing.


I've not really looked into detail, but in the code repository ([1]), there is docker image, so maybe some kind of integration with or travis CI can be done?


[1]: https://github.com/http-tests/cache-tests
Comment 35 Mark Nottingham 2021-05-04 02:16:53 UTC
I don't know if you'd want to integrate it with CI, but it's possible to run automatically; e.g., Traffic Server is as part of their regression suite.
Comment 36 Ruediger Pluem 2021-06-09 13:38:43 UTC
Fixed in 2.4.48 as r1889764.