Summary: | 304 headers stripped | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Mark Nottingham <mnot> |
Component: | Core | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | andreas.luik, ben.rubson, covener, fielding, herr.ernst, ivan, matt, sebastian, simon.oberhammer |
Priority: | P2 | Keywords: | FixedInTrunk, PatchAvailable |
Version: | 2.5-HEAD | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Attachments: |
Strip only unwanted headers
Strip unwanted headers Strip unwanted headers without subprocess_env envvar |
Description
Mark Nottingham
2017-11-26 23:19:24 UTC
Hi Mark, I think some of the lack of action on this is related to this text: https://tools.ietf.org/html/rfc7232#section-4.1 The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary. ... Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate representation metadata other than the above listed fields unless said metadata exists for the purpose of guiding cache updates (e.g., Last-Modified might be useful if the response does not have an ETag field). 7230 lists a few "representation metadata" but I'm not sure it's meant to be exhaustive, so it's difficult to whitelist or blacklist it. Any advice on how we differentiate here? 3.1. Representation Metadata Representation header fields provide metadata about the representation. When a message includes a payload body, the representation header fields describe how to interpret the representation data enclosed in the payload body. In a response to a HEAD request, the representation header fields describe the representation data that would have been enclosed in the payload body if the same request had been a GET. The following header fields convey representation metadata: +-------------------+-----------------+ | Header Field Name | Defined in... | +-------------------+-----------------+ | Content-Type | Section 3.1.1.5 | | Content-Encoding | Section 3.1.2.2 | | Content-Language | Section 3.1.3.2 | | Content-Location | Section 3.1.4.2 | +-------------------+-----------------+ The key word here is "generate." The problem is that when the 304 is being generated elsewhere -- whether by a CGI handler or by an upstream server -- it has authority to decide what headers to include. I think this could be addressed by removing the code here and imposing this kind of filtering (if necessary) where Apache actually generates a 304 as part of handling a conditional request itself (e.g., from cache, from disk, but not from CGI, etc.). I can confirm that this is a bug. The original code that I wrote (in 1997) for the canned response generator (ap_send_error_response) was moved in 2000 to the generic http filter in a mistaken attempt to solve an unrelated bug in 2.0a streams. RFC2616 did not distinguish between sending and generating, so this used to be a reasonable attempt to adhere to a SHOULD requirement only on responses that the core generated itself. It has no business being applied to all 304 responses. I think the best solution right now is to simply remove the entire conditional that attempts to reduce 304 responses. If that results in too much data being sent, we should have a configurable list of fields to exclude on 304 responses rather than a fixed table in the source code. Eliminating the filter may leave us exposed... for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent benefit to 304 requests. Content-Length can be preserved. (In reply to William A. Rowe Jr. from comment #4) > Eliminating the filter may leave us exposed... > > for all r->header_only and AP_STATUS_HAS_BODY(r->status) requests, reading > rfc7230 3.3.1 and 3.3.2, it would be best to still drop Transfer-Encoding > if we encounter it, it is actively disallowed in 1xx, 204, and of no apparent > benefit to 304 requests. Content-Length can be preserved. Specifically, I think we want to apply this test to drop Transfer-Encoding when it is encountered, either here where Roy identified the foolish filtering, or at some later point prior to transmission; if (r->header_only || AP_STATUS_IS_HEADER_ONLY(r->status)) drop T-E; The HTTP WG is discussing the exact list of headers to strip here: https://github.com/httpwg/http-core/issues/165 Please join in. Any progress on this? We have the problem that Apache httpd (version 2.4.x) used as a reverse proxy removes CORS headers (with a 304 repsonse), resulting in browser access errors. This makes Apache https useless as a reverse proxy with modern web applications. Am I missing something? (We are switching to nginx, which does pass the headers as expected ...) Created attachment 37097 [details]
Strip only unwanted headers
Following the list of headers Gecko doesn't update on a 304,
the following patch strips only unwanted headers.
Comment on attachment 37097 [details]
Strip only unwanted headers
wdyt about an subprocess_env envvar that allows the old behavior? Even if default behavior changes this allows a downstream user to back out just this bit if it causes some unexpected problem.
> Following the list of headers Gecko doesn't update on a 304
Please don't do this; it follows and reinforces demonstrably bad practice based upon misinterpretation of the specification.
Please participate in the standards discussion and implement the resolution of that.
(In reply to Mark Nottingham from comment #10) > > Following the list of headers Gecko doesn't update on a 304 > > Please don't do this; it follows and reinforces demonstrably bad practice > based upon misinterpretation of the specification. > > Please participate in the standards discussion and implement the resolution > of that. So your advice is that we shouldn't change any code until the standards discussion, where we can / should take part, comes to a conclusion what is the correct thing to do? Yes. Hopefully, that will be soon. Looks like the discussion has been resolved: https://github.com/httpwg/http-core/issues/165 So removing TE Trailer Transfer-Encoding Upgrade Content-Encoding Content-Length Content-MD5 Content-Range should be enough? mod_proxy_http already removes all headers that are in the connection header of the backend response and we set our own connection and keep-alive header. Shouldn't ETag be removed as well ? (In reply to Giovanni Bechis from comment #15) > Shouldn't ETag be removed as well ? ETag is not listed here: https://github.com/httpwg/http-core/pull/337/files Maybe I am missing something but this is part of the code that has been committed: Caches MUST-NOT update the following header fields: Content-Encoding, Content-Length, Content-MD5 ("RFC2616#14.15), Content-Range, ETag. (In reply to Giovanni Bechis from comment #17) > Maybe I am missing something but this is part of the code that has been > committed: > > Caches MUST-NOT update the following header fields: > Content-Encoding, > Content-Length, > Content-MD5 ("RFC2616#14.15), > Content-Range, > ETag. My bad :-(. I missed ETag. You are correct. So remove TE Trailer Transfer-Encoding Upgrade Content-Encoding Content-Length Content-MD5 Content-Range ETag Created attachment 37269 [details]
Strip unwanted headers
Add a subprocess_env to restore old behaviour, uncertain about correct naming and about where this should be documented.
I am not sure if we need to be able to provide the old behavior and if yes I think the name of the environment variable should be "better" (whatever that means, yes sweet naming discussions :-)). Honestly I do not think it makes sense to provide backward compatibility in this case because it would deviate from the standard. I am all for committing just the portion of the code that strips the correct headers. Created attachment 37345 [details]
Strip unwanted headers without subprocess_env envvar
Strip unwanted headers without using a subprocess_env var to avoid this behaviour
Thanks to everyone who worked or is working on this issue! While we wait for the fix to be committed and deployed to our web hosts (which I'm guessing may take years), here's a workaround I plan to use on my site: For each request, compute a "salt" string that uniquely identifies the relevant response header values, and then emulate a resource whose ETag incorporates the salt. Thus, if a request returns a salted ETag that is found in the client's cache, then the response headers in the cache are guaranteed to equal those that would have appeared in the actual response if Apache didn't strip them. Since a last-modified time cannot accommodate salt, the If-Modified-Since request header must be dropped so that a 304 is not returned on that basis. In general, the response headers and thus the salt could depend on both the server configuration and the request headers. In most scenarios, it's probably easiest to represent the server configuration by an integer that is manually incremented for each relevant change. Request headers can just be concatenated into the salt with any problematic characters suitably encoded. Of course, we must ensure the client sends the request to the server in the first place rather than fulfilling it from cache. This is independent of the current issue but probably worth reminding people about. For a dependency on a request header, we just merge its name into the Vary response header. For a change in server configuration, the best we can do is set the Cache-Control max-age to ensure clients eventually find out about the change. Here's a code example for a .htaccess file where, if the Origin request header specifies an allowed origin, we want to set an equal Access-Control-Allow-Origin response header. The server configuration revision is 42; we would increment the number if we changed the logic for generating response headers. I'm assuming no browser sends an Origin header with nasty characters; this might need more research. Please let me know if anything is wrong with this example! ~~~~~~~~ SetEnv HEADER_CONFIG_REV 42 RewriteRule ^ - [E=ETAG_SALT:%{ENV:HEADER_CONFIG_REV}-%{HTTP:Origin}] Header merge Vary Origin Header set Cache-Control max-age=1200 # Our goal is to strip the salt from the If-None-Match and If-Match headers if # it equals the current salt. However, %{ETAG_SALT}e expansion in the regular # expression is not supported, so we use the following hack: append the current # salt to all ETags, and then if an ETag ends in two equal salts (as determined # by a regular expression with a \1 backreference), strip both of them. ETags # that did not originally end with the current salt will be left with at least # one salt and won't be able to match the real ETag, so their presence does not # matter. RequestHeader edit* If-None-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\"" RequestHeader edit* If-None-Match "(\+[^+\" ]*)\1\"" "\"" RequestHeader edit* If-Match "([^\ ])\"" "$1+%{ETAG_SALT}e\"" RequestHeader edit* If-Match "(\+[^+\" ]*)\1\"" "\"" RequestHeader unset If-Modified-Since # Always add the salt to the ETag response header. Header edit ETag "\"$" "+%{ETAG_SALT}e\"" RewriteCond %{HTTP:Origin} ^https://(.+\.)?example\.com$ RewriteRule ^ - [E=ACAO:%{HTTP:Origin}] Header set Access-Control-Allow-Origin %{ACAO}e env=ACAO ~~~~~~~~ When calling a CGI, some of this processing could probably be done in the CGI, but I haven't tested that. (In reply to Giovanni Bechis from comment #22) > Created attachment 37345 [details] > Strip unwanted headers without subprocess_env envvar > > Strip unwanted headers without using a subprocess_env var to avoid this > behaviour Looks good. Can you commit? This has been fixed in http://svn.apache.org/r1881590 and http://svn.apache.org/r1881624 . Please backport to 2.4.x Note that the specification text has been updated to reflect subsequent discussion: see https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html#update I believe that Apache behaviour with this patch is conferment, but I haven't yet updated my tests. Backported to v2.4.47: http://svn.apache.org/viewvc?rev=1885568&view=rev Ivan has reported that we likely got this wrong. The context of https://github.com/httpwg/http-core/pull/337/files is: - titled "Freshening Stored Responses upon Validation" - is about what caches do "When a cache receives a 304" and how they change what's stored. But we aren't freshening our stored response, we are sending the client a response -- and if a cache can receive a 304 w/ ETAG in the first place then we are obviously meant to be able to send such a response. (In reply to Eric Covener from comment #28) > Ivan has reported that we likely got this wrong. > > The context of https://github.com/httpwg/http-core/pull/337/files is: > - titled "Freshening Stored Responses upon Validation" > - is about what caches do "When a cache receives a 304" and how they change > what's stored. > > But we aren't freshening our stored response, we are sending the client a > response -- and if a cache can receive a 304 w/ ETAG in the first place then > we are obviously meant to be able to send such a response. True, but can Mark or Roy please comment if and how we read the specs wrong here? > obviously meant to be able to send such a response.
pre-emptive apologies for the "obviously" here, bad choice of wording -- i meant only then "it follows that..."
(In reply to Eric Covener from comment #28) > Ivan has reported that we likely got this wrong. > > The context of https://github.com/httpwg/http-core/pull/337/files is: Another red flag, we are working from [a draft of) the cache spec (7234) instead of the conditional request spec (7232) Please see my note on the dev list. As I said earlier, the http_filter should not be filtering out header fields from 304 responses. That's code from the canned error response *generator* and should not be in the filter. That's what this issue is about. The cache should adhere to cache requirements on updates when receiving a 304, which has nothing to do with this part of the code. r1889341 should address this in trunk and fix the regression. Hi, I don't know if feasible, but integrating the tool run on https://cache-tests.fyi/ could be great. At least, there is an explicit "HTTP cache must include ETag in a 304 Not Modified" which should have trigger the issue during testing. I've not really looked into detail, but in the code repository ([1]), there is docker image, so maybe some kind of integration with or travis CI can be done? [1]: https://github.com/http-tests/cache-tests I don't know if you'd want to integrate it with CI, but it's possible to run automatically; e.g., Traffic Server is as part of their regression suite. Fixed in 2.4.48 as r1889764. |