Summary: | Apache reverse-proxy returns 304 on non-conditional GET request | ||
---|---|---|---|
Product: | Apache httpd-2 | Reporter: | Jan Kirschner <JKirschner> |
Component: | mod_cache | Assignee: | Apache HTTPD Bugs Mailing List <bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | cyril.rohr, nathan |
Priority: | P2 | Keywords: | FixedInTrunk, RFC |
Version: | 2.2.3 | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | Linux | ||
Bug Depends on: | |||
Bug Blocks: | 50195 |
Description
Jan Kirschner
2008-07-04 05:16:13 UTC
Works fine, when the server.aa sends Expires: xxxx in the 304 respose - proxy.aa returns 200 and the cached content. But still... I'm also experiencing this bug, using Apache/2.2.8 with mod_cache and mod_disk_cache over an AJP reverse proxy of an Apache Wicket application; Wicket does not set Expire headers when responding with 304 (which works fine without the cache on). Here is my TCP trace: ------- GET /resources/com.typeturner.Typeturner/image?name=kitchen1 HTTP/1.1 User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1 Cache-Control: max-age=0 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us Accept-Encoding: gzip, deflate Cookie: Typeturner_USER=nathan; Typeturner_AUTH=xxxx; JSESSIONID=xxxx Connection: keep-alive Host: technically.us HTTP/1.1 304 NOT_MODIFIED Date: Sun, 13 Jul 2008 13:23:28 GMT Server: Server: Jetty(6.0.x) Connection: Keep-Alive Keep-Alive: timeout=15, max=100 Vary: Accept-Encoding ------- However, this failing behavior only seems to occur when a query string is present in the URL. Otherwise, mod_cache seems to perform an unconditional GET on the proxied server, and pass along that response to the client: ------- GET /resources/org.apache.wicket.Application/eat-photo HTTP/1.1 User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1 Cache-Control: max-age=0 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us Accept-Encoding: gzip, deflate Cookie: Typeturner_USER=nathan; Typeturner_AUTH=xxxx; JSESSIONID=xxxx Connection: keep-alive Host: technically.us HTTP/1.1 200 OK Date: Sun, 13 Jul 2008 14:44:56 GMT Server: Server: Jetty(6.0.x) Expires: Sun, 20 Jul 2008 14:44:14 GMT Cache-Control: max-age=604800 Vary: Accept-Encoding Content-Encoding: gzip Last-Modified: Wed, 09 Jul 2008 12:22:57 GMT Content-Length: 40831 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Content-Type: image/jpeg [....] ------- To work around the problem, I've patched Wicket to set an Expires header when responding with 304 and I'll offer that to the project as an improvement, but I do think mod_cache is behaving incorrectly. (In reply to comment #1) > Works fine, when the server.aa sends Expires: xxxx in the 304 respose - > proxy.aa returns 200 and the cached content. But still... > Do your requests contain query strings like the ones of the second reporter? (In reply to comment #2) > I'm also experiencing this bug, using Apache/2.2.8 with mod_cache and > mod_disk_cache over an AJP reverse proxy of an Apache Wicket application; > Wicket does not set Expire headers when responding with 304 (which works fine > without the cache on). Here is my TCP trace: > This is somehow an error of Wicket. Why? Because requests with query strings are only cacheable if the response contains an expires date. So I can only think of the following situation in your case: 1. Request with query string is not cached 2. Response is 200 and contains expire date => Request/response is cached. 3. Subsequent request within the expiration time are served without problems as mod_cache does not sent conditional requests to the backend. 4. After the expiration date if passed mod_cache sents a *conditional* request to the backend even if the original request was unconditional. 5. The response is 304 *without* expiration date, which tells the cache that the cached response is unusable and needs to be ejected. This is an error of wicket as a 304 response should contain all headers that are relevant for caching. The cache is in a problematic situation here: 1. Sending a 304 is wrong, because the original request was unconditional. 2. Serving the contents from the cache is not allowed since the backend told the cache that the current contents of the cache is invalid without delivering new content. (Jan does appear to be requesting also with a query string, http://server.aa/creative?bannerId=618167&seqNo=4&set=sec) Yes, that is the sequence I have figured. But I'm not sure that the 304 response without an Expires header is strictly erroneous. If that is clear in the protocol I can file it as a bug in Wicket instead of an improvement, but I haven't been able to find anything definitive. I don't see how a plain 304 "tells the cache that the cached response is unusable and needs to be ejected". The cache has asked the back end if the resource has changed since a date. The answer is no, and the cache would do better to serve what it has. Relaying the 304 brings about a definite client error condition that would not have occurred without the cache. (In reply to comment #5) > (Jan does appear to be requesting also with a query string, > http://server.aa/creative?bannerId=618167&seqNo=4&set=sec) > > Yes, that is the sequence I have figured. But I'm not sure that the 304 > response without an Expires header is strictly erroneous. If that is clear in > the protocol I can file it as a bug in Wicket instead of an improvement, but I > haven't been able to find anything definitive. > > I don't see how a plain 304 "tells the cache that the cached response is > unusable and needs to be ejected". The cache has asked the back end if the This is because of 13.9 in RFC2616 > resource has changed since a date. The answer is no, and the cache would do The resource hasn't changed, but its metadata has as no expires header is delivered anymore. I know about 13.9, but I don't see where it says that all previous headers must be discarded. I do see that they must be updated if present, in 10.3.5: "If a cache uses a received 304 response to update a cache entry, the cache MUST update the entry to reflect any new field values given in the response." But there aren't any new field values given. Section 13.9.4, in talking about max-age=0 which is specifically where I'm having the problem, says simply "If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response." There is (always) some ambiguity, but I don't see why mod_cache would choose to interpret the back-end server's plain 304 as a request to nullify the cached Expires header to make the resource retroactively uncacheable. And even if that were the best way to read the situation, 10.3.5 tells you how to recover from a similar problem: "If a 304 response indicates an entity not currently cached, then the cache MUST disregard the response and repeat the request without the conditional." (In reply to comment #7) > I know about 13.9, but I don't see where it says that all previous headers must > be discarded. I do see that they must be updated if present, in 10.3.5: "If a Ok I guess I wasn't clear in my explanation. I did not say that previous headers must be discarded. The only reason I can image why the cache did a conditional request on the backend in this situation is because the cached entity passed the expiration date given in the cached response expires header. So the initial check of the cache on the entity reveals that it is not fresh enough any longer to serve directly and the freshness check with the backend delivers a response that makes this object no longer cachable as there is no expiration date for a request with args. > cache uses a received 304 response to update a cache entry, the cache MUST > update the entry to reflect any new field values given in the response." But > there aren't any new field values given. > > Section 13.9.4, in talking about max-age=0 which is specifically where I'm > having the problem, says simply "If the server replies with 304 (Not Modified), > then the cache can return its now validated copy to the client with a 200 (OK) > response." > > There is (always) some ambiguity, but I don't see why mod_cache would choose to > interpret the back-end server's plain 304 as a request to nullify the cached > Expires header to make the resource retroactively uncacheable. And even if that > were the best way to read the situation, 10.3.5 tells you how to recover from a > similar problem: "If a 304 response indicates an entity not currently cached, > then the cache MUST disregard the response and repeat the request without the > conditional." > This seems like a solution, but I guess it might be difficult to implement. So probably expect no quick solution when going down this path. "The only reason I can image why the cache did a conditional request on the backend in this situation is because the cached entity passed the expiration date given in the cached response expires header." It hasn't passed the expiration time it my testing. I would say it's requesting a refresh because Safari specified "max-age:0", and mod_cache is discarding the known, future expiration time. That is the first thing I would fix, and then the recovery procedure for when the expiration date has passed. (I never expect quick resolutions, or any resolutions, unless I'm providing a patch and it's small, so no worries there. I'm just trying to document the existing behavior and determine what would be the best correct behavior, for other people that run into the problem or that might want to fix it.) (In reply to comment #3) > > Do your requests contain query strings like the ones of the second reporter? > Yes they do. ... There is a mistake in my (manually modified) http trace - all request strings were the same. Sorry for long response time - I was on my holiday last week. This has nothing to do with query strings. It is as stated already: 1) the user submits a normal request but the item in ache is expired 2) mod_cache submits a conditional request to the upstream server looking for a new version 3) the upstream server responds with a 304 Not Modified, but then also includes headers that indicate this previously cached item is no longer cacheable. Either because the expires haven't been updated, the cache rules have changed, etc. 4) mod_cache checks for cacheability of the response, and if it cannot be cached (i.e., if 'reason' is set in the cache save filter), then it serves the upstream response directly, with no regard for whether the original request was conditional or not. As stated, the best options are to either a) serve the content as-is from cache, then purge it so subsequent requests come from the origin, or b) re-request from the upstream server without the conditional headers, and pass on the full result to the user. b) is probably the most standards-compliant. I have chosen to do a) for now. By not removing the cache_remove filter, the item is removed from cache after being served for this request. I can say that not fixing this makes mod_cache unusable in production, as it is never ok to return a 304 Not Modified / blank body to a non-conditional request, under any standard. This is a bad bug, unconfined to query strings not returning expires, or any other scenario thus far presented. It is perfectly reasonable to expect that an upstream server may change the cache rules for a piece of content, and it needs to be able to handle that on revalidation without creating an error for the user. Here is debug output. The scenario is that the cached item is cached, but expired, and the upstream server is responding with an Expires header in the past. The dates of the cached item were specifically altered for this test to be in the past, as well as the response from the upstream, but this spawned from real-world scenarios. * Please note, the line numbers will not directly correspond to line numbers in httpd repository. This server is running a modified 2.2.11. [Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(1217): disk_cache: Recalled status for cached URL http://host.com:80/content.file? from file /path/cache/path.header [Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(1758): disk_cache: Recalled headers for URL http://host.com:80/content.file? [Tue Apr 28 04:39:30 2009] [debug] cache_storage.c(361): Cached response for /content.file isn't fresh. Adding/replacing conditional request headers. [Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(690): Adding CACHE_SAVE filter for /content.file [Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(703): Adding CACHE_REMOVE_URL filter for /content.file [Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(46): proxy: BALANCER: *** [Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(1301): proxy: *** [Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(1369): proxy *** [Tue Apr 28 04:39:30 2009] [debug] mod_proxy.c(1006): Running scheme balancer handler (attempt 0) [Tue Apr 28 04:39:30 2009] [debug] mod_proxy_http.c(2135): proxy: HTTP: serving URL http://upstream/content.file [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(1999): proxy: HTTP: has acquired connection for (upstream) [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2055): proxy: connecting http://upstream/content.file to upstream:80 [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2153): proxy: connected /contet.file to upstream:80 [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2308): proxy: HTTP: fam 2 socket created to connect to upstream [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2414): proxy: HTTP: connection complete to upsteam:80 (upstream) [Tue Apr 28 04:39:30 2009] [debug] mod_proxy_http.c(2005): proxy: header only [Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(1214): cache: /contet.file not cached. Reason: Expires header already expired, not cacheable [Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(1687): in CACHE_REMOVE_URL filter, and there is a cache [Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(88): ** removed item ** [Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2017): proxy: HTTP: has released connection for (upstream) [Tue Apr 28 04:39:30 2009] [debug] mod_proxy.c(1053): proxy_handler(cleanup): access_status: 0, r->status: 304 And the request: wget --header --server-response http://server/content.file --04:28:56-- http://server/content.file => `content.file' Resolving server... ***.***.***.*** Connecting to server|***.***.***.***|:80... connected. HTTP request sent, awaiting response... HTTP/1.1 304 Not Modified Date: Tue, 28 Apr 2009 08:22:33 GMT Server: ** upstream server tokens ** Connection: Keep-Alive Keep-Alive: timeout=5, max=100 ETag: "** etag **" Expires: Mon, 06 Apr 2009 08:07:45 GMT Cache-Control: max-age=259 04:28:56 ERROR 304: Not Modified. It seems this behaviour is triggered when a 304 Not Modified response arrives that contains headers to say that the response isn't cacheable according to the RFC rules. The cacheability check of the response kicks in, says "this (304) is not cacheable, just pass this response through as is", and that's how the 304 gets out. This situation could arise if the origin server has in the mean time changed its mind about whether something is cachable (it was cacheable in the past, now it isn't), or if a broken origin server declares that 2xx responses are cacheable but 304 responses aren't (a contradiction in terms). Neither of these two happen often, which is probably why this problem is so rare in the wild. Given that it isn't practical to restart the request again, it is probably a sensible compromise to pretend we couldn't connect to the backend at all, and return cached content for this request marked with a Warning to say the content is stale (which is what we are allowed to do as per RFC2616), and then we delete this URL from the cache, respecting the headers on the 304. In the case where "s-maxage" or "must-revalidate" has been specified, and we can't return a warning, we delete the URL from the cache, and then could potentially return a redirect back to ourselves, in the process asking the client to try again. Will need to double check this logic against RFC2616. Fixed in r1001884 on httpd-trunk. Proposed for backport to v2.2. Adding dependency for completeness. Backported to 2.2.x as r1068313. |