Bug 45341

Summary: Apache reverse-proxy returns 304 on non-conditional GET request
Product: Apache httpd-2 Reporter: Jan Kirschner <JKirschner>
Component: mod_cacheAssignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED FIXED    
Severity: normal CC: cyril.rohr, nathan
Priority: P2 Keywords: FixedInTrunk, RFC
Version: 2.2.3   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Bug Depends on:    
Bug Blocks: 50195    

Description Jan Kirschner 2008-07-04 05:16:13 UTC
I'm running apache 2.2.3 as a caching reverse proxy (proxy.aa) to another server (server.aa). The problem appears when sending requests without If-Modified-Since present - the response shouldn't be 304.

First, I send a request:
telnet proxy.aa 8312
GET http://server.aa/request HTTP/1.1
Host: server.aa

the proxy forwards to the server:

GET /request HTTP/1.1

Host: server.aa

Max-Forwards: 10

X-Forwarded-For: a.a.a.a

X-Forwarded-Host: server.aa

X-Forwarded-Server: proxy.aa

Connection: Keep-Alive



and receives (still everything OK):
HTTP/1.1 200 OK

Date: Fri, 04 Jul 2008 08:50:15 GMT

Expires: Fri, 04 Jul 2008 08:52:25 GMT

Last-Modified: Fri, 04 Jul 2008 08:43:47 GMT

Cache-Control: max-age=130

Server: Server/2.3.1-1 (Unix, VB)

Allow: GET,HEAD

Accept-Ranges: none

Content-Type: image/jpeg; charset=utf-8

Content-Length: 3102

Connection: close
..... THE CONTENT ....

So the situation is: the proxy cached this request with max age 130s (2:10) and for every following 'request' doesn't forward and uses the cached result for 2 minutes. but than I send another request:

telnet proxy.aa 8312
GET http://server.aa/creative?bannerId=618167&seqNo=4&set=sec HTTP/1.1
Host: server.aa

and proxy.aa forwards to server.aa:

GET /request HTTP/1.1

Host: server.aa

If-Modified-Since: Fri, 04 Jul 2008 08:43:47 GMT

Max-Forwards: 10

X-Forwarded-For: 10.0.2.236

X-Forwarded-Host: server.aa

X-Forwarded-Server: proxy.aa

Connection: Keep-Alive



the server returns 304, because the content is still the same:

HTTP/1.1 304 Not modified

Date: Fri, 04 Jul 2008 08:52:25 GMT

Server: Server/2.3.1-1 (Unix, VB)

Allow: GET,HEAD

Accept-Ranges: none

Connection: close

and I receive:

HTTP/1.1 304 Not modified
Date: Fri, 04 Jul 2008 08:52:37 GMT
Server: Server/2.3.1-1 (Unix, VB)

Which is, I think, an error - from the client point of view. Client sends a non-conditional request and receives 304 - not modified. How is this possible?

The only idea, I've got is, that it is caused by missing Expires: or Cache-Control: max-age= in response from server.aa. But still, a 304 response on request without If-Modified-Since command.

The proxy (incomplete) configuration:
Listen 8312

LoadModule authz_default_module /usr/lib/apache2/modules/mod_authz_default.so
LoadModule authz_host_module /usr/lib/apache2/modules/mod_authz_host.so
LoadModule cache_module /usr/lib/apache2/modules/mod_cache.so
LoadModule disk_cache_module /usr/lib/apache2/modules/mod_disk_cache.so
LoadModule mime_module /usr/lib/apache2/modules/mod_mime.so
LoadModule mime_magic_module /usr/lib/apache2/modules/mod_mime_magic.so
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_connect_module /usr/lib/apache2/modules/mod_proxy_connect.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
LoadModule setenvif_module /usr/lib/apache2/modules/mod_setenvif.so
LoadModule forwardedfor_module /usr/lib/apache2/modules/mod_forwardedfor.so

UseCanonicalName Off

CacheRoot /home/xxxxxxxx
CacheEnable disk /creative
CacheDirLevels 5
CacheDirLength 3
CacheDefaultExpire 3600

ProxyTimeout 1


NameVirtualHost *:8312

<VirtualHost *:8312>
    ProxyPass / http://server.aa:80/
</VirtualHost>
Comment 1 Jan Kirschner 2008-07-04 06:37:58 UTC
Works fine, when the server.aa sends Expires: xxxx in the 304 respose - proxy.aa returns 200 and the cached content. But still...
Comment 2 Nathan Hamblen 2008-07-13 10:34:26 UTC
I'm also experiencing this bug, using Apache/2.2.8  with mod_cache and mod_disk_cache over an AJP reverse proxy of an Apache Wicket application; Wicket does not set Expire headers when responding with 304 (which works fine without the cache on). Here is my TCP trace:

-------
GET /resources/com.typeturner.Typeturner/image?name=kitchen1 HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1
Cache-Control: max-age=0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: Typeturner_USER=nathan; Typeturner_AUTH=xxxx; JSESSIONID=xxxx
Connection: keep-alive
Host: technically.us

HTTP/1.1 304 NOT_MODIFIED
Date: Sun, 13 Jul 2008 13:23:28 GMT
Server: Server: Jetty(6.0.x)
Connection: Keep-Alive
Keep-Alive: timeout=15, max=100
Vary: Accept-Encoding
-------

However, this failing behavior only seems to occur when a query string is present in the URL. Otherwise, mod_cache seems to perform an unconditional GET on the proxied server, and pass along that response to the client:

-------
GET /resources/org.apache.wicket.Application/eat-photo HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1
Cache-Control: max-age=0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Cookie: Typeturner_USER=nathan; Typeturner_AUTH=xxxx; JSESSIONID=xxxx
Connection: keep-alive
Host: technically.us

HTTP/1.1 200 OK
Date: Sun, 13 Jul 2008 14:44:56 GMT
Server: Server: Jetty(6.0.x)
Expires: Sun, 20 Jul 2008 14:44:14 GMT
Cache-Control: max-age=604800
Vary: Accept-Encoding
Content-Encoding: gzip
Last-Modified: Wed, 09 Jul 2008 12:22:57 GMT
Content-Length: 40831
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg
[....]
-------

To work around the problem, I've patched Wicket to set an Expires header when responding with 304 and I'll offer that to the project as an improvement, but I do think mod_cache is behaving incorrectly.
Comment 3 Ruediger Pluem 2008-07-13 12:06:19 UTC
(In reply to comment #1)
> Works fine, when the server.aa sends Expires: xxxx in the 304 respose -
> proxy.aa returns 200 and the cached content. But still...
> 

Do your requests contain query strings like the ones of the second reporter?
Comment 4 Ruediger Pluem 2008-07-13 12:17:34 UTC
(In reply to comment #2)
> I'm also experiencing this bug, using Apache/2.2.8  with mod_cache and
> mod_disk_cache over an AJP reverse proxy of an Apache Wicket application;
> Wicket does not set Expire headers when responding with 304 (which works fine
> without the cache on). Here is my TCP trace:
> 

This is somehow an error of Wicket. Why?
Because requests with query strings are only cacheable if the response contains an expires date. So I can only think of the following situation in your case:

1. Request with query string is not cached
2. Response is 200 and contains expire date => Request/response is cached.
3. Subsequent request within the expiration time are served without problems
   as mod_cache does not sent conditional requests to the backend.
4. After the expiration date if passed mod_cache sents a *conditional* request to  
   the backend even if the original request was unconditional.
5. The response is 304 *without* expiration date, which tells the cache that the 
   cached response is unusable and needs to be ejected.

This is an error of wicket as a 304 response should contain all headers that are relevant for caching.

The cache is in a problematic situation here:

1. Sending a 304 is wrong, because the original request was unconditional.
2. Serving the contents from the cache is not allowed since the backend told the
   cache that the current contents of the cache is invalid without delivering
   new content.
Comment 5 Nathan Hamblen 2008-07-13 14:44:20 UTC
(Jan does appear to be requesting also with a query string, http://server.aa/creative?bannerId=618167&seqNo=4&set=sec)

Yes, that is the sequence I have figured. But I'm not sure that the 304 response without an Expires header is strictly erroneous. If that is clear in the protocol I can file it as a bug in Wicket instead of an improvement, but I haven't been able to find anything definitive.

I don't see how a plain 304 "tells the cache that the cached response is unusable and needs to be ejected". The cache has asked the back end if the resource has changed since a date. The answer is no, and the cache would do better to serve what it has. Relaying the 304 brings about a definite client error condition that would not have occurred without the cache.
Comment 6 Ruediger Pluem 2008-07-14 00:21:10 UTC
(In reply to comment #5)
> (Jan does appear to be requesting also with a query string,
> http://server.aa/creative?bannerId=618167&seqNo=4&set=sec)
> 
> Yes, that is the sequence I have figured. But I'm not sure that the 304
> response without an Expires header is strictly erroneous. If that is clear in
> the protocol I can file it as a bug in Wicket instead of an improvement, but I
> haven't been able to find anything definitive.
> 
> I don't see how a plain 304 "tells the cache that the cached response is
> unusable and needs to be ejected". The cache has asked the back end if the

This is because of 13.9 in RFC2616

> resource has changed since a date. The answer is no, and the cache would do

The resource hasn't changed, but its metadata has as no expires header is
delivered anymore.



Comment 7 Nathan Hamblen 2008-07-14 05:57:57 UTC
I know about 13.9, but I don't see where it says that all previous headers must be discarded. I do see that they must be updated if present, in 10.3.5: "If a cache uses a received 304 response to update a cache entry, the cache MUST update the entry to reflect any new field values given in the response." But there aren't any new field values given.

Section 13.9.4, in talking about max-age=0 which is specifically where I'm having the problem, says simply "If the server replies with 304 (Not Modified), then the cache can return its now validated copy to the client with a 200 (OK) response." 

There is (always) some ambiguity, but I don't see why mod_cache would choose to interpret the back-end server's plain 304 as a request to nullify the cached Expires header to make the resource retroactively uncacheable. And even if that were the best way to read the situation, 10.3.5 tells you how to recover from a similar problem: "If a 304 response indicates an entity not currently cached, then the cache MUST disregard the response and repeat the request without the conditional."
Comment 8 Ruediger Pluem 2008-07-14 08:49:33 UTC
(In reply to comment #7)
> I know about 13.9, but I don't see where it says that all previous headers must
> be discarded. I do see that they must be updated if present, in 10.3.5: "If a

Ok I guess I wasn't clear in my explanation. I did not say that previous headers must be discarded. The only reason I can image why the cache did a conditional request on the backend in this situation is because the cached entity passed the expiration date given in the cached response expires header.
So the initial check of the cache on the entity reveals that it is not fresh
enough any longer to serve directly and the freshness check with the backend
delivers a response that makes this object no longer cachable as there is no expiration date for a request with args.

> cache uses a received 304 response to update a cache entry, the cache MUST
> update the entry to reflect any new field values given in the response." But
> there aren't any new field values given.
> 
> Section 13.9.4, in talking about max-age=0 which is specifically where I'm
> having the problem, says simply "If the server replies with 304 (Not Modified),
> then the cache can return its now validated copy to the client with a 200 (OK)
> response." 
> 
> There is (always) some ambiguity, but I don't see why mod_cache would choose to
> interpret the back-end server's plain 304 as a request to nullify the cached
> Expires header to make the resource retroactively uncacheable. And even if that
> were the best way to read the situation, 10.3.5 tells you how to recover from a
> similar problem: "If a 304 response indicates an entity not currently cached,
> then the cache MUST disregard the response and repeat the request without the
> conditional."
> 

This seems like a solution, but I guess it might be difficult to implement. So probably expect no quick solution when going down this path.

Comment 9 Nathan Hamblen 2008-07-14 09:39:01 UTC
"The only reason I can image why the cache did a conditional request on the backend in this situation is because the cached entity passed the expiration date given in the cached response expires header."

It hasn't passed the expiration time it my testing. I would say it's requesting a refresh because Safari specified "max-age:0", and mod_cache is discarding the known, future expiration time. That is the first thing I would fix, and then the recovery procedure for when the expiration date has passed. (I never expect quick resolutions, or any resolutions, unless I'm providing a patch and it's small, so no worries there. I'm just trying to document the existing behavior and determine what would be the best correct behavior, for other people that run into the problem or that might want to fix it.)
Comment 10 Jan Kirschner 2008-07-21 00:32:47 UTC
(In reply to comment #3)
> 
> Do your requests contain query strings like the ones of the second reporter?
> 

Yes they do. ... There is a mistake in my (manually modified) http trace - all request strings were the same. Sorry for long response time - I was on my holiday last week.

Comment 11 Mike Venzke 2009-04-28 05:22:09 UTC
This has nothing to do with query strings.

It is as stated already:

1) the user submits a normal request but the item in ache is expired
2) mod_cache submits a conditional request to the upstream server looking for a new version
3) the upstream server responds with a 304 Not Modified, but then also includes
headers that indicate this previously cached item is no longer cacheable. Either
because the expires haven't been updated, the cache rules have changed, etc.
4) mod_cache checks for cacheability of the response, and if it cannot be cached (i.e., if 'reason' is set in the cache save filter), then it serves the upstream response directly, with no regard for whether the original request was conditional or not.

As stated, the best options are to either a) serve the content as-is from
cache, then purge it so subsequent requests come from the origin, or b)
re-request from the upstream server without the conditional headers, and pass on
the full result to the user.

b) is probably the most standards-compliant. I have chosen to do a) for now. By not removing the cache_remove filter, the item is removed from cache after being served for this request.

I can say that not fixing this makes mod_cache unusable in production, as it is never ok to return a 304 Not Modified / blank body to a non-conditional request, under any standard. This is a bad bug, unconfined to query strings not returning expires, or any other scenario thus far presented. It is perfectly reasonable to expect that an upstream server may change the cache rules for a piece of content, and it needs to be able to handle that on revalidation without creating an error for the user.

Here is debug output. The scenario is that the cached item is cached, but expired, and the upstream server is responding with an Expires header in the past. The dates of the cached item were specifically altered for this test to be in the past, as well as the response from the upstream, but this spawned from real-world scenarios.

* Please note, the line numbers will not directly correspond to line numbers in httpd repository. This server is running a modified 2.2.11.


[Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(1217): disk_cache: Recalled status for cached URL http://host.com:80/content.file? from file /path/cache/path.header
[Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(1758): disk_cache: Recalled headers for URL http://host.com:80/content.file?
[Tue Apr 28 04:39:30 2009] [debug] cache_storage.c(361): Cached response for /content.file isn't fresh.  Adding/replacing conditional request headers.
[Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(690): Adding CACHE_SAVE filter for /content.file
[Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(703): Adding CACHE_REMOVE_URL filter for /content.file
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(46): proxy: BALANCER: ***
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(1301): proxy: ***
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy_balancer.c(1369): proxy ***
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy.c(1006): Running scheme balancer handler (attempt 0)
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy_http.c(2135): proxy: HTTP: serving URL http://upstream/content.file
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(1999): proxy: HTTP: has acquired connection for (upstream)
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2055): proxy: connecting http://upstream/content.file to upstream:80
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2153): proxy: connected /contet.file to upstream:80
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2308): proxy: HTTP: fam 2 socket created to connect to upstream
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2414): proxy: HTTP: connection complete to upsteam:80 (upstream)
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy_http.c(2005): proxy: header only
[Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(1214): cache: /contet.file not cached. Reason: Expires header already expired, not cacheable
[Tue Apr 28 04:39:30 2009] [debug] mod_cache.c(1687): in CACHE_REMOVE_URL filter, and there is a cache
[Tue Apr 28 04:39:30 2009] [debug] mod_disk_cache.c(88): ** removed item **
[Tue Apr 28 04:39:30 2009] [debug] proxy_util.c(2017): proxy: HTTP: has released connection for (upstream)
[Tue Apr 28 04:39:30 2009] [debug] mod_proxy.c(1053): proxy_handler(cleanup): access_status: 0, r->status: 304


And the request:

wget --header --server-response http://server/content.file
--04:28:56--
http://server/content.file    => `content.file'
Resolving server... ***.***.***.***
Connecting to server|***.***.***.***|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 304 Not Modified
  Date: Tue, 28 Apr 2009 08:22:33 GMT
  Server: ** upstream server tokens **
  Connection: Keep-Alive
  Keep-Alive: timeout=5, max=100
  ETag: "** etag **" 
  Expires: Mon, 06 Apr 2009 08:07:45 GMT
  Cache-Control: max-age=259
04:28:56 ERROR 304: Not Modified.
Comment 12 Graham Leggett 2010-09-27 05:59:52 UTC
It seems this behaviour is triggered when a 304 Not Modified response arrives that contains headers to say that the response isn't cacheable according to the RFC rules.

The cacheability check of the response kicks in, says "this (304) is not cacheable, just pass this response through as is", and that's how the 304 gets out.

This situation could arise if the origin server has in the mean time changed its mind about whether something is cachable (it was cacheable in the past, now it isn't), or if a broken origin server declares that 2xx responses are cacheable but 304 responses aren't (a contradiction in terms). Neither of these two happen often, which is probably why this problem is so rare in the wild.

Given that it isn't practical to restart the request again, it is probably a sensible compromise to pretend we couldn't connect to the backend at all, and return cached content for this request marked with a Warning to say the content is stale (which is what we are allowed to do as per RFC2616), and then we delete this URL from the cache, respecting the headers on the 304.

In the case where "s-maxage" or "must-revalidate" has been specified, and we can't return a warning, we delete the URL from the cache, and then could potentially return a redirect back to ourselves, in the process asking the client to try again.

Will need to double check this logic against RFC2616.
Comment 13 Graham Leggett 2010-09-27 15:30:12 UTC
Fixed in r1001884 on httpd-trunk.
Comment 14 Graham Leggett 2010-10-18 18:44:06 UTC
Proposed for backport to v2.2.
Comment 15 Nick Kew 2010-11-11 11:34:08 UTC
Adding dependency for completeness.
Comment 16 Ruediger Pluem 2011-02-08 06:46:12 UTC
Backported to 2.2.x as r1068313.