Bug 46538

Summary: ETag must differ between compressed and uncompressed resource versions
Product: Tomcat 6
Reporter: Oliver Schoett <oliver.schoett>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED WONTFIX    
Severity: normal    
Priority: P2    
Version: 6.0.18   
Target Milestone: default   
Hardware: PC   
OS: Windows XP   
Attachments: Patch to correct ETags and Vary headers for compression
Disable sending and interpreting ETags (needs to be made into an option)

Description Oliver Schoett 2009-01-15 05:19:32 UTC
The Apache folks are about to fix the problem that ETags are the same for compressed and uncompressed versions of a resource:

   https://issues.apache.org/bugzilla/show_bug.cgi?id=39727

Tomcat 6.0.18 suffers from the same problem.

The effect is that if a caching proxy holds a gzipped version of a resource and is asked by a client for an unzipped version, it requests one from the server with the ETag of the cached version.  The server sees that the ETag of the version it would send out is the same as that of the version the cache already holds and tells the cache that its version is OK (response status code 304).  In the case of a Squid cache, this results in a gzipped version being sent to the client, and this breaks in IE6 and IE7 when they are configured to use the HTTP 1.0 protocol.

Squid has been provided with a work-around option for this problem:

   http://www.squid-cache.org/Versions/v2/2.6/cfgman/broken_vary_encoding.html

but we should not rely on caches world-wide to provide a work-around for a Tomcat bug.
Comment 1 Oliver Schoett 2009-01-29 03:42:56 UTC
Created attachment 23190
Patch to correct ETags and Vary headers for compression

Here is a patch that corrects the ETag and Vary behaviour:

- ETags differ for gzipped and ungzipped output

- Vary: Accept-Encoding is sent whenever a gzipped version is available

The latter change makes it possible for users of differently capable browsers to receive gzipped and ungzipped responses through the same proxy cache. Previously, an ungzipped cached version would also be delivered to compression-capable browsers, because the cache could not know there was a gzipped version available.
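
The sending side described above can be sketched roughly as follows. This is not the attached patch; it is a minimal illustration that assumes the variant ETag is formed by appending "-gz" inside the quotes of the base ETag (the form shown in the curl output in comment 2). The class and method names are hypothetical, and q-values in Accept-Encoding are ignored for brevity.

```java
final class VariantETagSketch {

    /** Derives the gzip variant's ETag by appending "-gz" inside the closing quote. */
    static String gzipVariant(String baseETag) {
        if (baseETag == null || !baseETag.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted entity tag: " + baseETag);
        }
        return baseETag.substring(0, baseETag.length() - 1) + "-gz\"";
    }

    /** True if the Accept-Encoding value lists gzip (simplification: q-values are not evaluated). */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            int semi = token.indexOf(';');
            String coding = (semi >= 0 ? token.substring(0, semi) : token).trim();
            if (coding.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    /** Picks the ETag of the variant that would be sent for this request. */
    static String etagFor(String baseETag, String acceptEncoding) {
        return acceptsGzip(acceptEncoding) ? gzipVariant(baseETag) : baseETag;
    }
}
```

In both cases the response would additionally carry Vary: Accept-Encoding, so that a cache knows the choice of variant depends on that request header.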

This patch will be put in production on an e-commerce website shortly.
Comment 2 Oliver Schoett 2009-01-29 03:56:52 UTC
Here is the new behaviour when a resource is fetched without and with compression enabled:

curl -v -HPragma: http://tomcat6.example.com/web3/js/hbx.js

> GET /web3/js/hbx.js HTTP/1.1
> User-Agent: curl/7.16.3 (i686-pc-cygwin) libcurl/7.16.3 OpenSSL/0.9.8i zlib/1.2.3 libssh2/0.15-CVS
> Host: tomcat6.example.com
> Accept: */*

< Server: Apache-Coyote/1.1
< Expires: Thu, 29 Jan 2009 13:27:07 GMT
< ETag: W/"15453-1233218608000"
< Last-Modified: Thu, 29 Jan 2009 08:43:28 GMT
< Vary: Accept-Encoding
< Content-Type: text/javascript
< Content-Length: 15453
< Date: Thu, 29 Jan 2009 11:27:07 GMT

curl -v -HPragma: --compressed http://tomcat6.example.com/web3/js/hbx.js | wc -c

> GET /web3/js/hbx.js HTTP/1.1
> User-Agent: curl/7.16.3 (i686-pc-cygwin) libcurl/7.16.3 OpenSSL/0.9.8i zlib/1.2.3 libssh2/0.15-CVS
> Host: tomcat6.example.com
> Accept: */*
> Accept-Encoding: deflate, gzip

< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Expires: Thu, 29 Jan 2009 13:26:48 GMT
< ETag: W/"15453-1233218608000-gz"
< Last-Modified: Thu, 29 Jan 2009 08:43:28 GMT
< Vary: Accept-Encoding
< Content-Type: text/javascript
< Transfer-Encoding: chunked
< Content-Encoding: gzip
< Date: Thu, 29 Jan 2009 11:26:48 GMT

Note that the first response also has a Vary header, and the second response has a different ETag.

When this patch is employed, the work-around configured by default in Squid caches (broken_vary_encoding) is no longer necessary.  If you want to avoid this work-around for your server, you might configure a server string that does NOT start with "Apache".
Comment 3 Oliver Schoett 2009-01-29 09:05:04 UTC
Warning: the patch I submitted does not work well in connection with the Akamai CDN.

First, the Akamai edge servers transparently decompress content without changing the ETag (so that compressed and uncompressed versions are sent with the same ETag).

Second, the Akamai servers treat responses with Vary: Accept-Encoding but without Content-Encoding header as uncacheable (ESConfigGuide-Customer, p. 54, Note: TTL and the Vary Header).  My patch triggers this in the case of uncompressed responses (due to missing client capability) that the server would be willing to compress.
Comment 4 Mark Thomas 2009-04-16 12:34:10 UTC
Thanks for the patch. I have applied a modified version of it to trunk that also extends it to the NIO and APR connectors.

The extended patch has been proposed for 6.0.x
Comment 5 Remy Maucherat 2009-04-16 13:57:58 UTC
I disagree with this. Regardless of what happens with the transport, the entity does not change once it is decoded.
-1 for this "fix".
Comment 6 Mark Thomas 2009-04-16 14:16:56 UTC
Then I suggest you read section 14.19 of RFC 2616, which makes it quite clear that ETags are per variant, not per resource.
Comment 7 Remy Maucherat 2009-04-16 14:35:15 UTC
Well, that does not sound very smart (and I had read that on the httpd bug, sigh ...). But overall, I do think the patch is bad (see status file).
Comment 8 Mark Thomas 2009-04-17 01:44:38 UTC
I've reverted the fix from trunk and withdrawn the backport proposal as whilst it fixed this issue, it introduced others.
Comment 9 Oliver Schoett 2009-04-17 07:59:02 UTC
Yes, sorry, the patch is indeed not sufficient. It fixes the "sending" side of the problem in that it sends out ETags that conform to the spec.  However, I now understand that we also need to fix the "receiving" side of the problem, that is, deal with the ETags we get back in If-Match, If-None-Match and If-Range header fields and make appropriate responses.

This is much harder to fix, as currently the decision "can I compress or not" is made at a completely different point from the base ETag calculation, and yet we must know the set of ETags of the variants we are capable of producing to handle the If-... Requests involving ETags.
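
To illustrate what the "receiving" side would have to do, here is a minimal sketch of an If-None-Match check that compares against the ETag of the variant selected for the current request rather than the base ETag. It is an assumption-laden illustration, not Tomcat code: the names are hypothetical, the weak comparison is the one permitted for plain GET requests, and splitting on commas is a simplification that would mis-parse entity tags containing a comma.

```java
final class IfNoneMatchSketch {

    /** Strips the weakness prefix so that tags can be compared weakly. */
    static String opaque(String tag) {
        String t = tag.trim();
        return t.startsWith("W/") ? t.substring(2) : t;
    }

    /**
     * Returns true if a 304 response is appropriate, i.e. the ETag of the
     * variant selected for THIS request appears in the If-None-Match value.
     */
    static boolean notModified(String ifNoneMatch, String selectedETag) {
        if (ifNoneMatch == null) {
            return false;
        }
        if (ifNoneMatch.trim().equals("*")) {
            return true;
        }
        // Simplification: assumes no entity tag contains a comma.
        for (String candidate : ifNoneMatch.split(",")) {
            if (opaque(candidate).equals(opaque(selectedETag))) {
                return true;
            }
        }
        return false;
    }
}
```

With this check, a cache that holds only the "-gz" variant and revalidates on behalf of a client that cannot accept gzip would get a full uncompressed response instead of a 304.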

Since my client needed a fix urgently (IE users behind proxy caches did receive unusable JavaScript files), I have removed ETag generation and handling completely from his production Tomcats (can supply patch if there is interest).  This way, the logic for "304 Not Modified" responses relies entirely on If-Modified-Since, which works well enough if you keep the date stamps between identical copies of a resource on different servers of a farm in sync.
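
For reference, the date-only validation that remains once ETags are removed can be sketched like this. It is a simplified stand-in for the servlet's logic, with hypothetical names; it assumes the header uses the RFC 1123 date format and treats an unparseable header as absent.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class IfModifiedSinceSketch {

    /**
     * Returns true if a 304 response is appropriate: the resource has not
     * been modified after the date supplied in If-Modified-Since.
     */
    static boolean notModified(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return false;
        }
        try {
            long since = ZonedDateTime
                    .parse(ifModifiedSince.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant()
                    .toEpochMilli();
            // HTTP dates have one-second resolution, so truncate before comparing.
            return (lastModifiedMillis / 1000) * 1000 <= since;
        } catch (DateTimeParseException e) {
            // Treat a malformed date as if the header were missing.
            return false;
        }
    }
}
```

This only works across a server farm if identical copies of a resource carry the same modification time on every node, as noted above.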

--

BTW, the part of RFC2616 that makes it most clear that ETags are per variant is in Section 13.6; for example:

   If an entity tag was assigned to a cached representation,
   the forwarded request SHOULD be conditional and include
   the entity tags in an If-None-Match header field from all
   its cache entries for the resource. This conveys to the
   server the set of entities currently held by the cache,
   so that if any one of these entities matches the requested
   entity, the server can use the ETag header field in its
   304 (Not Modified) response to tell the cache which entry
   is appropriate.

This algorithm makes no sense unless ETags are per variant.  Unfortunately, the section that defines ETags (14.19) says nothing about this.
Comment 10 Remy Maucherat 2009-04-17 08:17:37 UTC
Yes, that's my point, the only solution I see in Tomcat 6 about this is an option to remove the etag if compression is active for the request.

And about your spec quoting, it is great to adhere to specs and stuff, but it might be that clients apparently only really support content-encoding, which is not supposed to be used for on-the-fly compression (but is, since I am not at all sure about support for transfer-encoding, which is the proper way to do that; originally, I had planned all sorts of filters which would be added according to the T-E header, but in the end, the only thing which was workable then was a hardcoded gzip output filter which used the content-encoding header). You have to do things which work ...
Comment 11 Mark Thomas 2009-04-29 03:00:51 UTC
The current state of T-E support in the browsers is:
- Opera advertises T-E support, works with T-E
- Mozilla doesn't advertise T-E support, works with T-E
- IE doesn't advertise T-E support, doesn't work with T-E

My reading of the C-E discussion above is that any solution is a hack that will have an issue somewhere. T-E is the right solution. Moving from the current status quo is as likely or more likely to cause issues compared to the current behaviour which while wrong, is at least understood. We could provide a handful of options to allow users to configure the various hacks but this would add a lot of code (and possibly complexity) to the critical path.

I would like to use T-E by default and fallback to C-E if T-E is not supported. However, the patchy browser support means that another set of options would be required to give folks a reasonable chance of configuring a 'good' behaviour for most clients.

My inclination is to mark this issue as WONTFIX with the longer term plan being implementing T-E and switching to T-E once the browser support is reasonable.
Comment 12 Remy Maucherat 2009-04-29 05:39:05 UTC
I think I used mostly IE when I tried it back then. Did you test with IE 7 and 8 ?

I agree: with this kind of browser support, it is still not doable to use T-E :(
Comment 13 Mark Thomas 2009-04-29 06:01:47 UTC
IE7 and IE8 - no joy with T-E
Comment 14 Remy Maucherat 2009-04-29 10:16:59 UTC
Maybe something could be done when the client advertises the T-E, and drop to C-E if it does not ?
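
A rough sketch of that selection rule, assuming the client advertises transfer-coding support through the TE request header and content-coding support through Accept-Encoding. The class, enum and method names are hypothetical, q-values are ignored, and nothing here reflects actual connector code.

```java
final class CompressionChoiceSketch {

    enum Mode { TRANSFER_ENCODING, CONTENT_ENCODING, NONE }

    /** True if the comma-separated header value lists the given coding (parameters ignored). */
    static boolean lists(String headerValue, String coding) {
        if (headerValue == null) {
            return false;
        }
        for (String token : headerValue.split(",")) {
            int semi = token.indexOf(';');
            String name = (semi >= 0 ? token.substring(0, semi) : token).trim();
            if (name.equalsIgnoreCase(coding)) {
                return true;
            }
        }
        return false;
    }

    /** Prefers T-E when advertised, drops to C-E otherwise, and sends identity if neither applies. */
    static Mode choose(String teHeader, String acceptEncodingHeader) {
        if (lists(teHeader, "gzip")) {
            return Mode.TRANSFER_ENCODING;
        }
        if (lists(acceptEncodingHeader, "gzip")) {
            return Mode.CONTENT_ENCODING;
        }
        return Mode.NONE;
    }
}
```

Per comment 11, only clients that advertise T-E (Opera at the time) would take the first branch; the others would keep getting C-E as they do today.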
Comment 15 Mark Thomas 2009-09-09 10:00:36 UTC
As suggested in comment 11 I am going to resolve this as WONTFIX.

My reasons are:
- any change to use T-E is as likely or more likely to cause breakage
- patchy browser support means another handful of options would be required to give sys admins a reasonable chance of configuring a working configuration compatible with users and their combination of proxies and/or caches
- I believe the complexity this would add to the critical path isn't worth the benefit

In my view the tipping point will be when IE supports T-E whether or not it advertises support for it. At that point I would be all for switching to the spec compliant way of doing compression.
Comment 16 Oliver Schoett 2009-09-10 07:17:08 UTC
Created attachment 24245
Disable sending and interpreting ETags (needs to be made into an option)

Not fixing this bug makes it impossible to enable gzip compression on public web sites, because IE6 users behind Squid 2.6 and 2.7 proxies will receive broken content:  IE6 by default does not allow compression behind a proxy, but Squid 2.6+ will deliver gzipped content that it already has in the cache, and which is not accepted by IE.

Squid has implemented the option broken_vary_encoding to work around this, which by default is enabled for servers whose header begins with "Apache".  However, this option is buggy (http://www.squid-cache.org/bugs/show_bug.cgi?id=2574), and Tomcat should not require work-arounds by others for its broken behavior.

Thus, an option is needed to disable ETags to make public sites work reliably.  What needs to be done is contained in the patch, which disables sending and interpreting ETags.  This patch (against 6.0.18) has been used successfully in production since February on a German e-commerce site (90 million page views/month).  There is no performance impact, because 304 responses are still generated according to the "If-Modified-Since" logic. Unfortunately, I do not know Tomcat well enough to make this a configurable option.
Comment 17 Mark Thomas 2009-09-10 14:43:14 UTC
Since ETag handling is wholly within the DefaultServlet, just add an option to that servlet. You can use the DefaultServlet's readOnly option as a template.
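
As a rough idea of the shape such an option could take: the sketch below reads a boolean init parameter and suppresses the ETag when it is switched off. Everything in it is an assumption for illustration. The parameter name "useETags" is hypothetical, a plain Map stands in for the servlet's init parameters so that the example is self-contained, and the real DefaultServlet code is not reproduced here.

```java
import java.util.Map;

final class ETagOptionSketch {

    // Hypothetical option; defaults to the current behaviour (ETags enabled).
    private boolean useETags = true;

    /** Reads the option from init parameters (a Map stands in for the servlet config). */
    void init(Map<String, String> initParams) {
        String value = initParams.get("useETags"); // hypothetical parameter name
        if (value != null) {
            useETags = Boolean.parseBoolean(value);
        }
    }

    /** Returns the ETag header value to send, or null if ETags are disabled. */
    String etagHeader(String computedETag) {
        return useETags ? computedETag : null;
    }

    /** Conditional headers carrying ETags would be ignored as well when disabled. */
    boolean honoursIfNoneMatch() {
        return useETags;
    }
}
```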