Bug 63932 - Content compression breaks contract of ETag
Summary: Content compression breaks contract of ETag
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Connectors (show other bugs)
Version: 9.0.x
Hardware: All All
: P2 major (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-18 11:58 UTC by Michael Osipov
Modified: 2021-06-15 16:54 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Osipov 2019-11-18 11:58:33 UTC
This is basically the same as Bug 39727.

Consider a huge JSON file, content compression is on in the connector and this simple servlet:

> protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
> 	response.setContentType("application/json");
> 	Path dump = Paths.get("projects.json");
> 	response.setContentLengthLong(Files.size(dump));
> 	response.setHeader("ETag", String.format("\"%d+%s\"", Files.size(dump), Files.getLastModifiedTime(dump)));
> 	Files.copy(dump, response.getOutputStream());
> }

and the following curl requests:

> $ curl http://md11gxtc:8080/awesome/awesome  -H "Accept-Encoding: identity" -I
> HTTP/1.1 200
> ETag: "2571736354+2016-01-07T12:29:06.455824Z"
> vary: accept-encoding
> Content-Type: application/json
> Content-Length: 2571736354
> Date: Mon, 18 Nov 2019 11:50:04 GMT

This one is fine, now lets request compression:
> $ curl http://md11gxtc:8080/awesome/awesome  -H "Accept-Encoding: gzip" -I
> HTTP/1.1 200
> ETag: "2571736354+2016-01-07T12:29:06.455824Z"
> vary: accept-encoding
> Content-Encoding: gzip
> Content-Type: application/json
> Transfer-Encoding: chunked
> Date: Mon, 18 Nov 2019 11:50:46 GMT

RFC 7232, 2.1 + 2.3 + 2.3.1 + 2.3.3 say: ETag generation happens over content negotiation.

RFC 7231, 5.3 these headers are part of content negotiation:
Accept
Accept-Charset
Accept-Encoding
Accept-Language

Basically, Tomcat would require to modify the ETag somehow and on the fly remove the change when If-Match/If-None-Match arrives at the servlet. Fully opaque.
Comment 1 Michael Osipov 2019-11-18 12:01:36 UTC
I think this also applies to the DefaultServlet for weak Etags:

An origin server
   SHOULD change a weak entity-tag whenever it considers prior
   representations to be unacceptable as a substitute for the current
   representation.  In other words, a weak entity-tag ought to change
   whenever the origin server wants caches to invalidate old responses.

but I am not fully sure. One might need to inquire with Julian Reschke.
Comment 2 Remy Maucherat 2019-11-18 12:21:55 UTC
The purpose of the tag is to know if there is an update. Thus, it is ok if compression does not change the etag, regardless of what the specification might imply in its language.
Comment 3 Julian Reschke 2019-11-18 12:49:45 UTC
Hm, no.

If the payload is different, it can't have the same strong etag.

Consider the impact on conditional requests.
Comment 4 Michael Osipov 2019-11-18 13:05:35 UTC
(In reply to Remy Maucherat from comment #2)
> The purpose of the tag is to know if there is an update. Thus, it is ok if
> compression does not change the etag, regardless of what the specification
> might imply in its language.

No, that's wrong.
Comment 5 Mark Thomas 2019-11-18 21:50:05 UTC
Please take care, as Julian did, to be specific about whether you are talking about weak or strong validators.

RFC 7232 states (section 2.1)
<quote>
Likewise, a validator is weak if it is shared by two or more
representations of a given resource at the same time, unless those
representations have identical representation data.  For example, if
the origin server sends the same validator for a representation with
a gzip content coding applied as it does for a representation with no
content coding, then that validator is weak.
</quote>

So the Default servlet (that only ever sets a weak ETag) is fine. As is the WebDAV servlet.

Where things get "interesting" is when resources set their own, strong ETag. It looks to me that the simplest solution would be for the container provided compression to check for the presence of a strong ETag and, if it finds one, prepend the weakness indicator to the ETag if it is going to compress the resource.
Comment 6 Konstantin Kolinko 2019-11-18 23:25:38 UTC
(In reply to Mark Thomas from comment #5)
> Please take care, as Julian did, to be specific about whether you are
> talking about weak or strong validators.
> 
> RFC 7232 states (section 2.1)
> <quote>[...]</quote>
> 
> So the Default servlet (that only ever sets a weak ETag) is fine. As is the
> WebDAV servlet.

+1

> Where things get "interesting" is when resources set their own, strong ETag.
> It looks to me that the simplest solution would be for the container
> provided compression to check for the presence of a strong ETag and, if it
> finds one, prepend the weakness indicator to the ETag if it is going to
> compress the resource.

Interesting idea. But I think such feature has to be documented. A server administrator should be aware that changing a Connector setting will change behaviour of a web application in that way as well. (Changing strong ETag to a weak one may affect performance of caches.)

Another possible solution is to do not compress a response if a strong ETag is encountered. The same what we do when a response is already compressed.

https://github.com/apache/tomcat/blob/9.0.29/java/org/apache/coyote/CompressionConfig.java#L203
Comment 7 Michael Osipov 2019-11-19 09:36:52 UTC
(In reply to Mark Thomas from comment #5)
> Please take care, as Julian did, to be specific about whether you are
> talking about weak or strong validators.
> 
> RFC 7232 states (section 2.1)
> <quote>
> Likewise, a validator is weak if it is shared by two or more
> representations of a given resource at the same time, unless those
> representations have identical representation data.  For example, if
> the origin server sends the same validator for a representation with
> a gzip content coding applied as it does for a representation with no
> content coding, then that validator is weak.
> </quote>
> 
> So the Default servlet (that only ever sets a weak ETag) is fine. As is the
> WebDAV servlet.

I agree. Is there a possibility that a custom resource is added to the context with issues strong validators?
 
> Where things get "interesting" is when resources set their own, strong ETag.
> It looks to me that the simplest solution would be for the container
> provided compression to check for the presence of a strong ETag and, if it
> finds one, prepend the weakness indicator to the ETag if it is going to
> compress the resource.

That would be wrong, imho. The servlet expects the ETag to retain strong and not to be modified in terms of comparison strength.
Comment 8 Michael Osipov 2019-11-19 09:38:46 UTC
(In reply to Konstantin Kolinko from comment #6)
> (In reply to Mark Thomas from comment #5)
> > Please take care, as Julian did, to be specific about whether you are
> > talking about weak or strong validators.
> > 
> > RFC 7232 states (section 2.1)
> > <quote>[...]</quote>
> > 
> > So the Default servlet (that only ever sets a weak ETag) is fine. As is the
> > WebDAV servlet.
> 
> +1
> 
> > Where things get "interesting" is when resources set their own, strong ETag.
> > It looks to me that the simplest solution would be for the container
> > provided compression to check for the presence of a strong ETag and, if it
> > finds one, prepend the weakness indicator to the ETag if it is going to
> > compress the resource.
> 
> Interesting idea. But I think such feature has to be documented. A server
> administrator should be aware that changing a Connector setting will change
> behaviour of a web application in that way as well. (Changing strong ETag to
> a weak one may affect performance of caches.)
> 
> Another possible solution is to do not compress a response if a strong ETag
> is encountered. The same what we do when a response is already compressed.
> 
> https://github.com/apache/tomcat/blob/9.0.29/java/org/apache/coyote/
> CompressionConfig.java#L203

That sounds like an option. WDYT about adding a suffix and removing it on the fly like mod_deflate should do?


I get the feeling that compression configuration must be moved sooner or later to a subelement <Compression> beneath a connector.
Comment 9 Remy Maucherat 2019-11-19 09:52:50 UTC
(In reply to Mark Thomas from comment #5)
> Where things get "interesting" is when resources set their own, strong ETag.
> It looks to me that the simplest solution would be for the container
> provided compression to check for the presence of a strong ETag and, if it
> finds one, prepend the weakness indicator to the ETag if it is going to
> compress the resource.

+1, but this is all academic and I think nobody cares overall.
Comment 10 Michael Osipov 2019-11-19 10:36:39 UTC
(In reply to Remy Maucherat from comment #9)
> (In reply to Mark Thomas from comment #5)
> > Where things get "interesting" is when resources set their own, strong ETag.
> > It looks to me that the simplest solution would be for the container
> > provided compression to check for the presence of a strong ETag and, if it
> > finds one, prepend the weakness indicator to the ETag if it is going to
> > compress the resource.
> 
> +1, but this is all academic and I think nobody cares overall.

Not really. If you see that how long it took to discuss the HTTPd ticket, I would be inclined to disable compression when the strong validator is turned into a weak one w/o my knowledge.
Comment 11 Konstantin Kolinko 2019-11-19 14:48:29 UTC
(In reply to Michael Osipov from comment #8)
> 
> I get the feeling that compression configuration must be moved sooner or
> later to a subelement <Compression> beneath a connector.

Enabling compression globally like that may make one vulnerable to BREACH exploit. Maybe controlling this feature from within a web application is a way to go. (E.g. like sendfile feature can be used by DefaultServlet).

https://en.wikipedia.org/wiki/BREACH

> WDYT about adding a suffix and removing it on the fly like mod_deflate should do?

I do not have a clue what you are talking about here.
Comment 12 Remy Maucherat 2019-11-19 15:03:20 UTC
Doing cosmetic configuration changes like this is not a good idea. When the subelements of Connector were introduced, it was out of necessity to implement SNI, not to beautify. It caused a lot of bugs and still causes a significant amount of mess in the code (which will only be fixed in the next major release).
Comment 13 Michael Osipov 2019-11-19 15:51:52 UTC
(In reply to Konstantin Kolinko from comment #11)
> (In reply to Michael Osipov from comment #8)
> > 
> > I get the feeling that compression configuration must be moved sooner or
> > later to a subelement <Compression> beneath a connector.
> 
> Enabling compression globally like that may make one vulnerable to BREACH
> exploit. Maybe controlling this feature from within a web application is a
> way to go. (E.g. like sendfile feature can be used by DefaultServlet).

I don't understand this?! Transparent compression is already on the Connector? All I am saying is to move those three attributes into a subelement.

Maybe it would be better to move this completely to a valve?! Then you will have full control from the webapp.
 
> https://en.wikipedia.org/wiki/BREACH
> 
> > WDYT about adding a suffix and removing it on the fly like mod_deflate should do?
> 
> I do not have a clue what you are talking about here.

The proposal from mod_deflate was to transform

ETag: "..." to "...-gzip", -br, etc. When the client presents the ETag "...-gzip" the compressor would remove the '-gzip" and the application would not notice that at all.
Comment 14 Michael Osipov 2019-11-19 15:52:44 UTC
(In reply to Remy Maucherat from comment #12)
> Doing cosmetic configuration changes like this is not a good idea. When the
> subelements of Connector were introduced, it was out of necessity to
> implement SNI, not to beautify. It caused a lot of bugs and still causes a
> significant amount of mess in the code (which will only be fixed in the next
> major release).

This is likely true, it was just an idea to consider more flexibility. Eventually moving to context.xml instead of sitting on connector level.
Comment 15 Mark Thomas 2019-11-26 20:41:54 UTC
There is a long discussion on the httpd ticket on the merits of adding "-gzip" or similar to a strong  ETag and removing it when seen on a request. The short version is that unless a server tracks the ETags it edited, the server can't be sure that the Etag that has "-gzip" on the end has it because the server added it.

I'm not sure my "make the strong ETag weak" idea is a good one. I suspect there might be some edge cases where there would be breakage.

Of all the ideas, disabling compression in the presence of a strong ETag seems like the best solution to me. An open question is do we make this configurable or do we just do it? If configurable, I'd argue for enabled by default.
Comment 16 Michael Osipov 2019-11-26 21:04:49 UTC
(In reply to Mark Thomas from comment #15)
> There is a long discussion on the httpd ticket on the merits of adding
> "-gzip" or similar to a strong  ETag and removing it when seen on a request.
> The short version is that unless a server tracks the ETags it edited, the
> server can't be sure that the Etag that has "-gzip" on the end has it
> because the server added it.
> 
> I'm not sure my "make the strong ETag weak" idea is a good one. I suspect
> there might be some edge cases where there would be breakage.

Agreed.

> Of all the ideas, disabling compression in the presence of a strong ETag
> seems like the best solution to me. An open question is do we make this
> configurable or do we just do it? If configurable, I'd argue for enabled by
> default.

I would just do it because anything else would break the RFC for that and this is certainly something we don't want to do.
Comment 17 Konstantin Kolinko 2019-11-26 21:59:59 UTC
(In reply to Michael Osipov from comment #16)
> (In reply to Mark Thomas from comment #15)
> 
> > Of all the ideas, disabling compression in the presence of a strong ETag
> > seems like the best solution to me. An open question is do we make this
> > configurable or do we just do it? If configurable, I'd argue for enabled by
> > default.
> 
> I would just do it because anything else would break the RFC for that and
> this is certainly something we don't want to do.

1. I agree (+1) for it to be enabled by default.

2. I think that it would be better to have it configurable (aka flag), deprecate and remove the option in some later version (Tomcat 10).

We can go without an option as the risk is minimal (clients should be able to deal with non-compressed responses), but someone might care.

3. I think that this feature has to be documented. My thought was to add a section to "Special Features" in the doc, as it outgrows what we usually put into an attributes table. [1]

(In reply to Michael Osipov from comment #13)
> (In reply to Konstantin Kolinko from comment #11)
> > (In reply to Michael Osipov from comment #8)
> > > 
> > > I get the feeling that compression configuration must be moved sooner or
> > > later to a subelement <Compression> beneath a connector.
> > 
> > Enabling compression globally like that may make one vulnerable to BREACH
> > exploit. Maybe controlling this feature from within a web application is a
> > way to go. (E.g. like sendfile feature can be used by DefaultServlet).
> 
> I don't understand this?! Transparent compression is already on the
> Connector? All I am saying is to move those three attributes into a
> subelement.

I think that blindly enabling compression for the whole site (without knowing anything about deployed web applications) may make one vulnerable to BREACH. Whether this configuration is actually exploitable depends on what dynamic responses are generated by a web application.

I think that serving static resources is safe. Thus I think that the feature of serving precompressed resources implemented by the DefaultServlet [2] is a better alternative to enabling this option on a Connector.


[1] http://tomcat.apache.org/tomcat-9.0-doc/config/http.html
[2] http://tomcat.apache.org/tomcat-9.0-doc/default-servlet.html#change
Comment 18 Mark Thomas 2019-11-28 16:33:57 UTC
Fixed in:
- master for 9.0.30 onwards
- 8.5.x for 8.5.50 onwards
- 7.0.x for 7.0.99 onwards
Comment 19 thorsten.meinl 2021-06-15 12:29:47 UTC
We recently stumbled upon this change because after an upgrade from Tomcat 8.5 to the lastet Tomcat 9 compression suddenly didn't work any more. We do use strong ETags because it's the only way to use conditional requests for PUT and POST operations.
For now we can set noCompressionStrongETag=false and make sure in our application code that the ETags change with different Accept-Encoding request headers. What I'm wondering is how this will look in Tomcat 10 when that property doesn't exist any more and responses with strong ETag are never compressed. Does this mean we cannot use compression any more at all? That would be bad since we are sending large JSON payloads which benefit from compression quite a lot.
Comment 20 Michael Osipov 2021-06-15 15:25:58 UTC
(In reply to thorsten.meinl from comment #19)
> We recently stumbled upon this change because after an upgrade from Tomcat
> 8.5 to the lastet Tomcat 9 compression suddenly didn't work any more. We do
> use strong ETags because it's the only way to use conditional requests for
> PUT and POST operations.
> For now we can set noCompressionStrongETag=false and make sure in our
> application code that the ETags change with different Accept-Encoding
> request headers. What I'm wondering is how this will look in Tomcat 10 when
> that property doesn't exist any more and responses with strong ETag are
> never compressed. Does this mean we cannot use compression any more at all?
> That would be bad since we are sending large JSON payloads which benefit
> from compression quite a lot.

There is a rant from the Jackson developer that the filesize overweighs the overhead of compression and decompression.
Comment 21 Mark Thomas 2021-06-15 16:54:38 UTC
(In reply to thorsten.meinl from comment #19)

Can you clarify what role, if any Tomcat's default servlet plays in your app.

Tomcat aims to be specification compliant.

Container provided compression and application provided strong ETags are going to be problematic as there is no API provided for either the application to provide an appropriate ETag (with or without compression) or for the container to indicate if compression will be applied.

With a better understanding of your use case we can look at options for addressing this in 10.0.x onwards. I have a few ideas so I think this is a solvable problem.