Bug 42987 - Weak Etags in Apache are useless and violate RFC 2616, 13.3.3
Status: RESOLVED LATER
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core
Version: 2.2.4
Hardware: All All
Importance: P2 normal with 2 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: MassUpdate, RFC
Depends on:
Blocks:
 
Reported: 2007-07-27 05:23 UTC by Werner
Modified: 2018-11-07 21:08 UTC



Attachments
conditional GET with apache-style weak etag (1.39 KB, text/plain)
2007-07-27 05:26 UTC, Werner

Description Werner 2007-07-27 05:23:42 UTC
When Apache cannot create an etag reliably, due to insufficient time resolution,
it returns a weak etag. When a client uses this weak etag in a conditional
request, it will *never* match (see the attached test cases).

According to RFC 2616, 13.3.3, a weak etag should match, even if the resource
has changed, as long as the changes are semantically insignificant.
Contrary to this, Apache-style weak etags have the meaning "never match". These
weak etags are completely useless for cache validation. Sending no etag would be
just as good, and would avoid the confusion.

For standard HTTP: Apache should send *no* etag or a valid one.

For mod_dav: Strong etags are essential for any caching client. As long as the
server does not edit the content of resources, it should always send a valid
strong etag, even if this affects performance. It should also return a strong
etag in a response to PUT.
Comment 1 Werner 2007-07-27 05:26:36 UTC
Created attachment 20556
conditional GET with apache-style weak etag

These test cases show that an Apache-style weak etag in a conditional GET will
never match. An unconditional GET without an etag would behave the same way.
Comment 2 Werner 2007-12-27 12:51:36 UTC
I am referring to
http://mail-archives.apache.org/mod_mbox/httpd-dev/200710.mbox/%3c470E9A9F.8020202@pearsoncmg.com%3e
and would like to propose this solution.

Precondition: Getting a strong Etag immediately (within 1 second) after
changing a resource is not important for many applications of HTTP, but is very
important for WebDAV. For this reason the bug must be fixed differently in
ap_make_etag() and in mod_dav/mod_dav_fs.

Apache core, ap_make_etag()
---------------------------
If the file mtime is within the same second as the request time:
- do not create or send an etag
- send header Cache-Control: no-cache (to prevent a fallback to using
  Last-Modified for cache validation)
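
The rule proposed above for the core could be sketched roughly as follows. This
is a minimal illustration in Python, not Apache code: the function name
validator_headers, its parameters and the inode-size-mtime tag format are
assumptions made for this example and are not necessarily what ap_make_etag()
emits.

```python
def validator_headers(mtime, request_time, inode, size):
    """Return the validator-related response headers for a file.

    mtime and request_time are seconds since the epoch (int or float).
    inode and size are integers taken from the file's stat data.
    """
    # The file was modified within the same second as the request:
    # no reliable etag can be built, so send none and forbid reuse
    # without revalidation (this also stops a fallback to Last-Modified).
    if int(mtime) == int(request_time):
        return {"Cache-Control": "no-cache"}

    # Otherwise the one-second resolution is sufficient for a strong etag.
    # The format below is illustrative only.
    etag = '"%x-%x-%x"' % (inode, size, int(mtime))
    return {"ETag": etag}
```

A file modified in the same second as the request yields only the
Cache-Control header; any older file yields a strong etag.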

Note: There are no backward compatibility issues with dropping the weak etag.
The meaning of Apache-style weak etags is "never-matching-etag". Its only
effect is that a cache might store the entity, only to drop it on the next request.

But this misuse of weak etags encourages or forces clients to implement weak
etag handling in a way that violates the RFC, which might cause problems
whenever someone uses weak etags in an RFC-compliant way.

WebDAV (mod_dav_fs)
-------------------
WebDAV-clients need a strong etag immediately after a PUT to avoid a lot of
traffic. To avoid race conditions, this strong etag must be returned in the
response to the PUT-request.
Because of the limited time resolution of many file systems, this etag cannot
be created from the information in the stored file alone. Using some kind of
hash function would consume too much CPU power and does not guarantee really
strong etags. So I vote for storing the strong etag as a WebDAV-property.

This solution imposes restrictions on the administrator of the server:
- resources must only be changed via WebDAV (so as not to bypass the etag
  maintenance of mod_dav_fs)
- if the administrator feels the need to change the repository via the local
  file system, serving PUT-requests must be stopped one second before, and
  only be restarted one second after the changes are made.

These restrictions may not be acceptable in some cases, so a configuration
option is needed to turn this off and fall back to ap_make_etag().

To be reliable and to catch most of the errors of lazy administrators, I propose
this logic:

1. The etag is stored as a WebDAV-property. It consists of two parts.
   Part 1 is the same as created by ap_make_etag().
   Part 2 is a simple counter. It is set to 0, when the mtime changes. If
   the file is changed without change in mtime, it is incremented.

2. on PUT-requests (and other requests that change the file)
   - get the old etag property
   - change the file
   - create Part 1 of the new etag
   - compare the new and the old Part 1:
     if they are equal: increment Part 2
     otherwise take the new Part 1 and append Part 2 with value 0

3. on GET, HEAD, PROPFIND and other requests that don't change the file
   (this also applies when checking If-Match and If-None-Match headers)
   - get the stored etag property
   - create Part 1 from the file
   - compare Parts 1; if they are different replace the etag property by the
     new Part 1 with Part 2 = 0
   This will catch most of the cases, where resources have been changed via
   the local file system.
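
The three steps above can be sketched as a small in-memory model. This is a
minimal illustration in Python under stated assumptions, not mod_dav_fs code:
the class EtagStore, its method names and the way Part 1 and Part 2 are joined
with a hyphen are all made up for this example. Part 1 is passed in by the
caller and stands for whatever ap_make_etag() would produce for the file.

```python
class EtagStore:
    """Toy model of the stored etag property: path -> (part1, counter)."""

    def __init__(self):
        self.props = {}

    def on_write(self, path, new_part1):
        """Step 2: call after the file has been changed via WebDAV."""
        old = self.props.get(path)
        if old is not None and old[0] == new_part1:
            # Part 1 did not change (same mtime second): bump the counter.
            counter = old[1] + 1
        else:
            # Part 1 changed, or no property existed yet: counter restarts.
            counter = 0
        self.props[path] = (new_part1, counter)
        return self._format(path)

    def on_read(self, path, current_part1):
        """Step 3: call on GET, HEAD, PROPFIND and precondition checks."""
        old = self.props.get(path)
        if old is None or old[0] != current_part1:
            # The file was changed behind WebDAV's back: resynchronise.
            self.props[path] = (current_part1, 0)
        return self._format(path)

    def _format(self, path):
        part1, counter = self.props[path]
        return '"%s-%d"' % (part1, counter)
```

Two writes that produce the same Part 1 give different etags because the
counter is incremented, while a read that finds a different Part 1 resets the
counter to 0.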

There is a small chance of getting a wrong etag, if
- a file has been changed via WebDAV
- after this, but within the same second, the file is changed via the local
  file system (neglecting the good advice given in that fine manual)

Additionally:
When strong etags are enabled this way for mod_dav_fs, responses to PUT requests
should always include ETag and Last-Modified headers.

To make clear why this is important for WebDAV:
At the moment, a WebDAV client that wants to save a file and have the file
available after that has to
- send a LOCK-request
- do a HEAD-request to check for changes or existence
- send the file via PUT
- retrieve the file via GET (unconditional)
- UNLOCK
That's five requests, and there is still a potential danger, as locks are
not guaranteed to work; so the file might be changed between HEAD and PUT.

Having proper etags and working If-Match and If-None-Match, this transaction
would be atomic: just one PUT with If-Match or If-None-Match header, and the
client would have a valid strong etag for use in the next request.
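
The precondition check that makes such a single PUT safe could look roughly
like this on the server side. This is a minimal sketch in Python, not Apache
code: the function put_allowed and its helpers are hypothetical, the header
parsing is deliberately naive, and strong comparison is used for both headers
because the request is a PUT.

```python
def _tags(header):
    # Naive split of a comma-separated entity-tag list. A real parser must
    # respect quoting, since an opaque tag may itself contain a comma.
    return [t.strip() for t in header.split(",")]


def _strong_equal(a, b):
    # Strong comparison: identical tags, and neither may be weak.
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def put_allowed(current_etag, if_match=None, if_none_match=None):
    """Return True if the PUT may proceed, False for 412 Precondition Failed.

    current_etag is None when the resource does not exist yet.
    """
    exists = current_etag is not None

    if if_match is not None:
        if if_match.strip() == "*":
            if not exists:
                return False
        elif not (exists and any(_strong_equal(t, current_etag)
                                 for t in _tags(if_match))):
            return False

    if if_none_match is not None:
        if if_none_match.strip() == "*":
            if exists:
                return False
        elif exists and any(_strong_equal(t, current_etag)
                            for t in _tags(if_none_match)):
            return False

    return True
```

A client that holds the etag "v1" would send If-Match: "v1" to overwrite only
an unchanged file, or If-None-Match: * to create a file only if it does not
exist yet. A weak etag never satisfies If-Match in this sketch, which is why
the PUT response needs to carry a strong one.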

I am not able to prepare a patch. It would take me at least three months to learn
enough about Apache and another three months to understand and change mod_dav_fs.

Werner
Comment 3 Henrik Nordstrom 2007-12-29 21:42:24 UTC
It's not true that a weak etag sent by Apache is useless and that the object
will be dropped on the next request. It is dropped only on the next cache
validation after the object has expired.

For as long as the object has not expired the weak etag is sufficient for
client<->cache validations.

There is no real problem in that Apache never matches these weak etags in
If-None-Match.

Yes, it's true that to comply with the RFC the ETag should guarantee beyond
reasonable doubt that the representations are equal (semantically equal in the
case of a weak etag, octet-equal in the case of a strong one). The only way of
guaranteeing this is by knowing why the object gets updated and how. This cannot
be guaranteed while using a fs backend, as direct filesystem access may modify
the object and object timestamps in any manner it likes.

But direct filesystem modifications that do not modify mtime, or significant
modifications within the same second where the results matter within that same
second, are both relatively unlikely to be seen in real life.

But you should probably modify the weak ETag a little more than only making it
weak. In the RFC a strong and weak ETag with the same value compares true for
If-None-Match.
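
The two comparison functions of RFC 2616, 13.3.3 can be sketched as follows.
This is a minimal illustration in Python, not Apache code; the helper names
parse_etag, strong_match and weak_match are made up for this example.

```python
def parse_etag(raw):
    """Split an entity tag into (is_weak, opaque_value)."""
    raw = raw.strip()
    is_weak = raw.startswith("W/")
    opaque = raw[2:] if is_weak else raw
    return is_weak, opaque


def strong_match(a, b):
    """Strong comparison: identical values, and neither tag may be weak."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a, b):
    """Weak comparison: identical values, the W/ prefix is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under weak comparison W/"abc" and "abc" are equal, so a server that only adds
the W/ prefix to an otherwise unchanged value produces a tag that still
matches its strong counterpart.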
Comment 4 Werner 2007-12-30 04:26:32 UTC
Henrik Nordstrom wrote:
> There is no real problem in that Apache never matches these weak etags in
> If-None-Match.

On
http://mail-archives.apache.org/mod_mbox/httpd-dev/200712.mbox/%3c7002DFA0-43B9-464A-9843-B566D44980AF@gbiv.com%3e
Roy T. Fielding wrote:
> If the weak etags are not being matched to the string etags on
> GET, then that is another bug that must be fixed.  It is not an
> excuse to ignore the HTTP design.

It seems to be time for the Apache developers to make up their minds about what
they want to achieve, so that it makes sense to discuss how to do it in an
efficient and compliant way.
I was misled by the assumption that what Apache does is what it is intended to do.

Werner
Comment 5 William A. Rowe Jr. 2018-11-07 21:08:21 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.