Bug 53767

Summary: htcacheclean deletes stale "vary" header files even if cache limits aren't reached
Product: Apache httpd-2 Reporter: andyh <andy.hutson+apache>
Component: mod_cache_disk / mod_disk_cache Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED LATER    
Severity: normal CC: andy.hutson+apache, dave
Priority: P2 Keywords: MassUpdate
Version: 2.2.21   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: strace output

Description andyh 2012-08-23 15:28:43 UTC
Created attachment 29268
strace output

For a resource that contains a "Vary" header, mod_cache creates a .header file containing the "vary" headers, and then creates a subdirectory (<hash>.header.vary), which in turn contains the actual .header and .data files.
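
To make the layout above concrete, here is a minimal sketch in C of how the three kinds of path relate to each other. The suffix values are assumed to match the CACHE_HEADER_SUFFIX, CACHE_DATA_SUFFIX and CACHE_VDIR_SUFFIX macros used by htcacheclean; the helper names and the flat "variant" file name are hypothetical and only for illustration (the real cache may nest further subdirectories inside the .vary directory).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Suffix values assumed for illustration; in httpd they come from the
 * disk cache headers rather than being defined locally like this. */
#define CACHE_HEADER_SUFFIX ".header"
#define CACHE_DATA_SUFFIX   ".data"
#define CACHE_VDIR_SUFFIX   ".vary"

/* Hypothetical helper: top-level file holding the "vary" headers,
 * i.e. <base>.header */
int vary_header_path(char *buf, size_t n, const char *base)
{
    assert(buf != NULL && base != NULL);
    return snprintf(buf, n, "%s%s", base, CACHE_HEADER_SUFFIX);
}

/* Hypothetical helper: subdirectory holding the variants,
 * i.e. <base>.header.vary */
int vary_dir_path(char *buf, size_t n, const char *base)
{
    assert(buf != NULL && base != NULL);
    return snprintf(buf, n, "%s%s%s", base,
                    CACHE_HEADER_SUFFIX, CACHE_VDIR_SUFFIX);
}

/* Hypothetical helper: one variant's .header or .data file inside the
 * vary directory (flattened; real entries may sit in deeper subdirs). */
int variant_path(char *buf, size_t n, const char *base,
                 const char *variant, const char *suffix)
{
    assert(buf != NULL && base != NULL && variant != NULL && suffix != NULL);
    return snprintf(buf, n, "%s%s%s/%s%s", base,
                    CACHE_HEADER_SUFFIX, CACHE_VDIR_SUFFIX, variant, suffix);
}
```

The point of the sketch is that the top-level <hash>.header file has no sibling .data file of its own; the body lives only under <hash>.header.vary.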

Assuming the cache is not above the limit specified to htcacheclean, stale content should remain in place so that it can be served while the content is being revalidated or if there's an error (i.e. CacheStaleOnError behaviour). For non-varying resources, this works fine.

However, for resources that "vary", it appears there is a bug in htcacheclean. Once the content is stale, it deletes the top level .header file (containing the "vary" headers), even if the cache is within limits, meaning mod_cache is unable to serve the stale content (htcacheclean reports "0 entries deleted" though). It would appear that creating an empty .data file alongside the .header file prevents this from happening.

I'm attaching the output of the following, for reference, which shows the unlink call at line 173:
strace -o ~/strace.out -s 512 htcacheclean -r -v -p /data/httpcache/httpd/ -l 1000k

Comment 1 andyh 2012-08-23 16:00:49 UTC
Incidentally, this happens with or without the "-r" switch.

Comment 2 andyh 2012-08-24 10:54:14 UTC
From http://svn.apache.org/repos/asf/httpd/httpd/trunk/support/htcacheclean.c (which is later than the one we're running, but I believe has the same issue); looks like this is the section that's doing it:

        /* single data and header files may be deleted either in realclean
         * mode or if their modification timestamp is not within a
         * specified positive or negative offset to the current time.
         * this handling is necessary due to possible race conditions
         * between apache and this process
         */
        case HEADER:
            current = apr_time_now();
            nextpath = apr_pstrcat(p, path, "/", d->basename,
                                   CACHE_HEADER_SUFFIX, NULL);
            if (apr_file_open(&fd, nextpath, APR_FOPEN_READ | APR_FOPEN_BINARY,
                              APR_OS_DEFAULT, p) == APR_SUCCESS) {
                len = sizeof(format);
                if (apr_file_read_full(fd, &format, len,
                                       &len) == APR_SUCCESS) {
                    if (format == VARY_FORMAT_VERSION) {
                        apr_time_t expires;

                        len = sizeof(expires);

                        if (apr_file_read_full(fd, &expires, len,
                                               &len) == APR_SUCCESS) {
                            apr_finfo_t finfo;

                            apr_file_close(fd);

                            if (apr_stat(&finfo, apr_pstrcat(p, nextpath,
                                    CACHE_VDIR_SUFFIX, NULL), APR_FINFO_TYPE, p)
                                    || finfo.filetype != APR_DIR) {
                                delete_entry(path, d->basename, nodes, p);
                            }
                            else if (expires < current) {
                                delete_entry(path, d->basename, nodes, p);
                            }

                            break;
                        }
                    }


Specifically, this part:
                            else if (expires < current) {
                                delete_entry(path, d->basename, nodes, p);
                            }

This seems to be deleting expired vary header files by design. But this breaks CacheStaleOnError handling, as well as (but to a lesser extent) the stale-while-revalidate function that CacheLock provides, for any resources that vary.

Not sure if there's any reason to be deleting these header files? They'll get deleted later on anyway, in purge(), if the cache is above limits. If there's no other need, should these three lines simply be removed?
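
For illustration only, the decision being discussed can be reduced to the following self-contained sketch. It is not the htcacheclean code: sketch_time_t stands in for apr_time_t, the function names are hypothetical, and the vary-directory check is passed in as a flag instead of being done with apr_stat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for apr_time_t (microseconds since the epoch); assumption. */
typedef int64_t sketch_time_t;

/* Behaviour of the quoted section as I read it: the vary header file is
 * deleted if its .vary directory is missing, or if it has expired. */
bool vary_header_deleted_current(bool vary_dir_exists,
                                 sketch_time_t expires, sketch_time_t now)
{
    assert(expires >= 0 && now >= 0);
    if (!vary_dir_exists) {
        return true;            /* orphaned vary header */
    }
    else if (expires < now) {
        return true;            /* the branch in question */
    }
    return false;
}

/* Behaviour with the "expires < current" branch removed: only orphaned
 * vary headers are deleted here, and expired ones are left for the
 * size-based purge. */
bool vary_header_deleted_proposed(bool vary_dir_exists,
                                  sketch_time_t expires, sketch_time_t now)
{
    assert(expires >= 0 && now >= 0);
    (void)expires;
    (void)now;
    return !vary_dir_exists;
}
```

With the vary directory present and an expiry in the past, the first function reports a deletion and the second does not, which is the difference that matters for CacheStaleOnError.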

Comment 3 William A. Rowe Jr. 2018-11-07 21:08:00 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.