Bug 32147 - provide finer grained control over enabling/disabling cache logic
Status: RESOLVED WORKSFORME
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_cache
Version: 2.1-HEAD
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2004-11-10 07:47 UTC by Jesse Sipprell
Modified: 2011-11-16 22:32 UTC

Attachments
fine-grained enable/disable enhancements to mod_cache (43.86 KB, patch)
2004-11-10 07:49 UTC, Jesse Sipprell

Description Jesse Sipprell 2004-11-10 07:47:59 UTC
This is undoubtedly a corner case; however, I've experienced some flexibility
issues configuring mod_cache to work well with a combination of mod_proxy
and mod_rewrite.  Specifically, the issue relates to the inability
(without resorting to cumbersome URI rewriting kludges) of mod_cache to be
enabled/disabled on more than simply the basis of the leading portion of
a URI path.

Additionally, because mod_cache uses the quick_handler hook, it interrupts
(when it decides to return cached content) most downstream modules, so
they cannot make their own decisions about whether to cache content.
The post_read_request hook is the obvious exception; however, because of the
nearly unconditional way in which mod_cache intercepts requests, resorting to
intercept-and-avoidance trickery in post_read_request handlers is rather
inelegant.  I do understand that using quick_handler is a performance win
for the majority of minimal-configuration caching needs.

Certainly, this could be worked around with subrequests; however, I would
prefer not to have to deal with the overhead of a subrequest on every
transaction (which is what would be necessary in _my_ particular case; others
may have better solutions).

With that being stated, the attached patch to 2.1-HEAD was my solution to
this issue.  The following is a list of changes, some of which may be beyond
the scope of what was necessary and violate various development API
integrity rules.  If this is the case, I would be happy to remove/alter
certain portions (and I'll mention some discomforts I have below as well).

Changes:

1. Added two optional hooks, cache_check_enabled and cache_check_disabled:

   A. cache_check_enabled is run from ap_cache_get_providers in order to
   determine if a particular uri (or other condition) is cause to
   enable caching.  The default handler for this hook implements the
   original functionality by iterating the cacheenable list and
   adding each entry whose left-most portion of the uri path matches.

   B. cache_check_disabled is run from ap_cache_get_providers in order
   to determine if caching should be disabled.  The first hook to return
   DECLINED causes mod_cache to discontinue trying to find a provider.
   Again, the default handler performs the original functionality by
   iterating the cachedisable list.  In addition, a check_disabled hook may
   return CACHE_DEFER, which results in mod_cache refusing to return cached
   content if in the quick_handler hook.  Instead it tries again from a
   regular content hook (see below).

2. New optional function: ap_cache_request_enable_provider.  Intended
   to be used by those who hook check_enabled to add a provider name
   ("type" seems to be the parlance in mod_cache at that level)
   and optional version number to the list of providers that
   ap_cache_get_providers() will try to look up.

   Using ap_cache_request_enable_provider is a module's way of telling
   mod_cache to attempt caching.  The function name is a tad cumbersome;
   the "request" is only in there to give some indication that it is
   a per-request call, not a general-use function for enabling providers.

   Perhaps this shouldn't be an optional function, because it's functionally
   identical to a normal API call.  If that is the case, then check_enabled
   and check_disabled shouldn't be optional hooks either.

3. Added a content handler to mod_cache so that it (or others) can choose,
   selectively, to handle a request _after_ other modules have taken
   their turn.  Particularly useful for mod_rewrite.  Additionally, the
   request handler must be set to "cache-server", which is done
   automatically if a check_disabled handler returns CACHE_DEFER inside
   the context of cache_url_handler.  mod_rewrite can also enable caching
   this way by setting the content handler during a rewrite rule.
   In my case, this is useful for enabling both reverse proxy and
   caching for requests that meet certain header constraints.

4. Added a new directive "CacheDefer", which when toggled on forces
   the above behavior (handling from the content_handler) to be the
   default.  This was completely arbitrary, however it provided the
   functionality I needed and was useful for testing.  Obviously,
   with the above changes this could be done from anywhere.  Not crazy
   about the name either, it is .. non-intuitive for those unfamiliar
   with the code.

5. The majority of mod_cache.h internals were moved to cache_private.h,
   because there now exist some intentionally public exports.
   The now highly minimized mod_cache.h was added to $top_srcdir/Makefile.in
   for the install-include target.  All mod_cache related sources that
   previously referenced mod_cache.h were changed to cache_private.h.
   Might need some dependency fixups; I didn't go that far.
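
For readers without the patch at hand, the default handlers described in
item 1 can be pictured roughly as follows.  This is only a standalone sketch,
not code from the attachment: the type and function names (check_verdict,
cache_entry, default_check_enabled, default_check_disabled) are invented for
illustration, the verdict values are placeholders for DECLINED and
CACHE_DEFER, and modelling CacheDefer as a flag on the default
check_disabled handler is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts; the names echo the report, the values are invented. */
typedef enum { CHECK_OK, CHECK_DECLINED, CHECK_DEFER } check_verdict;

/* One CacheEnable entry: a URL prefix plus a provider "type" (e.g. "disk"). */
typedef struct {
    const char *prefix;
    const char *type;
} cache_entry;

/* Returns 1 when the left-most portion of uri equals prefix. */
static int prefix_matches(const char *uri, const char *prefix)
{
    assert(uri != NULL && prefix != NULL);
    return strncmp(uri, prefix, strlen(prefix)) == 0;
}

/* Sketch of the default check_enabled behavior: collect the provider type of
 * every CacheEnable entry whose prefix matches.  Returns the number of types
 * written to out (never more than max_out). */
static size_t default_check_enabled(const cache_entry *enable, size_t n,
                                    const char *uri,
                                    const char **out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (prefix_matches(uri, enable[i].prefix)) {
            out[found++] = enable[i].type;
        }
    }
    return found;
}

/* Sketch of the default check_disabled behavior: decline as soon as any
 * CacheDisable prefix matches.  Assumption: a nonzero defer_all (standing in
 * for CacheDefer) turns an otherwise cacheable verdict into a deferral. */
static check_verdict default_check_disabled(const char *const *disable,
                                            size_t n, const char *uri,
                                            int defer_all)
{
    for (size_t i = 0; i < n; i++) {
        if (prefix_matches(uri, disable[i])) {
            return CHECK_DECLINED;
        }
    }
    return defer_all ? CHECK_DEFER : CHECK_OK;
}
```

A module hooking check_enabled or check_disabled would replace the plain
prefix test with whatever condition it needs (request headers, rewrite
results, and so on), which is the flexibility the hooks are meant to add.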

Thank you for your time.  I hope this will be of some use.  If there are
any questions or requested changes, please feel free to let me know
(or just have bugzilla do it =P)

Jesse Sipprell
Comment 1 Jesse Sipprell 2004-11-10 07:49:07 UTC
Created attachment 13375
fine-grained enable/disable enhancements to mod_cache
Comment 2 Graham Leggett 2009-10-03 08:31:47 UTC
httpd-trunk supports the CacheQuickHandler directive, which allows you to run the cache as a normal handler, which means most of the cases described below should now work.

Assuming this problem still exists, can you verify that CacheQuickHandler helps the issues below?

The attached patch seems to attempt to do a number of things at the same time, which makes it difficult to review.
Comment 3 Kai Krakow 2010-02-02 06:20:57 UTC
Is this going to be backported or available as a single patch I could try to apply to 2.2.14 sources?

We have a problem when using mod_cache and mod_rewrite together with the Expires header enabled in a PHP application: mod_cache always returns the same content on subsequent requests, no matter which URL was requested from Apache. mod_cache seems to only see the index.php in its cache while rewrite rules route all URLs through this file. Example setup:

RewriteEngine on
RewriteRule ^(favicon\.ico|robots\.txt) - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule .* index.php

On the first request (cold cache), when we request /real/url.html from the server, it returns the right content. On the next request it serves the content from cache as intended. Next we request /another/url.html and the cache simply serves the content of the first URL without going through the index.php (which uses PATH_INFO to extract the URL data).

I think this is because mod_cache only hashed "index.php" as the URL, which in both cases had no query string, and it thus treats the two different requests as equal. As far as I understand, this should be solvable by using "CacheQuickHandler off".
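
To make the suspected collision concrete, here is a toy model of it. It is only an illustration of the hypothesis above, not mod_cache's actual key generation: toy_cache_key and toy_same_entry are invented names, and the "path?query" key format is an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy key builder: the path followed by '?' and the query string (if any).
 * Hypothetical format; mod_cache's real key is built differently. */
static void toy_cache_key(char *buf, size_t len,
                          const char *path, const char *query)
{
    assert(buf != NULL && path != NULL);
    snprintf(buf, len, "%s?%s", path, query ? query : "");
}

/* Returns 1 when two query-less requests would share one cache entry under
 * the toy scheme, 0 otherwise. */
static int toy_same_entry(const char *path_a, const char *path_b)
{
    char key_a[256];
    char key_b[256];
    toy_cache_key(key_a, sizeof key_a, path_a, NULL);
    toy_cache_key(key_b, sizeof key_b, path_b, NULL);
    return strcmp(key_a, key_b) == 0;
}
```

If the key were derived from the originally requested paths, toy_same_entry("/real/url.html", "/another/url.html") would report distinct entries; if it were derived from the rewritten path, both requests would become "/index.php" and collapse into a single entry, which matches the behavior we observe.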
Comment 4 Ruediger Pluem 2010-02-02 08:40:55 UTC
(In reply to comment #3)
> Is this going to be backported or available as a single patch I could try to
> apply to 2.2.14 sources?
> 
> We have a problem where using mod_cache and mod_rewrite together and enabling
> Expires header in a php application, mod_cache always returns the same content
> on subsequent requests no matter which URL was requested from apache. mod_cache
> seems to only see the index.php in its cache while rewrite rules route all
> URLs through this file. Example setup:
> 
> RewriteEngine on
> RewriteRule ^(favicon\.ico|robots\.txt) - [L]
> RewriteCond %{REQUEST_FILENAME} !-f
> RewriteCond %{REQUEST_FILENAME} !-d
> RewriteCond %{REQUEST_FILENAME} !-l
> RewriteRule .* index.php
> 
> On the first request (cold cache) when we request /real/url.html from the
> server it returns the right content. On the next request it serves the content
> from cache as intended. Next we request /another/url.html and the cache simply
> serves the content of the first url without going through the index.php (which
> uses PATH_INFO to extract the URL data).
> 
> I think this is because mod_cache only hashed "index.php" as the URL which had
> in both cases no query string and it thus handles both different requests as
> equal. As far as I understood this should be solvable by using
> "CacheQuickHandler off".

This sounds strange and IMHO should not happen with 2.2.14. Please set your LogLevel to debug and provide the error log output for the following sequence:

1. Startup with a cold cache
2. Request /real/url.html
3. Request /real/url.html (from cache)
4. Request /another/url.html
Comment 5 William A. Rowe Jr. 2011-11-16 22:32:49 UTC
No reply to info request; closing.