Bug 42404

Summary: Filename/info for disk-cached proxied remote URLs not available.
Product: Apache httpd-2 Reporter: D. Stussy <software+apache-httpd>
Component: mod_cache Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED LATER    
Severity: enhancement CC: software+apache-httpd
Priority: P4 Keywords: MassUpdate
Version: 2.2.4   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description D. Stussy 2007-05-13 12:32:20 UTC
I'm not certain whether what I'm reporting is actually a bug, a feature 
request, or both.  Therefore, I've submitted it as a low-priority enhancement.

Normally, Apache has the ability to return the filename for a local resource 
via PHP's apache_lookup_uri() function call.  When Apache is acting as a proxy 
server, there is no local filename for remote URLs.  However, there may be a 
local file created should mod_disk_cache capture and hold a working copy of a 
proxied resource (should it be told to do so via the configuration file).  
Therefore, if a local filename is available, I ask whether it should be 
returned instead.  Currently, the function returns "proxy:REMOTE_URL" for all 
proxied requests, not a local filename from the disk cache.  I would like a 
local filename returned whenever one is available.

What I'm trying to do:  I'm trying to return certain information from my 
server's disk cache about proxied resources, such as when they were last 
updated (i.e. fetched), last accessed, etc.  This information is available from 
filesystem calls for the file in the cache holding the data portion of the 
URL.  Sometimes, some but not all of the information is available in the cached 
HTTP headers.  The PHP code I've written is:

$URL is the disk-cached, proxied URL to retrieve information about.

    /* Add a "null query string" only if $URL lacks one.  FOOTNOTE */
    $QS = (strpos($URL, "?") === FALSE) ? "?" : "";
    $parse = apache_lookup_uri("/cache?".$URL.$QS);   /* object, or FALSE */
    $file = ($parse !== FALSE) ? $parse->filename : NULL;

$file should be a local filename for a local file, but seems to 
be "proxy:REMOTE_URL" for all proxied resources.  I'd like this to be the file 
pathname from the local disk cache if such is available.  This way, I could 
report the created, last-modified, and last-accessed times to a user (for 
statistical purposes).

"/cache" is defined in the server configuration file to do a proxied rewrite 
rule on its QUERY_STRING variable (using "RewriteRule ... [P]") should the 
appropriate security conditions be met.

I've looked at mod_disk_cache.c, and from what I can determine, this line 
inserted at the end of "open_entity()" should do it (about line 480):

+    r->filename = apr_pstrdup(r->pool, dobj->datafile);
     return OK;

Of course, one should probably free any prior assignment of "r->filename" else 
there would be a small memory leak.  However, as is, this does not seem to do 
the requested job.  (For locally served files, the true local filename should 
be used, not the disk cache's filename, but that can be handled by prefixing 
the code with a conditional - which wasn't necessary for the test.)  What am I 
missing, or what can be done to make this work?

-------
Footnote:  $QS is necessary, else any QUERY_STRING passed to the PHP script 
would be inherited by the lookup of the proxied URL.  The only way I found to 
suppress the inheritance was to append a "null query string" to the URL, which 
should only be done if it lacks its own query string.
Comment 1 Graham Leggett 2009-10-03 08:36:26 UTC
The problem with gaining access to a file is that the file only makes sense if there is a single file, and in the case of the disk cache, there are multiple separate files that represent the body of each variant, and the headers of each variant.

A better approach would be to expose a more formal API to query the status of cached URLs.
Comment 2 Graham Leggett 2010-10-18 19:07:14 UTC
htcacheclean in httpd-trunk now has the ability to list entries in the cache (-a flag), and list entries in the cache along with a complete dump of cached metadata (-A flag).

Will this be enough?
Comment 3 D. Stussy 2010-10-19 02:38:09 UTC
RE - Comment #1:  I am aware that the cache generates two files (header and body) per cached request, and additional files when the response varies.  The file would be the body file selected according to the headers given in the subrequest, after Vary processing is applied.  This reduces the subrequest to a single file, if one exists.  If none exists, then the "proxy:URL" string (or another appropriate substitute) that is currently returned would still be returned.

Clarification:  The concept is akin to retrieving the last cached copy of a resource, as certain search engines (e.g. Google) keep around.

RE - Comment #2:  No.  My idea concerned making the local filename (from the cache when the URL is remote) available to the HTTP request itself (as a subrequest if appropriate).  Therefore, availability to the htcacheclean program does not help.  (However, it may be helpful for maintenance purposes.)

Does that make more sense about what I am suggesting?
Comment 4 William A. Rowe Jr. 2018-11-07 21:08:26 UTC
Please help us to refine our list of open and current defects; this is a mass update of old and inactive Bugzilla reports which reflect user error, already resolved defects, and still-existing defects in httpd.

As repeatedly announced, the Apache HTTP Server Project has discontinued all development and patch review of the 2.2.x series of releases. The final release 2.2.34 was published in July 2017, and no further evaluation of bug reports or security risks will be considered or published for 2.2.x releases. All reports older than 2.4.x have been updated to status RESOLVED/LATER; no further action is expected unless the report still applies to a current version of httpd.

If your report represented a question or confusion about how to use an httpd feature, an unexpected server behavior, problems building or installing httpd, or working with an external component (a third party module, browser etc.) we ask you to start by bringing your question to the User Support and Discussion mailing list, see [https://httpd.apache.org/lists.html#http-users] for details. Include a link to this Bugzilla report for completeness with your question.

If your report was clearly a defect in httpd or a feature request, we ask that you retest using a modern httpd release (2.4.33 or later) released in the past year. If it can be reproduced, please reopen this bug and change the Version field above to the httpd version you have reconfirmed with.

Your help in identifying defects or enhancements still applicable to the current httpd server software release is greatly appreciated.