Bug 49786

Summary: Range header incorrectly handled when entity is not cached
Product: Apache httpd-2
Reporter: Ryan Hope <rhope>
Component: mod_cache
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Status: RESOLVED DUPLICATE
Severity: normal
CC: rhope
Priority: P2
Version: 2.2-HEAD
Target Milestone: ---
Hardware: All
OS: All

Description Ryan Hope 2010-08-19 15:57:06 UTC
If a client sends an HTTP request with a Range header to a caching proxy using mod_cache, where the source content is on a separate origin server and the entity is not cached, the partial (Content-Range) response is cached as if it were the full document.  Subsequent requests for that content are then served from the "partial" response that was cached.

Steps to reproduce
1. Find a file that is not cached on the caching proxy.
2. Request a range of that file using the Range header (for example with BITS).
3. Request the file again with a different range (or with no range specified).

Actual Results
The 206 response to the second request is the same as the response to the first request.

Expected Results
The 206 response should include just the range that was requested.
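
To make the difference between the actual and expected results concrete, here is a toy model of the behavior described above. This is not mod_cache code: struct toy_cache, origin_fetch() and toy_proxy_get() are made-up names, the "origin" is a ten-byte string, and the cache is a single slot keyed by URL only. It is just a sketch of a cache that stores a ranged reply as if it were the whole entity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BODY_MAX 64

/* Toy origin: the full entity is the ten bytes "0123456789". */
static const char ORIGIN_BODY[] = "0123456789";

/* Hypothetical one-slot cache keyed by URL only. It does not record
 * whether the stored body came from a 200 or a 206 reply. */
struct toy_cache {
    int    filled;
    char   url[64];
    char   body[BODY_MAX];
    size_t len;
};

/* Origin returns the inclusive byte range [first, last] of the entity. */
static size_t origin_fetch(size_t first, size_t last, char *out)
{
    size_t n;

    assert(first <= last && last < sizeof(ORIGIN_BODY) - 1);
    n = last - first + 1;
    memcpy(out, ORIGIN_BODY + first, n);
    return n;
}

/* Toy proxy. On a miss the request is forwarded unchanged and the reply
 * is stored as if it were the complete entity. On a hit the stored bytes
 * are treated as the complete entity. Returns the number of bytes in out. */
static size_t toy_proxy_get(struct toy_cache *c, const char *url,
                            int has_range, size_t first, size_t last,
                            char *out)
{
    size_t n;

    if (c->filled && strcmp(c->url, url) == 0) {
        if (!has_range) {
            first = 0;
            last = c->len - 1;
        }
        if (first >= c->len)
            return 0;
        if (last >= c->len)
            last = c->len - 1;
        n = last - first + 1;
        memcpy(out, c->body + first, n);
        return n;
    }

    if (!has_range) {
        first = 0;
        last = sizeof(ORIGIN_BODY) - 2;   /* last byte, excluding the NUL */
    }
    n = origin_fetch(first, last, out);
    memcpy(c->body, out, n);              /* partial reply stored as "full" */
    c->len = n;
    snprintf(c->url, sizeof c->url, "%s", url);
    c->filled = 1;
    return n;
}
```

In this model, a first request for bytes 2-5 of an uncached URL stores only those four bytes, and a later request for the whole file is answered with the same four bytes rather than all ten.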

Build Date & Platform
2.2.16 on a CentOS box

Additional Information:
If the first request (i.e. the one that populates the cache) is a full file request, then subsequent Range requests are handled properly.

Also, if the origin server adds a "Vary: content-range" header to the response, the client is returned the appropriate content.  The big downside to this is that practically every range request is stored in a separate file in the cache (using file cache).  And if there are multiple requests for different overlapping Ranges, the cache could contain a lot of redundant information.

This looks very similar to bug ID 44579 (https://issues.apache.org/bugzilla/show_bug.cgi?id=44579).  That bug was specific to making a Range request for expired cache content.  However, from looking at the fix, it does not address the case where the first request for a piece of non-cached content contains a Range header.  It looks to me like there needs to be a call at the end of cache_storage.c -> cache_select() in the non-matched case to remove the Range header from the request:

    }
    apr_table_unset(r->headers_in, "Range");
    return DECLINED;
}

Either that, or unset the Range header in the function that calls cache_select()
Comment 1 Ryan Hope 2010-08-19 17:21:05 UTC
In looking at the code more closely, it looks like removing the Range header at the bottom of cache_select() might not be the best idea, since there are other cache miss return statements in other places.  It looks like mod_cache.c, inside cache_url_handler(), somewhere after the cache lock is obtained (maybe right before the CACHE_URL_REMOVE filter is added) would be a better place.
Comment 2 Ruediger Pluem 2010-08-20 02:45:36 UTC

*** This bug has been marked as a duplicate of bug 49113 ***
Comment 3 Ryan Hope 2010-08-23 16:37:46 UTC
It looks like the fix in bug 49113 was to simply not cache partial requests/responses.  While that fix does make sure that the client gets what it's expecting, it also makes the caching mostly useless for an application like ours where most of the requests are partial.

What I'd like to see is that a non-conditional request for a partial range is actually translated into a non-conditional full request when it is proxied out to the origin server (assuming it misses the cache), then the full response is cached, but only the partial response is sent to the client.

Since full documents that are stored in the cache are handled correctly when there's a partial request, it seems like it would be pretty straightforward (or at least the fix would be similar to bug ID 44579) to just translate a partial cache miss into a full request to the origin, then cut the requested range out of the full response before sending it to the client.
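
As a rough illustration of the last step (serving only the requested range out of a full body that has already been fetched and cached), here is a minimal standalone sketch. It does not use httpd or APR and is not how mod_cache does it; parse_single_range() and slice_cached_body() are hypothetical helpers, and only a single "bytes=first-last" or "bytes=first-" range is handled. Suffix ranges and multi-range requests are rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one "bytes=first-last" or "bytes=first-"
 * range against an entity of total_len bytes. Returns 1 and fills
 * *first / *last (inclusive) on success, 0 if the header is not
 * understood or the range cannot be satisfied. */
static int parse_single_range(const char *hdr, long total_len,
                              long *first, long *last)
{
    const char *p;
    char *end;
    long a, b;

    if (hdr == NULL || total_len <= 0 || strncmp(hdr, "bytes=", 6) != 0)
        return 0;
    p = hdr + 6;
    if (*p < '0' || *p > '9')        /* rejects suffix ranges like "bytes=-500" */
        return 0;
    a = strtol(p, &end, 10);
    if (*end != '-')
        return 0;
    p = end + 1;
    if (*p == '\0') {
        b = total_len - 1;           /* open-ended range: "bytes=100-" */
    } else {
        if (*p < '0' || *p > '9')
            return 0;
        b = strtol(p, &end, 10);
        if (*end != '\0')            /* rejects multi-range lists */
            return 0;
    }
    if (a > b || a >= total_len)
        return 0;
    if (b >= total_len)
        b = total_len - 1;           /* clamp the end to the entity size */
    *first = a;
    *last = b;
    return 1;
}

/* Copy the requested slice of a fully cached body into out and format the
 * Content-Range value for the 206 reply. Returns the slice length, or -1
 * if the range cannot be served or out is too small. */
static long slice_cached_body(const char *body, long total_len,
                              const char *range_hdr,
                              char *out, size_t out_size,
                              char *content_range, size_t cr_size)
{
    long first, last, n;

    assert(body != NULL && out != NULL && content_range != NULL);
    if (!parse_single_range(range_hdr, total_len, &first, &last))
        return -1;
    n = last - first + 1;
    if ((size_t)n > out_size)
        return -1;
    memcpy(out, body + first, (size_t)n);
    snprintf(content_range, cr_size, "bytes %ld-%ld/%ld",
             first, last, total_len);
    return n;
}
```

With a ten-byte cached body "0123456789", a Range of "bytes=2-5" yields the four bytes "2345" and a Content-Range value of "bytes 2-5/10". The point is only that once the full entity is in the cache, any later range can be cut from it without going back to the origin.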