Some clients (for ex.: wap browser in motorola phone) insert spaces around "=" in "Range:" header, and send something like "Range: bytes = 0-100", but apache understands that header only if it's written without this spaces. Apache 2.0 has the same bug.
That is a possible patch: --- apache_1.3.29/src/main/http_protocol.c.old 2004-03-16 18:59:27.000000000 +0300 +++ apache_1.3.29/src/main/http_protocol.c 2004-03-16 19:01:08.000000000 +0300 @@ -303,10 +303,12 @@ if (!(range = ap_table_get(r->headers_in, "Range"))) range = ap_table_get(r->headers_in, "Request-Range"); - if (!range || strncasecmp(range, "bytes=", 6)) { + if (range && strncasecmp(range, "bytes=", 6)==0) + range += 6; + if (range && strncasecmp(range, "bytes = ", 8)==0) + range += 8; + else return 0; - } - range += 6; /* Check the If-Range header for Etag or Date. * Note that this check will return false (as required) if either
In 14.35.1 of RFC2616 I'm not seeing it... ranges-specifier = byte-ranges-specifier byte-ranges-specifier = bytes-unit "=" byte-range-set byte-range-set = 1#( byte-range-spec | suffix-byte-range-spec ) byte-range-spec = first-byte-pos "-" [last-byte-pos] first-byte-pos = 1*DIGIT last-byte-pos = 1*DIGIT But in the be liberal in what you parse, strict in what you send, I'd accept LWS parsing within Range tags. Other comments? I've reclassed as a 2.0 bug to draw attention and interest :)
+1 to change it in 2.1.
The space parsing problem also affects how the range is determined. If one specifies a negative range like -10, to fetch the last 10 bytes, and it is preceeded by a space, the current code treats the space as a 0 so instead of the last 10 bytes, you get the first 10. So this request: GET /small.html HTTP/1.1 Host: localhost Range: bytes= -10 Connection: close returns bytes 0-9 instead of the the last 10. A patch to fix both issues is: --- http_protocol.c.orig 2005-10-20 11:54:44.000000000 -0400 +++ http_protocol.c 2005-10-20 11:55:57.000000000 -0400 @@ -3033,7 +3033,8 @@ static int ap_set_byterange(request_rec *r) { - const char *range; + const char *tmphdr; + char *range; const char *if_range; const char *match; const char *ct; @@ -3053,8 +3054,14 @@ * Navigator 2-3 and MSIE 3. */ - if (!(range = apr_table_get(r->headers_in, "Range"))) { - range = apr_table_get(r->headers_in, "Request-Range"); + if (!(tmphdr = apr_table_get(r->headers_in, "Range"))) { + tmphdr = apr_table_get(r->headers_in, "Request-Range"); + } + + range = NULL; + if (tmphdr) { + range = apr_palloc(r->pool, strlen(tmphdr)); + apr_collapse_spaces(range, tmphdr); } if (!range || strncasecmp(range, "bytes=", 6) || r->status != HTTP_OK) {
Reconsidering 14.35.1 Byte Ranges in RFC2616, and looking at the recent issues with request/response splitting/spoofing; -1 on any patch to permit this grammer, however in comment 4 we have a clear issue, we can't be treating empty space as a zero-value. Therefore, we should treat the invalid characters (any of them, including unpermited lws) as a flaw. Now, back to the original report, how to handle. I suggest we treat any flawed bytes= sequence as a noop, unset the corresponding input Range/IfRange header, and provide the complete response. The broken client will need to be fixed (it sucks down more data than it intended) but it should be aware of servers which don't support [it's broken] range syntax, and therefore even the broken client shouldn't fail. Sorry if I led in the opposite direction nearly 2 years ago, but in hindsite, allowing invalid grammer proved to be the fatal flaw in the entire class of proxy splitting vulnerabilities, since each server was 'differently permissive' and their permissive and strict interpretations of the headers clashed.
The current code is not compliant with 2616 since it doesn't allow the implied LWS. I cannot imagine why you'd say that *fixing* a 2616 compliance issue could create a security issue. (nor how any particular interpretation of the Range header would create a security issue in the first place) Patch looks fine to me.
Note on Joe's comment, '-' is -NOT- defined as a seperator, so I agree that implied *LWS permits 'bytes = 500-600,601-999' ... or 'bytes=500-600 , 601-999' But implied *LWS apparenty does not permit 'bytes=500 - 600,601 - 999' as the minus symbol/dash is not in the list of 'seperators' defined by RFC 2616.
So what problem is created in following the usual "be liberal in what you accepte" rule and stripping all whitespace? What security issues are you talking about above?
The grammar is already liberal, as you and I agreed. I'm noting only the one specific exclusion. After the Watchfire paper, forgive me for being a bit more paranoid about being excessively liberal with input. Unless we care to canonicalize the input, if it can be passed through the proxy, we should stick to RFC2616. +1 on fixing the implied LWS where permitted, which is everywhere except between digits, or between the digits and the hyphen.