Created attachment 29069: Use uppercase hexadecimal digits in mod_rewrite

Apache mod_rewrite encodes special characters using lowercase hexadecimal digits; for example, Chráněná becomes Chr%c3%a1n%c4%9bn%c3%a1 instead of Chr%C3%A1n%C4%9Bn%C3%A1. The resulting non-canonical URL breaks our caching system. We can't switch our canonical URLs to lowercase hexadecimal digits, because no browser sends URLs like that, so the cache would be broken even more badly. Please use uppercase hexadecimal digits in URLs.
RFC 1738, about Uniform Resource Locators (URL) (http://www.rfc-editor.org/rfc/rfc1738.txt), says:

>>>
2.2. URL Character Encoding Issues

[...]
In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which forming the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
[...]
<<<

So, I guess that httpd is correct when encoding with lower case.

I left the report open, just in case, but I think that it should be marked as FIXED, WONTFIX.
(In reply to comment #1)
> In RFC 1738, about Uniform Resource Locators (URL)
> (http://www.rfc-editor.org/rfc/rfc1738.txt)
>
> it is written that :
>
> >>>
> 2.2. URL Character Encoding Issues
>
> [...]
> In addition, octets may be encoded by a character triplet consisting
> of the character "%" followed by the two hexadecimal digits (from
> "0123456789ABCDEF") which forming the hexadecimal value of the octet.
> (The characters "abcdef" may also be used in hexadecimal encodings.)
> [...]
>
> <<<
>
> So, I guess that httpd is correct when encoding with lower case.
>
> I left the report open, just in case, but I think that it should be marked
> as FIXED, WONTFIX.

I think the RFC is pretty clear about which encoding is preferred, and it's not the one httpd is using. You seem to be using a very loose definition of "correct". There are two ways of doing it: one is preferred, the other is idiosyncratic and breaks caching. It is a simple change and the patch is attached.
Apache is not incorrect here; the cache is not performing its job as well as it could: a well-written cache would compare URLs more intelligently than with a simple string compare. The RFC does say that software should encode URLs with uppercase hex digits, though, and many clients do have bugs like this one when it comes to comparing URLs, so I think it would be reasonable for Apache to change its behavior here. ("Be strict in what you produce, but liberal in what you accept", and all that.) http://tools.ietf.org/html/rfc3986#section-6.2 has more discussion on URL comparison and normalization.
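For illustration, here is a minimal Python sketch of the case normalization a cache could apply to its keys before comparing them. It is not the attached patch and not code from httpd or from the reporter's cache; the function name `normalize_pct_case` and the `/wiki/` path prefix are assumptions made up for this example.

```python
import re
from urllib.parse import unquote

# Matches one percent-encoded triplet: "%" followed by exactly two hex digits.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_pct_case(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet.

    Everything outside the triplets is left untouched, so the case of
    the rest of the path is preserved. (Hypothetical helper name.)
    """
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), url)


# The two spellings from the original report, with an assumed path prefix.
lower = "/wiki/Chr%c3%a1n%c4%9bn%c3%a1"
upper = "/wiki/Chr%C3%A1n%C4%9Bn%C3%A1"

print(lower == upper)                      # plain string compare
print(normalize_pct_case(lower) == upper)  # compare after normalization
print(unquote(lower) == unquote(upper))    # both decode to the same text
```

A cache that keys on `normalize_pct_case(url)` rather than on the raw string would treat both spellings as the same resource, whichever form the server emits.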