Bug 53554 - Wrong case for hexadecimal percent encoding [patch]
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_rewrite
Version: 2.5-HEAD
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-16 23:55 UTC by Tim Starling
Modified: 2013-03-19 23:15 UTC



Attachments
Use uppercase hexadecimal digits in mod_rewrite (490 bytes, patch)
2012-07-16 23:55 UTC, Tim Starling

Description Tim Starling 2012-07-16 23:55:18 UTC
Created attachment 29069
Use uppercase hexadecimal digits in mod_rewrite

Apache mod_rewrite encodes special characters using lowercase hexadecimal digits, for example Chráněná becomes Chr%c3%a1n%c4%9bn%c3%a1 instead of Chr%C3%A1n%C4%9Bn%C3%A1. The use of a non-canonical URL breaks our caching system. We can't use lowercase hexadecimal digits as our canonical URLs because no browser sends URLs like that, so the cache would be even more badly broken. Please use uppercase hexadecimal digits in URLs.
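
To make the difference concrete, here is a minimal Python sketch of the two encodings. It is not mod_rewrite's code: the function name percent_encode, its uppercase parameter, and the choice to leave only the RFC 3986 unreserved characters unescaped are assumptions made for illustration.

```python
def percent_encode(text: str, uppercase: bool = True) -> str:
    """Percent-encode the UTF-8 bytes of text (hypothetical helper).

    Assumption: only ASCII letters, digits and "-._~" are left as-is.
    mod_rewrite's own set of escaped characters may differ.
    """
    fmt = "%%%02X" if uppercase else "%%%02x"
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x80 and (ch.isalnum() or ch in "-._~"):
            out.append(ch)          # unreserved ASCII character, keep literally
        else:
            out.append(fmt % byte)  # everything else becomes %XX or %xx
    return "".join(out)


print(percent_encode("Chráněná"))                   # Chr%C3%A1n%C4%9Bn%C3%A1
print(percent_encode("Chráněná", uppercase=False))  # Chr%c3%a1n%c4%9bn%c3%a1
```

The two outputs name the same resource but differ as strings, which is what trips up a cache that keys on the raw URL.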
Comment 1 Christophe JAILLET 2012-09-30 06:26:35 UTC
RFC 1738, Uniform Resource Locators (URL)
(http://www.rfc-editor.org/rfc/rfc1738.txt), says:

>>>
2.2. URL Character Encoding Issues

[...]
In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which forming the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
[...]

<<<


So, I guess that httpd is correct when encoding with lower case.


I have left the report open, just in case, but I think that it should be marked as RESOLVED WONTFIX.
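
As a small illustration of the point that both digit cases are valid, the following Python sketch decodes the two forms from the description with the standard library's urllib.parse.unquote. It only shows that a decoder treats them as the same octets; it says nothing about what httpd does internally.

```python
from urllib.parse import unquote

# Both spellings decode to the same UTF-8 text.
lower = unquote("Chr%c3%a1n%c4%9bn%c3%a1")
upper = unquote("Chr%C3%A1n%C4%9Bn%C3%A1")

print(lower == upper)  # True
```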
Comment 2 Tim Starling 2012-10-01 04:54:06 UTC
(In reply to comment #1)

I think the RFC is pretty clear about which encoding is preferred, and it's not the one httpd is using. You seem to be using a very loose definition of "correct". There are two ways of doing it: one is preferred, the other is idiosyncratic and breaks caching. It is a simple change and the patch is attached.
Comment 3 Wim Lewis 2013-03-19 23:15:22 UTC
Apache is not incorrect here. The cache is not doing its job as well as it could: a well-written cache would normalize URLs before comparing them instead of relying on a simple string comparison.

RFC 3986 (section 2.1) does say that URI producers should use uppercase hexadecimal digits for percent-encodings, though, and many clients have bugs like this one when comparing URLs, so I think it would be reasonable for Apache to change its behavior here. ("Be strict in what you produce, but liberal in what you accept", and all that.)

http://tools.ietf.org/html/rfc3986#section-6.2 has more discussion on URL comparison and normalization.
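
A cache could apply the case normalization described there before using a URL as a key. The Python sketch below shows only that one step (uppercasing the hex digits of each percent-encoded triplet); the function name normalize_percent_case is hypothetical, and a real cache would probably apply the other normalizations from that section as well.

```python
import re

# Matches one percent-encoded octet, e.g. "%c3" or "%C3".
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_percent_case(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet.

    Hypothetical helper: it leaves everything else in the URL untouched,
    including a stray "%" that is not followed by two hex digits.
    """
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), url)


a = normalize_percent_case("/wiki/Chr%c3%a1n%c4%9bn%c3%a1")
b = normalize_percent_case("/wiki/Chr%C3%A1n%C4%9Bn%C3%A1")
print(a == b)  # True
```

With keys normalized this way, the lowercase URL produced by mod_rewrite and the uppercase URL sent by browsers would map to the same cache entry.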