Bug 61351

Summary: Non-US-ASCII letters in url-mapping
Product: Tomcat 8 Reporter: Martin Nybo Andersen <tweek>
Component: Util Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 8.5.15   
Target Milestone: ----   
Hardware: PC   
OS: Linux   
Attachments: Servlet that logs url-mappings (maven project)

Description Martin Nybo Andersen 2017-07-27 08:06:45 UTC
Created attachment 35182
Servlet that logs url-mappings (maven project)

Hi,

Starting with revision 1793440 (introduced in 8.5.15), I can no longer use non-US-ASCII letters in url-mappings in web.xml. This is still true for the latest revision (revision 1803056).

This affects my setup, where I have a servlet mapped to /mælk/data .

If I URL-encode the url-mapping (/m%C3%A6lk/data) then it works again.
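
For illustration, here is a minimal sketch of how the encoded form above can be derived from the literal mapping. It assumes the pattern is percent-encoded as UTF-8 bytes in the RFC 3986 style, and that '/' and '*' are left untouched so that path separators and wildcards keep their meaning. The class and method names are hypothetical and are not part of Tomcat.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Tomcat API: percent-encodes a url-pattern.
class UrlPatternEncoder {

    // RFC 3986 "unreserved" characters, which never need encoding.
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static String encode(String pattern) {
        StringBuilder out = new StringBuilder();
        // Assumption: UTF-8 is the charset used for the encoded bytes.
        for (byte b : pattern.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            // Assumption of this sketch: '/' and '*' are kept literal.
            if (v < 0x80 && (c == '/' || c == '*' || UNRESERVED.indexOf(c) >= 0)) {
                out.append(c);
            } else {
                // Everything else becomes %XX, one triplet per byte.
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e6" is the letter ae-ligature used in the mapping above.
        System.out.println(encode("/m\u00e6lk/data")); // prints /m%C3%A6lk/data
    }
}
```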

I've attached a simple servlet that does nothing except log its name and mappings to catalina.out. I have added some non-US-ASCII letters to the servlet name just to make sure that the web.xml is parsed correctly.

Kind regards,
Martin
Comment 1 Mark Thomas 2017-07-27 10:30:49 UTC
The requirement that the URL patterns in web.xml must be decoded dates back to Servlet 2.3 (see r285186).

In more recent times this has been tweaked so that the charset used to do the decoding is consistent with the charset used for the web.xml file (see r1758423).

However, the expectation from the Java EE XSD is that:
<quote>
This pattern is assumed to be in URL-decoded form and must not contain CR(#xD) or LF(#xA)
</quote>

The Servlet specification also references RFC 3986 although it doesn't offer a view on where that RFC applies and where it does not.

Those do not appear to be entirely consistent.

Given the above, it is also worth noting the rare edge cases where a literal '*' or '%' needs to be used in the url-pattern.

So, where to go from here?

My current thinking is that Tomcat needs to assume the url-patterns may be partially decoded, i.e. they may contain characters not permitted by RFC 3986 and they may also contain %nn sequences that need to be decoded. Therefore, r1793440 needs to be reverted / rewritten on that basis.
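
To make the "partially decoded" idea concrete, here is a rough sketch of a lenient decoder. It is not the code committed to Tomcat; the class name, method names and the handling of malformed input are assumptions made for illustration. Valid %nn sequences are decoded using the supplied charset (standing in for the charset of the web.xml file), while every other character, including non-ASCII letters, passes through unchanged. A '%' that is not followed by two hex digits is kept as a literal here, which is only one possible way to treat that edge case.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Tomcat's implementation.
class LenientPatternDecoder {

    static String decode(String pattern, Charset charset) {
        StringBuilder out = new StringBuilder();
        // Bytes from consecutive %nn sequences are collected and decoded
        // together, so multi-byte characters are reassembled correctly.
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '%' && i + 2 < pattern.length()
                    && hex(pattern.charAt(i + 1)) >= 0
                    && hex(pattern.charAt(i + 2)) >= 0) {
                pending.write((hex(pattern.charAt(i + 1)) << 4) | hex(pattern.charAt(i + 2)));
                i += 3;
            } else {
                // Already-decoded (or malformed) input is copied as-is.
                flush(pending, out, charset);
                out.append(c);
                i++;
            }
        }
        flush(pending, out, charset);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset charset) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // Both spellings of the reporter's mapping yield the same pattern.
        String encoded = decode("/m%C3%A6lk/data", StandardCharsets.UTF_8);
        String literal = decode("/m\u00e6lk/data", StandardCharsets.UTF_8);
        System.out.println(encoded.equals(literal)); // prints true
    }
}
```

With this approach a mapping written literally and one written in URL-encoded form resolve to the same pattern, which is the behaviour the report asks for.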

I'm going to start work in this direction but if folks disagree with my analysis or think I have missed one or more important points, please do speak up.
Comment 2 Konstantin Kolinko 2017-07-27 12:00:39 UTC
Interesting analysis.

A servlet-mapping can be created by a tool. E.g. JspC:

https://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/jasper/JspC.java?revision=1800816&view=markup#l1092

o.a.j.JspC.generateWebMapping()

Encoding of generated web.xml file is configurable ("-webxmlencoding" switch), but the pattern itself is simply written as

> mappingout.write(file.replace('\\', '/'));

If we are to require that the url-mapping pattern is URL-encoded, JspC should be adjusted for that.
Comment 3 Martin Nybo Andersen 2017-07-27 12:03:20 UTC
Hi Mark,

If the 'pattern is assumed to be in URL-decoded form', why decode it again?

Kind regards,
Martin
Comment 4 Mark Thomas 2017-07-27 16:40:07 UTC
The requirement that the URL patterns in web.xml must be decoded dates back to Servlet 2.3 (see r285186).
Comment 5 Mark Thomas 2017-07-27 19:17:46 UTC
Thanks for the report. This has been fixed in trunk (for 9.0.0.M26) and 8.5.x (for 8.5.20 onwards).
Comment 6 Martin Nybo Andersen 2017-07-28 08:11:04 UTC
Thanks Mark,

Both for the explanation and the quick fix.
My url-mappings work again from r1803226. :-)

Kind regards,
Martin