Bug 57215

Summary: Regression in Tomcat 7.0.54 after Bug 56501 with urls starting with //
Product: Tomcat 7 Reporter: David <dcoles>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: regression    
Priority: P2    
Version: 7.0.54   
Target Milestone: ---   
Hardware: All   
OS: All   

Description David 2014-11-14 23:18:32 UTC
After upgrading from 7.0.52 of Tomcat to 7.0.54 we found that our application was now returning 404 resource not found errors when the request uri starts with //.
eg. We have an embedded server created and started something like:

 org.apache.catalina.startup.Embedded embedded = new Embedded();
 org.apache.catalina.Engine engine engine = embedded.createEngine();
 engine.setName("");
 embedded.setContainer(engine);
 embedded.addEngine(engine);

 ...

 String startPathContextRoot = "c:\website\data\startPath";
 org.apache.catalina.Context startPathContext = embedded.createContext("/startPath",startPathContextRoot);

 embedded.start()

Then a request to http://host:port//startPath returns 404.
Whereas at Tomcat 7.0.52 it returns what we would expect from a request to 
http://host:port/startPath.
The same behaviour is seen with requests to extended URLs eg:
  http://host:port//startPath/anotherPath.
where they end up at the servlet as expected with 7.0.53 and not with 7.0.54

Debugging this a bit I found that the problem was introduced at 7.0.53 and by the changes under 
https://issues.apache.org/bugzilla/show_bug.cgi?id=56501
which for Tomcat 7 were revision 
http://svn.apache.org/viewvc?view=revision&revision=1594028
If I run our app without these changes in at 7.0.54 then it works fine.

Looking at the changes in the revision I saw some tests were added and so I tried adding some new tests to tomcat/trunk/test/org/apache/catalina/connector/TestRequest.java which I think example the problem:

    @Test
    public void testBug56501p() throws Exception {
        doBug56501("/path", "//path", "/path");
    }

    @Test
    public void testBug56501q() throws Exception {
        doBug56501("/path", "//path/", "/path");
    }

    @Test
    public void testBug56501r() throws Exception {
        doBug56501("/path", "//path/bob", "/path");
    }

    @Test
    public void testBug56501s() throws Exception {
        doBug56501("/path", "//path/bob/", "/path");

If I run these at 7.0.53 they pass.
and running at 7.0.54 they fail with:

Testcase: testBug56501p took 0.307 sec
	FAILED
expected:</[path]> but was:</[]>

Testcase: testBug56501q took 0.275 sec
	FAILED
expected:</[path]> but was:</[]>

Testcase: testBug56501r took 0.246 sec
	FAILED
expected:</[path]> but was:</[]>

Testcase: testBug56501s took 0.32 sec
	FAILED
expected:</[path]> but was:</[]>


I can try and create this with a simple servlet/setup if required if the test additions are not enough.

David
Comment 1 Mark Thomas 2014-11-15 16:37:11 UTC
This is going to get messy.

The Javadoc for HttpServletRequest.getContextPath() says the container should not decode the returned value.

Where this gets 'interesting' is when the URI is not normalized and is encoded. For example, what gets returned for a request to "%2Ffoo%2F%2E%2E%2Fpath"?

Is it:
"%2Fpath" ?
"%2Ffoo%2F%2E%2E%2Fpath" ?

Something else?

We know (from the mapper) how many '/' characters to include in the context path. The current approach of simply searching that many '/' characters down the request URI ignores issues of normalization and encoding. Doing that counting in a normalization and encoding aware manner is probably the answer but that is non-trivial to say the least.

Fixing this bug might not solve the problem you are seeing - particularly since the unit tests you provided are using the incorrect value for the expected context path. You should probably be using ServletContext.getContextPath().
Comment 2 Mark Thomas 2014-11-17 07:41:07 UTC
I have applied a fix to this to Tomcat 9. As I suspected it is a little messy. I'm goign to leave it afew days for folks to review and comment before I back-port it to 8.0.x and 7.0.x.
Comment 3 Konstantin Kolinko 2014-11-19 00:37:05 UTC
(In reply to Mark Thomas from comment #1)
> For example, what gets returned for a request to
> "%2Ffoo%2F%2E%2E%2Fpath"?

RFC7230 2.7.3. "http and https URI Normalization and Comparison" says about http and https URIs:

   ...
   such URIs are normalized and compared according to the
   algorithm defined in Section 6 of [RFC3986]


http://tools.ietf.org/html/rfc3986
RFC3986 2.3. Unreserved Characters [1]

   Characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include uppercase and lowercase
   letters, decimal digits, hyphen, period, underscore, and tilde.

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

   URIs that differ in the replacement of an unreserved character with
   its corresponding percent-encoded US-ASCII octet are equivalent: they
   identify the same resource.  However, URI comparison implementations
   do not always perform normalization prior to comparison (see Section
   6).  For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

RFC3986 6.2.2.2. Percent-Encoding Normalization

   The percent-encoding mechanism (Section 2.1) is a frequent source of
   variance among otherwise identical URIs.  In addition to the case
   normalization issue noted above, some URI producers percent-encode
   octets that do not require percent-encoding, resulting in URIs that
   are equivalent to their non-encoded counterparts.  These URIs should
   be normalized by decoding any percent-encoded octet that corresponds
   to an unreserved character, as described in Section 2.3.


So it looks that RFC3986 says to url-decode the above listed "unreserved" characters before performing normalization, but only them.

"%2Ffoo%2F%2E%2E%2Fpath" becomes "%2Ffoo%2F..%2Fpath" but nothing more as %2F is not decoded.

In regards to r1640083 the "canonicalContextPath.equals(candidate)" comparison looks fragile.
Comment 4 Mark Thomas 2014-11-19 07:59:06 UTC
Worth noting here that we have the system property org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH

Regarding the fragility of canonicalContextPath.equals(candidate), better suggestions welcome.
Comment 5 Konstantin Kolinko 2014-12-01 18:53:51 UTC
(In reply to Mark Thomas from comment #4)
> 
> Regarding the fragility of canonicalContextPath.equals(candidate), better
> suggestions welcome.

The code that was added to Request class is located far from the code that performs decoding and mapping (CoyoteAdapter) and one that performs URL-decoding (UDecoder) and it is hard to compare those and keep in sync.

Comparing the code highlighted an issue -> 1.

1. Using UDecoder.URLDecode(candidate) + canonicalContextPath.equals(candidate) is broken, as URLDecode() without second argument uses ISO-8859-1 charset. The equals() may return false.

2. Move the code to CoyoteAdapter.postParseRequest(). Evaluate the value there only once.

3. In unexpected situations, error out (400) instead of falling through.

4. Maybe add an utility methods to UDecoder to search for next decoded '/' in a ByteChunk?


5. In CoyoteAdapter.postParseRequest() when decodedURI.getType() is not bytes (e.g. when requestURI is changed by RewriteValve), normalization is skipped. I think that it should not be skipped.
Comment 6 Mark Thomas 2014-12-01 22:02:46 UTC
I've fixed the issue identified in 1.

Regarding 2, that would cause the code to be executed for every request when it is only likely to be used for a small percentage.

3 makes sense if we do 2 but I don't think 2 is the way to go.

4 I'm neutral on.

5 I believe that was a deliberate implementaion decision. I don'tthink we need to revisit it as part of this unless suggestion 2 is followed.


The key issue is whether or not to follow suggestion 2. I'm currently leaning towards not because of performance but am prepared to be convinced otherwise.
Comment 7 Christopher Schultz 2014-12-01 22:18:42 UTC
Any reason not to have code available that does (2) but in a lazy way? That is, a utility method that can do the work to produce the result and also cache the value in the request in case it's requested again? Then, only call that utility method when the value is actually needed?
Comment 8 Konstantin Kolinko 2014-12-02 09:33:01 UTC
(In reply to Konstantin Kolinko from comment #5)
> 5. In CoyoteAdapter.postParseRequest() when decodedURI.getType() is not
> bytes (e.g. when requestURI is changed by RewriteValve), normalization is
> skipped. I think that it should not be skipped.

Skipping url-decoding step is also wrong. If RewriteValve provides a non-encoded requestUri, it means that there is a bug in RewriteValve.

Web Application should assume that requestURI needs url-decoding. It cannot find out that url-decoding shall be skipped. Implementation of Request.getContextPath() in r1640083/r1642766 is an example of a victim of this bug. It always performs url-decoding.

>> 3. In unexpected situations, error out (400) instead of falling through.
>
> 3 makes sense if we do 2 but I don't think 2 is the way to go.

I do not like that Request.getContextPath() falls through to returning requestUri. It may result in security issues.
Comment 9 Mark Thomas 2014-12-10 21:31:39 UTC
In theory issue 3 should never happen. Therefore, I have changed the code to throw an ISE rather than return the uri.
Comment 10 Mark Thomas 2014-12-12 15:59:31 UTC
I've added some unit tests to trunk for 5 and made the necessary fixes so that they pass. I believe these fixes are now ready for back-port.
Comment 11 Mark Thomas 2014-12-12 16:42:58 UTC
I have back-ported this fix to 8.0.x (for 8.0.16 onwards) and to 7.0.x (for 7.0.58 onwards).