Bug 62459

Summary: mod_jk: Forwarding URLs containing escaped slashes (e.g. for REST services) fail with syntactical-wrong double-escaping
Product: Tomcat Connectors Reporter: Guido Jäkel <G.Jaekel>
Component: mod_jkAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED WONTFIX    
Severity: normal    
Priority: P2    
Version: 1.2.42   
Target Milestone: ---   
Hardware: Other   
OS: Linux   
Attachments: proposed patch to avoid double encoding of an encoded slash

Description Guido Jäkel 2018-06-15 10:58:53 UTC
The RFC defines that it's not forbidden to use a slash ('/') itself within the entity payload of a path element of an URL, but this have to be encoded in the well known manner ("%2F").

One (like us) will need the feature for an implementation of REST-Service.


For reasons of security, one have to enable the processing of slash as a allowed character of the URL, one have explicit to enable this inside a VHost of the Apache httpd, first. This option is named 'AllowEncodedSlashes'. With the default ('Off'), an URL containing an encoded slash will be rejected (with an RC 404). A value of "On" will allow the processing, but the incoming '%2F' will be unescaped with no care of the semantic, which is pointless if it's inside a path element. The value "NoDecode" will allow "%2F" an keep it encoded.

Now, it's up to mod_jk to forwarded this URL to the application server. There are four implemented methods, choosen by '+ForwardURIxxx'. The current default (since 1.2.24) is called "+ForwardURIProxy" which may be treated as an successor of the former "ForwardURICompatUnparsed" (1.2.23) and "ForwardURICompat" (1.2.22)

I think the current "+ForwardURIProxy" is meant to address and solve such problems. But it don't catch this here. It's said that ["the forwarded URI will be partially reencoded after processing inside Apache and before forwarding"]. And this will be done without care about the semantic part of the URL. Here, the   precent character of the "saved" representation of the slash (as '%2F') will be re-encoded as '%25'. This result into forwarding an syntactical wrong URL containing the string '%252F' to the application server.

As a "unacceptable" workaround, one may use the mode "+ForwardURICompatUnparsed" for mod_jk. This will advise mod_jk to forward the original unprocessed copy of the incoming URL instead of the httpd enginge current representation. This override all other URL processing done so far by other modules; with other words you can't use any other tooling like mod_rewrite in this case. 

It may be difficult to detect at the code path of mod_jk that some levels before some code within the httpd have "saved" the escaped slash. But I think it should be correct by heuristic to *not* encode a percent ('%') if (and only if) it's followed by the hexdigits '2F' at this place, because there should be no other source for this as from the httpd option "AllowEncodedSlashes NoDecode". A correct implementation will take account of the semantic parts of the URL, the value of this option and will avoid the wrong double-escaping ('%252F') just in the right case.
Comment 1 Guido Jäkel 2018-06-26 14:17:26 UTC
I think the issue have to be handled within  native/common/jk_url.c , L.116ff. at function jk_canonenc().

Here, the percent sign isn't in "allowed", therefore it will be escaped.
 
But concerning the issue, the incoming string have the sequence '%2F', which MUST NOT be not re-encoded. I'm not sure to claim that ANY legal '%hh' MUST NOT be re-encoded at this point. Maybe this will be more the truth if one take care of the actual semantic position inside the URL pattern.

If I'm able to get spare time, i'll try to propose a patch.

/*
 * Convert a URL-encoded string to canonical form.
 * It encodes those which must be encoded, and does not touch
 * those which must not be touched.
 * String x must be '\0'-terminated.
 * String y must be pre-allocated with len maxlen
 * (including the terminating '\0').
 */
int jk_canonenc(const char *x, char *y, int maxlen)
{
    int i, j;
    int ch = x[0];
    char *allowed;  /* characters which should not be encoded */
    char *reserved; /* characters which much not be en/de-coded */

/*
 * N.B. in addition to :@&=, this allows ';' in an http path
 * and '?' in an ftp path -- this may be revised
 */
    allowed = "~$-_.+!*'(),;:@&=";
    reserved = "/";

    for (i = 0, j = 0; ch != '\0' && j < maxlen; i++, j++, ch=x[i]) {
/* always handle '/' first */
        if (strchr(reserved, ch)) {
            y[j] = ch;
            continue;
        }
/* recode it, if necessary */
        if (!jk_isalnum(ch) && !strchr(allowed, ch)) {
            if (j+2<maxlen) {
                jk_c2hex(ch, &y[j]);
                j += 2;
            }
            else {
                return JK_FALSE;
            }
        }
        else {
            y[j] = ch;
        }
    }
    if (j<maxlen) {
        y[j] = '\0';
        return JK_TRUE;
    }
    else {
        return JK_FALSE;
    }
}
Comment 2 Guido Jäkel 2018-06-27 08:25:29 UTC
Created attachment 35991 [details]
proposed patch to avoid double encoding of an encoded slash

I propose the attached patch. It will skip over the encoding of the percent character ('%') to '%25', if it is followed by the string '2F' or it's lower case representation. Because this following char's are "legal", just passing over the '%' unchanged and continue the loop will do the job. The test condition is robust against an incomplete '%xx' as long as the incoming string is '\0'-terminated as also assumed by the for loop.


Applying this patch allows me to switch to the mode 'ForwardURIProxy' again (instead of using 'ForwardURICompatUnparsed' as a workaround) for our REST-usecase  with use a data representation which contain encoded slashes in path elements.
Comment 3 Mark Thomas 2018-08-21 17:45:00 UTC
Can you please provide an example of a URI presented by a client that demonstrates this issue.
Comment 4 Guido Jäkel 2018-08-21 20:46:54 UTC
(In reply to Mark Thomas from comment #3)
> Can you please provide an example of a URI presented by a client that
> demonstrates this issue.
Dear Mark,

A real world example is

  http://nbn-resolving.org/urn:nbn:de:gbv:7-11858%2F00-1735-0000-0001-B8C0-F-4

, it will answer with a redirect to

  http://ediss.uni-goettingen.de/handle/11858/00-1735-0000-0001-B8C0-F

Without the patch, this will forwarded as

  /urn:nbn:de:gbv:7-11858%252F00-1735-0000-0001-B8C0-F-4


This example is taken from real production which uses "JkOptions +ForwardURICompatUnparsed" at the moment as a workaround. But with the suggested patch (as proved in our test stage), one may use "+ForwardURIProxy", again.

In addition, you have to use "AllowEncodedSlashes NoDecode" in the corresponding VHost, too. I was not able to use this as a global option, this seems to be another issue.
Comment 5 Mark Thomas 2018-08-22 09:38:02 UTC
Thanks. I see what you are trying to do now. This is going to be an interesting problem to solve. I suspect that it will require fixes / changes in multiple components.

For those following along, take a look at the examples here:
https://restfulapi.net/resource-naming/

The issue at hand is that if a resource ID contains a "/" it needs to be encoded else the URI will not be interpreted correctly.

Other relevant information:
RFC 3986 states (section 2.2) that a %nn encoded delimiter is NOT equivalent to the decoded form.

That begs the question at what point - if any - should Tomcat decode these sequences.
Comment 6 Guido Jäkel 2018-08-22 10:47:39 UTC
(In reply to Mark Thomas from comment #5)
> Thanks. I see what you are trying to do now. This is going to be an
> interesting problem to solve. I suspect that it will require fixes / changes
> in multiple components.

For my requirements, no other fixes as the attached patch to mod_jk are needed. I don't take a look on Tomcat because i'm using Wildfly as the backend. Also, i', using AJP as the transport and i don't look on HTTP here.

> For those following along, take a look at the examples here:
> https://restfulapi.net/resource-naming/

IMHO the RFC states that an URL/URI consist of (others and) path elements. This path elements are separated/delimited by a slash character ('/'). Therefore, if the payload of a path element contain a slash, this MUST be percent-encoded.

The current semantic behavior of mod_jk is to escape "everything that need this". And that's correct for the most cases. The Apache httpd framework have already "normalized" the URL many steps before. If mod_jk find a special character here, it have to encode it for a proper transport to the downstream server. If there is a single percent sign here (after decoding by the httpd!), it have to be escaped to '%25'. 

The exception here is that the httpd is instructed (by AllowEncodedSlashes NoDecode) to keep some slashes (only within path elements) undecoded. There is no special "information tagging" about this, but the only source for the apperance of the character sequence '%2F' at this point within the mod_jk code is that it is the result of this intended unescaping of a slash. Note that this holds only for this entity 'slash' used as the path element separator and not for and %XX at common.

For the convenience to see what i'm talking about, I embed here the content of the short patch:

--- native/common/jk_url.c.20150101-212250	2015-01-01 21:22:50.000000000 +0100
+++ native/common/jk_url.c	2018-06-27 09:12:28.250361091 +0200
@@ -88,6 +88,11 @@
             y[j] = ch;
             continue;
         }
+/* don't double-escape a following '%2F' ('/'), just pass the '%' and continue */
+        if (ch == '%' && x[i+1] == '2' && strchr("fF", x[i+2])) {
+            y[j] = ch;
+            continue;
+        }
 /* recode it, if necessary */
         if (!jk_isalnum(ch) && !strchr(allowed, ch)) {
             if (j+2<maxlen) {
Comment 7 Mark Thomas 2018-08-22 11:45:11 UTC
The proposed patch enables a directory traversal vulnerability with the default configuration. Therefore, it can't be applied in its current form.

Even if the patch's behaviour is restricted to only be enabled with:

AllowEncodedSlashes NoDecode

mod_jk needs to differentiate between %252F and %2F in the original URI to correctly re-encode the processed (mod_rewrite etc.) URI which it is not going to be able to do in all circumstances. The problem is that both "%252F and "%2F" are identical in decoded form if "%2F" is not decoded and there is no way to tell them apart.

The only viable option is to use:

JkOptions     +ForwardURICompatUnparsed

which has the significant disadvantage that mod_rewrite etc. cannot be used. Unfortunately, I don't see any other alternatives at this point.
Comment 8 Guido Jäkel 2018-08-22 12:15:30 UTC
(In reply to Mark Thomas from comment #7)

Thank you a lot for discussion!

> The proposed patch enables a directory traversal vulnerability with the
> default configuration. Therefore, it can't be applied in its current form.
> 
> Even if the patch's behaviour is restricted to only be enabled with:
> 
> AllowEncodedSlashes NoDecode
> [...]

To point this out: You see a "directory traversal" issue IF AllowEncodedSlashes is OFF? Maybe here, in this function. But then, an URL containing slashes will rejected by the http anyway. Please explain, if i'm wrong with this.


> mod_jk needs to differentiate between %252F and %2F in the original URI to
> correctly re-encode the processed (mod_rewrite etc.) URI which it is not
> going to be able to do in all circumstances. The problem is that both "%252F
> and "%2F" are identical in decoded form if "%2F" is not decoded and there is
> no way to tell them apart.

Yes, this is the core point. I'm not familiar with this software, but (IMHO) there  CANNOT a '%252F' appear here. And if, then it's garbage-in because there incoming URL can be expected to be fully decoded here.

If you think (or even know), this CAN and is valid -- e.g. for some other part of the URL than the path elements section, THEN my suggestion may be extended to have the "syntactical knowledge" to act only on path elements.


> The only viable option is to use:
> 
> JkOptions     +ForwardURICompatUnparsed
> 
> which has the significant disadvantage that mod_rewrite etc. cannot be used.
... and is a non-option for us because we need exactly this other things, too. And in the meanwhile, it's urgent.
Comment 9 Mark Thomas 2018-08-22 14:41:53 UTC
(In reply to Guido Jäkel from comment #8)
> To point this out: You see a "directory traversal" issue IF
> AllowEncodedSlashes is OFF?

Yes.

> Maybe here, in this function. But then, an URL
> containing slashes will rejected by the http anyway.

JkMount /examples/*

URI /examples/foo/..%252F../docs

Now gets passed as /examples/foo/..%2F../docs

What happens then depends on what back-end the URI is passed to and how that back-end is configured. Some will reject it. Some will process it. For those that process it, this is a directory traversal.


> Yes, this is the core point. I'm not familiar with this software, but (IMHO)
> there  CANNOT a '%252F' appear here.

I see no reason why a user agent would be unable to present "%252F". Keep in mind that any solution has to be completely generic, not specific to any one use case and there may well be an ID scheme somewhere where a valid, unencoded ID includes the sequence "%2F" which would then be encoded as "%252F" when it appears in the path.

> > The only viable option is to use:
> > 
> > JkOptions     +ForwardURICompatUnparsed
> > 
> > which has the significant disadvantage that mod_rewrite etc. cannot be used.
> ... and is a non-option for us because we need exactly this other things,
> too. And in the meanwhile, it's urgent.

What you are asking for is logically impossible. If mod_jk sees the sequence "%2F" it has no way to determine if this is the result of decoding "%252F" or not decoding "%2F". Therefore it cannot correctly reverse the encoding.
Comment 10 Guido Jäkel 2018-08-22 19:24:36 UTC
(In reply to Mark Thomas from comment #9)
> (In reply to Guido Jäkel from comment #8)
> > To point this out: You see a "directory traversal" issue IF
> > AllowEncodedSlashes is OFF?
> 
> Yes.
> 
> > Maybe here, in this function. But then, an URL
> > containing slashes will rejected by the http anyway.
> 
> JkMount /examples/*
> 
> URI /examples/foo/..%252F../docs
> 
> Now gets passed as /examples/foo/..%2F../docs
> 
> What happens then depends on what back-end the URI is passed to and how that
> back-end is configured. Some will reject it. Some will process it. For those
> that process it, this is a directory traversal.

I fully agree: It's in the concern of the backend - in the same way it's in the concern of the httpd frontend parser to deal with it.


> What you are asking for is logically impossible. If mod_jk sees the sequence
> "%2F" it has no way to determine if this is the result of decoding "%252F"
> or not decoding "%2F". Therefore it cannot correctly reverse the encoding.

Maybe this shows us the core problem and and a solution: "AllowEncodedSlashes" must *replace* the encoded slash ('%2F') by a *distinguishable* special representation and this special representation have to bee re-encoded later (before any proxying by mod_jk or others) to the encoded slash.

Perhaps this special representation might be coded as an "illegal percent encoding" using other digits as the hexadecimals, e.g. '%%/' 


Mark, may you please support me by a code snipped to check the actual value of "AllowEncodedSlashes" within the mod_jk's jk_canonenc function or in the context of it's caller?
Comment 11 Rainer Jung 2018-08-22 20:18:47 UTC
Hi Guido,

I didn't have the time to follow the discussoin in detail, but would using

AllowEncodedSlashes NoDecode

help in any way?

I think we as just one module can not simply define our special sequence as e,.g. %%. But if you want to do experiments, you can check for the current AllowEncodedSlashes settings like this:

...
#include http_core.h
...

core_dir_config *cfg;
cfg = (core_dir_config *)ap_get_core_module_config(r->per_dir_config);

if (!cfg->allow_encoded_slashes) {
    /* AllowEncodedSlashes Off */
} else if (!cfg->decode_encoded_slashes) {
    /* AllowEncodedSlashes Nodecode */
} else {
    /* AllowEncodedSlashes On */
}

The snippet does not work in jk_canonenc, because that function is a standalone function called by Apache httpd and by IIS, so it does not know about Apache specifics. But you can use it for example in mod_jk.c before calling jk_canonenc there.
Comment 12 Guido Jäkel 2018-08-22 22:20:42 UTC
Dear Mark,

I miss something important about the "directory traversal": If is set "AllowEncodedSlashes NoEncode" and "JkOptions +ForwardURICompatUnparsed", then with the example setup, the URI

  /examples/foo/..%2F../doc

is also passed as is to the backend ad it's also up to the backend to do the right thing, i.e. not to treat '%2F' in a path element as a '/'.

Therefore, my patch don't "introduce" this "challenge" for the backend, it just prevent mod_jk from breaking the URL with "JkOptions +ForwardURIProxy".
Comment 13 Guido Jäkel 2018-08-22 22:24:45 UTC
(In reply to Rainer Jung from comment #11)
> Hi Guido,
> 
> I didn't have the time to follow the discussoin in detail, but would using
> 
> AllowEncodedSlashes NoDecode
> 
> help in any way?
> 

Dear Rainer,

i would you *please* to take the time to read it carefully. This "NoDecode" is a prerequisite at all.

yours
Guido
Comment 14 Rainer Jung 2018-08-23 07:44:01 UTC
(In reply to Mark Thomas from comment #9)
> What you are asking for is logically impossible. If mod_jk sees the sequence
> "%2F" it has no way to determine if this is the result of decoding "%252F"
> or not decoding "%2F". Therefore it cannot correctly reverse the encoding.

It might become too complex, but httpd copies the original URI to r->unparsed_uri and I think that one isn't decoded in any way. So we could in theory check, whether there's a "%25" or "%25F" or "%25f" sequence in the original URI. e.g. if there's no "%25" it seems we should be safe in terms of double decoding, if there's no "%25f" or "%25F" we should at least be safe of double decoding a slash.

There can be some holes in this attempt, e.g. a RewriteRule might change the URL and introduce "%25" (or "%25F" or "%25f") in the rewritten decoded URL, which will not change the original unparsed_uri, but the one we need to jk_canonenc(). So the bahavior to check unparsed_uri and rely on it might need to be an optional one, off by default.

Is this a direction we should try? Or do we open a new the directory traversal problem here?
Comment 15 Mark Thomas 2018-08-23 08:43:15 UTC
Please stop changing the resolution of this issue. The correct resolution is WONTFIX.
Comment 16 Guido Jäkel 2018-08-23 09:06:41 UTC
(In reply to Mark Thomas from comment #15)
> Please stop changing the resolution of this issue. The correct resolution is
> WONTFIX.

Sorry, but I don't change it by intention.

(In reply to Guido Jäkel from comment #8)
> If you think (or even know), this CAN and is valid -- e.g. for some other
> part of the URL than the path elements section, THEN my suggestion may be
> extended to have the "syntactical knowledge" to act only on path elements.

I examine the code at  apache-2.0/mod_jk.c::init_ws_service()  , where  commmon/jk_url.c::jk_canonenc()  is called: The stringbuffer passed here to encode consist of the pure path section only, the other URI elements like the query sting or host name part are separate. Therefore, there is no need to implement a syntactical knowledge of the URI parts into  jk_cononenc()  at the moment -- it will act on the path and a '%2F' appear here, it must be the result of a "encoded slash". And this is either enabled by AllowEncodedSlashes or rejected upstream before.

It is also used once just here (and in the same matter for iis and netscape) and in the case that 

[...]
    case JK_OPT_FWDURICOMPATUNPARSED:
        s->req_uri = r->unparsed_uri;
        if (s->req_uri != NULL) {
            char *query_str = strchr(s->req_uri, '?');
            if (query_str != NULL) {
                *query_str = 0;
            }
        }

        break;

    case JK_OPT_FWDURICOMPAT:
        s->req_uri = r->uri;
        break;

    case JK_OPT_FWDURIPROXY:
        size = 3 * (int)strlen(r->uri) + 1;
        s->req_uri = apr_palloc(r->pool, size);
        jk_canonenc(r->uri, s->req_uri, size);       <------
        break;

    case JK_OPT_FWDURIESCAPED:
        s->req_uri = ap_escape_uri(r->pool, r->uri);
        break;

    default:
        return JK_FALSE;
    }
[...]

In this case, the '%2F' is intentional allowed (keeping in mind the potential directory traversal issue of a buggy backend). And the task of this patch is to prevent mod_jk from breaking this from '%2F' into '%252F' by replacing the first percent char of by '%25'


A remaining question is: What's about the sequence '%252F' in the path section of an incoming URL. This issue pointed out by Mark is in fact an unresolved, but IMHO not in the scope of the mod_jk part but in the httpd parser.

I agree with Mark that the semantic of this literal should represent an encoded <percent sign> followed by <digit 2><letter F>. And not an encoded slash. But the starting '%25' is decoded by the upstrem parser to an '%' and from that, jk_canonenc() get a '%2F'. Without the patch, this would be encoded into '%252F' again. With the patch, it would left as '%2F', and the semantics changes. 

Said that, I think the upstream parser in addition SHOULD NOT decode a '%25' if 'AllowEncodedSlashes' is enabled. Because then  jk_canonenc  gets a '%252F' and the patch might be extended to leave '%25' untouched, too.

Grüße

Guido
Comment 17 Michael Osipov 2019-03-11 23:02:57 UTC
Guys, where is the explanation for the WONTFIX status?