Bug 62273 - Add support for alternate URL specification
Summary: Add support for alternate URL specification
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Specification APIs (show other bugs)
Version: 8.5.32
Hardware: PC All
: P1 enhancement (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
: 62398 62415 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-04-09 14:09 UTC by Piotr Suwała
Modified: 2018-07-03 11:38 UTC (History)
3 users (show)



Attachments
HttpParser patch (3.54 KB, patch)
2018-04-11 09:55 UTC, Remy Maucherat
Details | Diff
URL patch (5.41 KB, patch)
2018-04-12 11:03 UTC, Remy Maucherat
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Piotr Suwała 2018-04-09 14:09:30 UTC
Browsers won't escape '{|}' in parameters area.
https://bugzilla.mozilla.org/show_bug.cgi?id=1451347
They state that they don't have to comply.

This link "http://localhost:8090/jira/browse/SSP-15?jql=%20text%20=%20{ksi|delta}" is completely legit for firefox and chrome.

Who should adjust?
Comment 1 Remy Maucherat 2018-04-09 15:09:23 UTC
Reading RFC 3986, the query part of the URL is the same as the rest, I don't understand the answer that it is different. I guess the assertion that the query string is different makes this worth being a separate BZ. Is RFC 3986 now out of date ?

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-noscheme = segment-nz-nc *( "/" segment )
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>

   segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   query         = *( pchar / "/" / "?" )

   fragment      = *( pchar / "/" / "?" )

   pct-encoded   = "%" HEXDIG HEXDIG

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
Comment 2 Julian Reschke 2018-04-09 15:18:49 UTC
(In reply to Remy Maucherat from comment #1)
> Reading RFC 3986, the query part of the URL is the same as the rest, I don't
> understand the answer that it is different. I guess the assertion that the
> query string is different makes this worth being a separate BZ. Is RFC 3986
> now out of date ?
> ...

No, RFC 3986 is the proper spec. The problem is that some browser vendors thought it would be ok to write their own spec.
Comment 3 Mark Thomas 2018-04-09 15:38:21 UTC
No, RFC 3986 is not out of date. It appears that the browsers are working to a different spec: https://url.spec.whatwg.org/#query-state

It is the cookie debacle all over again. Sigh. At least this time there appear to be some efforts to work to a spec - even if it isn't the RFC.

The six characters where there appears to be a difference of opinion between the RFC 3986 and the spec the browsers are working to are:
"\", "^", "`", "{", "|", "}"

I guess we could consider (optionally) relaxing the query string parsing to allow these but I don't like it. We have RFCs for a reason.
Comment 4 Remy Maucherat 2018-04-09 15:50:42 UTC
Ok ! :(

In that other spec, the query string is indeed different from the path, but the path itself may be a bit different from our validation as well.

For path:
The C0 control percent-encode set are the C0 controls and all code points greater than U+007E (~).
The fragment percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).
The path percent-encode set is the fragment percent-encode set and U+0023 (#), U+003F (?), U+007B ({), and U+007D (}). 

For the query:
"If byte is less than 0x21 (!), greater than 0x7E (~), or is 0x22 ("), 0x23 (#), 0x3C (<), or 0x3E (>), append byte, percent encoded, to url’s query"

+1 to allow this validation optionally.
Comment 5 Remy Maucherat 2018-04-10 09:27:50 UTC
Reopening as a high priority enhancement.
Comment 6 Piotr Suwała 2018-04-10 13:36:20 UTC
There is tomcat.util.http.parser.HttpParser.requestTargetAllow={|} for 8.5.x but it was dropped for 9.0.x. However it ignores these characters possibly everywhere.

There should be flag that just allows for these characters to be allowed in params query.
Comment 7 Remy Maucherat 2018-04-11 09:55:34 UTC
Created attachment 35856 [details]
HttpParser patch

According to my understanding on that URL spec wording, here's what the HttpParser could look like. Oddities include '\\'. A system property would likely be used to configure.
Comment 8 Mark Thomas 2018-04-12 10:04:32 UTC
From memory, the previous parser was lenient to the extent that it only checked for characters that were not permitted anywhere in the request target rather than checking each part individually.

My initial thoughts were something pluggable on the connector that delegated most of the work to the HttpParser which would almost certainly need some new methods. My thinking hasn't got much beyond that though.

I'm planning on putting together something to map request target part, characters allowed and specification as I want to get a clearer understanding of what is and isn't allowed where.
Comment 9 Remy Maucherat 2018-04-12 11:03:30 UTC
Created attachment 35860 [details]
URL patch

No problem if you have bigger plans, here's the last version of my patch.
Comment 10 Mark Thomas 2018-04-12 14:17:58 UTC
I've been spending some time looking at whatwg.org URL spec. It appears, from https://bugzilla.mozilla.org/show_bug.cgi?id=1451347 that the browsers consider any string that passes the URL parser as valid and that the result of that parsing is passed to the server.

The rules for writing URLs are different but they don;t appear to apply in this case since if they did, everything we expect to be encoded in a query string would be encoded.

I've been through the path parsing defined by the whatwg.org URL spec and I believe it is the same as RFC7230 / RFC3986.

I've also been over the query part of the whatwg.org URL spec and I now believe there are 8 characters we expect to be encoded that the whatwg.org URL spec does not. They are:
"[", "]", "\", "^", "`", "{", "|", "}"

I still need to check the authority part.
Comment 11 Remy Maucherat 2018-04-12 15:27:17 UTC
Given the spec test, I thought the request target was a bit different, allowing unencoded '\\', '|' and '^'.
Same conclusion for the query part.
Comment 12 Mark Thomas 2018-04-12 19:47:58 UTC
That wasn't my reading but it isn't the easiest spec to read. It is much closer to an implementation description than a spec.

I decided to do some testing to see what the current behaviour is. Results here:
https://cwiki.apache.org/confluence/display/TOMCAT/Encoding+and+URIs

It looks like tightening up the path parsing to limit '[' and ']' would cause problems. I don't see where the whatwg.org spec allows this though.

I didn't see the automatic conversion of '\' to '/' when I read the spec either.

The query string behaviour is close to what I expected although IE is worse.

The authority part of the whatwg.org spec is very relaxed for the userinfo compared to RFC 7230 / RFC 3986 but since Tomcat just ignores the userinfo I'm not too concerned about that.

Sigh.

I'm still thinking about the best way to approach this. If we have configurable parser instances they are going to have to be per connector as the URI is parsed before it can be mapped. I'm wondering if it is worth it. Different settings per connector is likely to be unusual but on the other hand it isn't that hard to implement.

One advantage of validating each part separately is that the query string may have to allow some characters we really don't want in the path like '\'.
Comment 13 Mark Thomas 2018-04-27 08:57:42 UTC
Adding extra code to Tomcat to account for specification non-compliance of other components is the wrong solution. The right solution is to open bugs against the non-compliant components. Unfortunately, in this case, those other components are all the major browser vendors and they do not accept that their behaviour is incorrect. I have yet to see a convincing argument as to why the browsers should not implement RFC 7230 and RFC 3986.

Working around the specification non-compliant browser behaviour just encourages vendors to continue to ignore specifications and leads to greater interoperability issues in the long term. However, the alternative is to break lots of applications for lots of users. Therefore, it is with regret that I have implemented this enhancement for all currently supported Tomcat versions.

Fixed in:
- trunk for 9.0.8 onwards
- 8.5.x for 8.5.31 onwards
- 8.0.x for 8.0.52 onwards
- 7.0.x for 7.0.87 onwards
Comment 14 Trevor Robinson 2018-05-17 18:53:15 UTC
This is not acceptable to be put into a bugfix version. We use square brackets for tag filtering on query parameters, causing a lot of our queries to be suddenly bounced back with a 400. Luckily this only occurred to us on our test environment, but if we had built our production at any point since this was released, because we only pin the major and minor versions, we would have had a crisis.

At worst it should be a minor, and considering that it breaks everything (as was already commented in this thread, every major browser does not adhere to this standard. In my own testing, cURL, Lynx, Node and Perl do not either!) it should probably be a major. We should not have to carefully monitor each bugfix version of the myriad software in our stack to see if basic functionality has changed.
Comment 15 Christopher Schultz 2018-05-18 22:09:40 UTC
(In reply to Trevor Robinson from comment #14)
> This is not acceptable to be put into a bugfix version.

Um... this should UN-BREAK things, not break things. I don't think you want to wait until Tomcat 10 for your brackets to start working again, do you?
Comment 16 Mark Thomas 2018-05-22 14:12:58 UTC
*** Bug 62398 has been marked as a duplicate of this bug. ***
Comment 17 Mark Thomas 2018-05-29 09:51:35 UTC
*** Bug 62415 has been marked as a duplicate of this bug. ***
Comment 18 Trevor Robinson 2018-05-31 20:12:29 UTC
(In reply to Christopher Schultz from comment #15)
> Um... this should UN-BREAK things, not break things. I don't think you want
> to wait until Tomcat 10 for your brackets to start working again, do you?

Hmm, I think I misinterpreted your comment #13. So if I have this right, this patch implements an option called 'USE_URL_LIVING_STANDARD' that allows the previous parsing behavior to continue, right? Does it default to true? From my perspective, we didn't change anything about our Tomcat setup, it just suddenly started rejecting our URLs that included square brackets when we upgraded from 8.5.29 to 8.5.31.

If it wasn't your intention to change the behavior, then I'm sorry for my initial comment's strong reaction.
Comment 19 remmeier 2018-06-28 10:01:23 UTC
what is now the conclusion of this topic? since the topic has been resolved(?). The change is still breaking and not aligned with what the major browser vendors do as far as I understand?
Comment 20 Mark Thomas 2018-06-28 10:14:24 UTC
The status is:
- Browser behaviour is inconsistent between browsers
  https://cwiki.apache.org/confluence/display/TOMCAT/Encoding+and+URIs
- Browser behaviour appears to be inconsistent with the spec the browsers are
  working to although it isn't the easiest spec to read
- Tomcat implements RFC7230 / RFC3986 by default
- Users can configure relaxedPathChars and/or relaxedQueryChars on the
  Connector as required
Comment 21 remmeier 2018-06-29 10:07:39 UTC
are there any security implications when relaxing a char like [ and ]?
Because there are other specifications like JSON API making heavy use these two characters and if so, Tomcat just becomes no option anymore for such use cases.
Comment 22 Matthew Davie 2018-06-29 10:11:24 UTC
Still unusable, tried a number of changes but still fails to work due to the RFC change. Makes Tomcat worthless to our current and future usage, surely this change should be turned off until browsers make the change.
Comment 23 Remy Maucherat 2018-06-29 10:37:18 UTC
(In reply to Matthew Davie from comment #22)
> Still unusable, tried a number of changes but still fails to work due to the
> RFC change. Makes Tomcat worthless to our current and future usage, surely
> this change should be turned off until browsers make the change.

We can examine your urls vs the new configuration options on the user list.
Comment 24 Christopher Schultz 2018-06-30 15:33:19 UTC
(In reply to remmeier from comment #21)
> are there any security implications when relaxing a char like [ and ]?
> Because there are other specifications like JSON API making heavy use these
> two characters and if so, Tomcat just becomes no option anymore for such use
> cases.

Surely, you aren't using JSON in your URLs and query strings, are you? Le's move this discussion to the users' mailing list.
Comment 25 remmeier 2018-06-30 18:29:01 UTC
no, just good to have that statement in terms of security for other people because this thing here causes a gigantic amount of issues since it affects in our case two dozen application with some beeing a bit larger and quite security sensitive. All that for something nobody else seem to care much, browser vendors included.
Comment 26 Markus Schlegel 2018-07-03 11:05:10 UTC
The Patch provided in the attachments has a USE_URL_LIVING_STANDARD switch implemented.
However, when I look at the source code of my downloaded Tomcat 8.5.32, I can not see this change (and my webapp using [ ] in the path does not work anymore...).
Am I missing something ? How should I be able to get the old behavior, where [ and ] are working as part of the URL ?
Comment 27 Mark Thomas 2018-07-03 11:38:34 UTC
(In reply to Markus Schlegel from comment #26)
> The Patch provided in the attachments has a USE_URL_LIVING_STANDARD switch
> implemented.
> However, when I look at the source code of my downloaded Tomcat 8.5.32, I
> can not see this change (and my webapp using [ ] in the path does not work
> anymore...).
> Am I missing something ? How should I be able to get the old behavior, where
> [ and ] are working as part of the URL ?

Comment #20. Final bullet point. For full details see
http://tomcat.apache.org/tomcat-8.5-doc/config/http.html#Standard_Implementation