Bug 54703 - Nullpointer exception in HttpParser.parseMediaType
Summary: Nullpointer exception in HttpParser.parseMediaType
Alias: None
Product: Tomcat 7
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 7.0.37
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2013-03-15 14:09 UTC by Sandy Meier
Modified: 2014-09-16 23:36 UTC (History)
3 users (show)


Note You need to log in before you can comment on or make changes to this bug.
Description Sandy Meier 2013-03-15 14:09:19 UTC
input for parseMediaType() is

multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256; 

exception trace

        at org.apache.tomcat.util.http.parser.HttpParser.parseMediaType(HttpParser.java:215)
        at org.apache.tomcat.util.http.parser.MediaTypeCache.parse(MediaTypeCache.java:54)
        at org.apache.catalina.connector.Response.setContentType(Response.java:806)
        at org.apache.catalina.connector.Response.checkSpecialHeader(Response.java:1119)
        at org.apache.catalina.connector.Response.setHeader(Response.java:1446)
        at org.apache.catalina.connector.ResponseFacade.setHeader(ResponseFacade.java:535)
Comment 1 Mark Thomas 2013-03-15 14:56:55 UTC
New lines are not permitted in header values.
Comment 2 g.dimitrov 2013-04-16 13:42:16 UTC
Hi, please check the HTTP/1.1 RFC, http://www.ietf.org/rfc/rfc2616.txt, section 2.2. Here is a cite from it: "HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab".

Therefore, I reopen the bug, as I am also experiencing this issue.
Comment 3 Mark Thomas 2013-04-16 20:13:30 UTC
When an application calls setHeader (or any of the related methods) is is required to pass in the header value, not the value that will appear on the wire. Any wrapping, escaping etc. is the container's responsibility (not the application's) and new lines are not permitted.
Comment 4 Jarek Gawor 2013-05-06 19:34:31 UTC

We ran into the same problem while upgrading Geronimo (while running TCK). It would be great if you can reconsider fixing this for the following reasons:

1) This can be considered as a regression. A header value with LWS was accepted fine in Tomcat 7.0.27, for example. This, of course, makes things harded for the users as now they might be forced to change their code to work on the latest version of Tomcat.

2) The Servlet 3.0 spec does not talk about (one way or the other) how the header value should look like when using addHeader()/setHeader() API.

3) The HTTP/1.1 spec in section 2.1 talks about "implied *LWS":

      "The grammar described by this specification is word-based. Except
      where noted otherwise, linear white space (LWS) can be included
      between any two adjacent words (token or quoted-string), and
      between adjacent words and separators, without changing the
      interpretation of a field."

So that would allow LWS within almost any header value - or at least addHeader()/setHeader() should be able to deal with it.

4) The HTTP/1.1 spec in section 2.2 talks about replacing LWS with a single SP before interpreting values. I think that mostly talks about the folding LWS (on the wire format) but since Tomcat is interpreting the values it maybe should follow the same logic in addHeader()/setHeader() API.
Comment 5 Mark Thomas 2013-05-06 20:54:13 UTC
Short version:
The arguments present here aren't valid but they did cause me to look at this again and there is a case for changing the current behaviour.

Long version:
1. A regression is when valid behaviour is broken. Using CR or LF in setHeader() is not valid behaviour (see point 2) so this is not a regression. The question here is how tolerant should Tomcat be when an application presents invalid input. Those goalposts may be moved and any changes in behaviour will not get treated as regressions (I usually withdraw an in progress release if we find a regression).

2. If a HTTP header is folded, getHeader() returns the unfolded value. If it did anything else, lots of applications would break. Logically, setHeader() and getHeader() should be symmetric. Therefore, they work with the header value not the "on the wire" representation. If this was different, I'd expect to see it called out in the specification. It isn't. As an aside, cookies values are handled the same way to avoid a bunch of security issues. See CVE-2007-3385, CVE-2007-3382 & CVE-2007-5333.

3. The format used on the wire is irrelevant. See point 2.
Note that proxies are free to unfold headers if they wish. There is no point an application trying to specify that a header should be folded because a proxy can unfold it.

4. The idea here is along the right lines but there is a better way to implement it. The skipping of LWS can be pulled out into a separate function and CR and LF added to the characters that are skipped. Currently they are not handled as for input Tomcat has already unfolded all headers and for output the application shouldn't use using them.

I took a careful look at the Tomcat code while I was researching this response and I believe there is an argument for taking a different approach. Generally, Tomcat doesn't validate what gets passed to setHeader() but it does in a few cases where the header value has an impact on Tomcat's processing (Content-Type, Content-Length). Tomcat refers to these as special headers.

If the value passed to setHeader() for a special header is invalid (e.g. non-numeric content length) then Tomcat simply ignores it and passes it to the client as is on the basis the application really does know what it is doing. It is arguable that Tomcat should not do this (servers should be strict in what they send and tolerant in what they accept) but it does mean that applications are given some leeway to bend the HTTP spec if they need to on the understanding that if they shoot themselves in the foot then any resulting mess is an application responsibility (and with headers there are likely to be security implications). How far application servers should go to stop developers shooting themselves in the foot - particularly from a security perspective - is a matter of debate. Certainly validating all header values and dropping invalid ones should be more secure but it would come at a price both in code complexity (writing parsing code for all the headers in RFC2616 would not be fun) and performance.

Getting back to the original point, it is clear from the current code that the intention is to parse headers where they can be parsed and ignore them otherwise - not to validate them and reject invalid headers. Therefore a CR or LF in a header value should not cause the response to fail.

The question is whether to just ignore the invalid header (and pass it through to the client as-is) or to do as suggested in point 4 and treat all LWS as a single space. I can't see any down side to the latter approach and it has the upside if that there is data there Tomcat needs it can extract it. Therefore I will look at implementing this for the next 7.0.x release. Note that this means that any folded content-type header values passed to setHeader() and friends will be unfolded by Tomcat before it is written to the wire.
Comment 6 Mark Thomas 2013-05-07 15:55:21 UTC
Fixed in trunk and 7.0.x and will be included in 7.0.41 onwards.

As an added bonus the extra test cases identified some errors in the existing white space parsing that have also been fixed.
Comment 7 Jarek Gawor 2013-05-08 15:46:12 UTC
Thanks for reconsidering and changing the code. Things look better on our end now. Thanks again!
Comment 8 Greg Wilkins 2014-09-16 23:36:34 UTC
Note that such folded headers have been deprecated by RFC7230: http://tools.ietf.org/html/rfc7230#section-3.2.4