Bug 62912 - Tomcat adds a space character in the Content-Type header if this one has a ; character right after
Summary: Tomcat adds a space character in the Content-Type header if this one has a ; ...
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina
Version: 8.5.x-trunk
Hardware: PC All
Importance: P2 enhancement with 5 votes
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-15 11:05 UTC by Franos
Modified: 2020-04-03 18:24 UTC
0 users



Attachments
The war to be deployed (4.91 KB, application/zip)
2018-11-15 16:51 UTC, Franos
Tomcat 9 patch to retain app provided content-type (2.46 KB, patch)
2019-01-23 21:48 UTC, Mark Thomas

Description Franos 2018-11-15 11:05:41 UTC
Hello,

I have written a servlet that answers an HTTP GET and sets a response Content-Type whose value contains a semicolon (;).
The Content-Type looks like: application/xxx.yyy-data;version=1.0
In the servlet code I have response.setContentType("application/xxx.yyy-data;version=1.0");

I tried this on both 8.5.35 and 9.0.13. In both environments, the actual Content-Type sent back by the server is: application/xxx.yyy-data; version=1.0
So Tomcat adds a space character right after the semicolon.

I think it's a bug.

Could you please provide a fix that prevents Tomcat from adding a space character after the semicolon?

Best Regards.
Comment 1 Mark Thomas 2018-11-15 11:52:54 UTC
Tomcat has to separate any charset from the provided content-type and store the charset and content-type (minus charset) separately. Extraction of the charset can be tricky. Historically there were a few bugs in this area until we switched to using a full parser. Similarly, generating content-type minus the charset had difficulties. Therefore, this value is generated by the parser from the constituent parts.
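To illustrate the mechanism described above, here is a minimal Java sketch, assuming a naive split on ';' (quoted-string parameter values are not handled). The class `ContentTypeRebuildSketch` and its methods are hypothetical and are not Tomcat's `MediaType` parser; they only show why a value regenerated from its parsed parts can come back with a space the application never wrote.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Tomcat's code): split an application-supplied
 * Content-Type into type/subtype, parameters and charset, then regenerate
 * the value (minus charset) from those parts.
 */
public class ContentTypeRebuildSketch {

    /** Parsed pieces of a Content-Type value. */
    static final class Parsed {
        final String typeSubtype;
        final Map<String, String> params = new LinkedHashMap<>();
        String charset; // null when no charset parameter was supplied

        Parsed(String typeSubtype) {
            this.typeSubtype = typeSubtype;
        }
    }

    /** Naive parser: assumes no quoted-string values containing ';'. */
    static Parsed parse(String value) {
        String[] parts = value.split(";");
        Parsed p = new Parsed(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim(); // optional whitespace (OWS) is discarded
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue; // ignore malformed parameters
            }
            String name = param.substring(0, eq).trim();
            String val = param.substring(eq + 1).trim();
            if (name.equalsIgnoreCase("charset")) {
                p.charset = val; // stored separately from the content type
            } else {
                p.params.put(name, val);
            }
        }
        return p;
    }

    /** Regenerates the value without the charset, always using "; " as the separator. */
    static String withoutCharset(Parsed p) {
        StringBuilder sb = new StringBuilder(p.typeSubtype);
        for (Map.Entry<String, String> e : p.params.entrySet()) {
            sb.append("; ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Parsed p = parse("application/xxx.yyy-data;version=1.0");
        System.out.println(withoutCharset(p)); // application/xxx.yyy-data; version=1.0
    }
}
```

Because the original spelling is thrown away during parsing, the separator in the output is whatever the regeneration code hard-codes, not what the application passed in.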

From RFC 7231:

     Content-Type = media-type
     media-type = type "/" subtype *( OWS ";" OWS parameter )
     type       = token
     subtype    = token

White space after the semi-colon is optional but valid. If a user-agent is unable to parse this correctly then that is a bug in the user-agent, not in Tomcat.
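On the client side, a parser that follows the grammar above treats both forms identically. The following is a minimal sketch, assuming parameter values are tokens (quoted-strings are not handled); `LenientMediaTypeParser` is a hypothetical name, not an existing library class. Type, subtype and parameter names are compared case-insensitively, as RFC 7231 specifies.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical client-side sketch following the RFC 7231 grammar
 *   media-type = type "/" subtype *( OWS ";" OWS parameter )
 * so whitespace around ';' is optional. Quoted-string values are not supported.
 */
public class LenientMediaTypeParser {

    /** Returns "type/subtype" lower-cased (both parts are case-insensitive). */
    static String typeSubtype(String header) {
        int semi = header.indexOf(';');
        String ts = semi < 0 ? header : header.substring(0, semi);
        return ts.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the parameters with lower-cased names; OWS around ';' is ignored. */
    static Map<String, String> parameters(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = header.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim(); // strips OWS before and after ';'
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed parameters
            }
            // The grammar allows no whitespace around '=', so no trimming there.
            params.put(param.substring(0, eq).toLowerCase(Locale.ROOT),
                       param.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        String withSpace = "application/xxx.yyy-data; version=1.0";
        String withoutSpace = "application/xxx.yyy-data;version=1.0";
        boolean same = typeSubtype(withSpace).equals(typeSubtype(withoutSpace))
                && parameters(withSpace).equals(parameters(withoutSpace));
        System.out.println(same); // true
    }
}
```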

Normally, I'd close bugs like this as WONTFIX, but recalling a similar issue, I dug into the history a little:
bug 53814
bug 52811

The original intention was not to include a space after the semi-colon. It was added as a work-around for a popular but buggy client (Adobe Reader 9 on IE). It should be possible to remove the work-around, but bitter experience makes me fear what else this might break.

I'd like to remove the work-around, mainly because it reduces (very marginally) the network traffic per request and removes a few lines of code, but that is probably something to consider for Tomcat 10 given the possibility of user-agent breakage.

At this point, fixing the bug in the user-agent parsing this header so it can handle the header with or without the optional white space looks like the best solution.
Comment 2 Franos 2018-11-15 16:40:41 UTC
Hello Mark,

I'm not sure I have really understood what you said.
You talk about charset but the content I want to deliver to the client is a binary content.

My disappointment is that I set a content type for the response that seems to be modified by Tomcat. Indeed, I set it to "application/xxx.yyy-data;version=1.0" and the client gets a Content-Type set to "application/xxx.yyy-data; version=1.0": a space character has been introduced right after the ; character.

I have tested this with different User-Agents.

Curl:
Request headers:
       User-Agent: curl/7.40.0
       Host: localhost:8080
       Accept: */*
Response headers:
       HTTP/1.1 200
       Content-Type: application/xxx.yyy-data; version=1.0
       Transfer-Encoding: chunked
       Date: Thu, 15 Nov 2018 15:16:06 GMT

Firefox:
Request headers:
       User-Agent: Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/62.0
       Host: localhost:8080
       Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Response headers:
       HTTP/1.1 200
       Content-Type: application/xxx.yyy-data; version=1.0
       Transfer-Encoding: chunked

Chrome:
Request headers:
       User-Agent:  Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
       Host: localhost:8080
       Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Response headers:
       HTTP/1.1 200
       Content-Type: application/xxx.yyy-data; version=1.0
       Transfer-Encoding: chunked
       Date: Thu, 15 Nov 2018 16:20:20 GMT

So whatever the User-Agent used, the Content-Type is application/xxx.yyy-data; version=1.0 (with a space character right after the ; character) whereas I have written in my servlet code response.setContentType("application/xxx.yyy-data;version=1.0"); (no space character after the ; character).

Is that clearer?

Best Regards.
Comment 3 Franos 2018-11-15 16:43:52 UTC
Hello,

I can provide a test case if you want. Just let me know.

Best Regards.
Comment 4 Mark Thomas 2018-11-15 16:50:30 UTC
I'll be clearer.

There is no Tomcat bug here.

Tomcat's response is consistent with what the Servlet requested and compliant with RFC 7231.

You haven't explained what the problem is, but the working assumption is that the user agent (client) you are using can't handle the space Tomcat inserts. That is a bug in the user agent and should be fixed there.
Comment 5 Franos 2018-11-15 16:51:42 UTC
Created attachment 36262 [details]
The war to be deployed

Deploy it, then request http://localhost:8080/MyServletReturningContentTypeWithComma/test
Comment 6 Franos 2018-11-15 17:13:44 UTC
Hello Thomas,

I'm trying to figure out what you're saying.

If I have understood correctly:
     - If the Content-Type value contains a semicolon followed by a parameter, the space after the semicolon is optional (as you mentioned), right?
     - The current Tomcat implementation is to add a space character after the semi-column in any case.

What is quite disturbing to me is:
     - the space character is optional, as you mentioned
     - so if the servlet code sets a Content-Type with a ; character and no
       space after it, why do you add one just after the ; even though the
       space is optional?

Yes, you are right that it could be fixed on the client side, but sometimes client versions can't be updated. This is our case: those client versions are already deployed in the field and unfortunately there is no easy way to update them.

So is there any Tomcat setting (even a hidden one) to avoid adding a space character after a ; character?

Best Regards.
Comment 7 Christopher Schultz 2018-11-15 20:23:25 UTC
(In reply to Franos from comment #6)
> What is quite disturbing to me is:
>      - space character is optional as you mentioned
>      - so why if, in the servlet code, you set a Content-Type with a ; character
>        and no space after, even if the space character is optional, you add this
>        one just after the ;

This is being added because the header you are adding must be parsed by Tomcat *just in case there is a character set present* so it can be handled specially. Since Tomcat parses the content-type (as set by the application), it can re-assemble the content-type header from the parsed values.

> Yes you are right when saying it could be fixed at client-side but sometimes
> there are some situations where some client versions couldn't be updated.
> This is our case: those client versions are already deployed on the field
> and no easy way to update them unfortunately. 

Okay, so we have some clients that vitally depend upon the space being added and other clients that vitally depend upon the space *not* being added. Why should your clients win over the others?

> So is there any way to have a Tomcat (hidden) setting, in order to not have
> a space character after a ; character.

The real question is "why is Tomcat bothering to re-format the content-type header when it does not have to do so?".

I could see an argument for a "don't mutate content-type headers when no charset is present", but that's not what you asked for.
Comment 8 Mark Thomas 2018-11-15 23:32:02 UTC
Chris, you make a good point about not mutating the value unnecessarily. I did look briefly at what would be involved. It looks simple to do. My concern is that it has broadly the same potential to trigger regressions as removing the space, although at least those regressions could be fixed by an application change. Probably something else to consider for Tomcat 10 alongside removing the space.
Comment 9 Franos 2018-11-16 08:51:11 UTC
Hello,

> Okay, so we have some clients that vitally depend upon the space being added 
> and other clients that vitally depend upon the space *not* being added. Why 
> should your clients win over the others?

I think that in most cases the provider of a server-side solution validates their developments against a fixed list of clients (e.g. a list of User-Agents). In such a case we can control the response (HTTP headers included) sent to those clients depending on the value of the User-Agent request header.
So why is Tomcat changing a value we have set on the server side, one we know works with the list of User-Agents we want to support?
Moreover, in the tests I have performed, if the content sent to the client is character based, Tomcat automatically appends, for example, ";charset=ISO-8859-1" with no space. So why is there sometimes a space and sometimes not?


> The real question is "why is Tomcat bothering to re-format the content-type 
> header when it does not have to do so?".

Yes, this is indeed the question I have.

> I could see an argument for a "don't mutate content-type headers when no 
> charset is present", but that's not what you asked for.

Then please consider this my request.

Best Regards.
Comment 10 Christopher Schultz 2018-11-16 17:07:21 UTC
Note that this is being discussed on the dev list a bit. See r1846691.
Comment 11 Franos 2018-11-19 08:14:01 UTC
Hello,

I have looked at that.
Thanks.

My worry is that it's scheduled for Tomcat 10.
We really need that on Tomcat 8.5.x and 9.x branches.

Best Regards.
Comment 12 romain.manni-bucau 2018-11-21 13:03:27 UTC
Hi guys,

Isn't it possible to fix this issue, given that the code adding the space carries this comment (org.apache.tomcat.util.http.parser.MediaType#toString):

> // Workaround for Adobe Read 9 plug-in on IE bug
> // Can be removed after 26 June 2013 (EOL of Reader 9)
> // See BZ 53814

We are well past 2013, so maybe it's time to drop it?

Since both forms (with or without a space) are valid from a spec perspective, and the Servlet spec makes the developer responsible for setting the right content type (section 5.2 of Servlet 4), the developer should be the one to decide whether to use a space. For the case where Tomcat has to build the content type (when a character encoding is used), could the delimiter string (";" or "; ") be made configurable?

Romain
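The configurable-delimiter idea from comment 12 could look roughly like the sketch below, assuming the only allowed choices are ";" and "; ". `ConfigurableDelimiterSketch` and its constructor argument are hypothetical; as far as this thread indicates, no such Tomcat option exists in the versions discussed.

```java
/**
 * Hypothetical sketch: when the container must build a Content-Type
 * (for example, to append a charset), use a configured separator
 * instead of a hard-coded "; ". Not an existing Tomcat class or setting.
 */
public class ConfigurableDelimiterSketch {

    private final String delimiter; // restricted to the two forms named in comment 12

    ConfigurableDelimiterSketch(String delimiter) {
        if (!delimiter.equals(";") && !delimiter.equals("; ")) {
            throw new IllegalArgumentException("delimiter must be \";\" or \"; \"");
        }
        this.delimiter = delimiter;
    }

    /** Appends a charset parameter to a media type using the configured separator. */
    String withCharset(String mediaType, String charset) {
        return mediaType + delimiter + "charset=" + charset;
    }

    public static void main(String[] args) {
        ConfigurableDelimiterSketch compact = new ConfigurableDelimiterSketch(";");
        ConfigurableDelimiterSketch spaced = new ConfigurableDelimiterSketch("; ");
        System.out.println(compact.withCharset("text/plain", "UTF-8")); // text/plain;charset=UTF-8
        System.out.println(spaced.withCharset("text/plain", "UTF-8"));  // text/plain; charset=UTF-8
    }
}
```

Rejecting anything other than the two forms keeps the output within the RFC 7231 grammar while leaving the choice to the deployer.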
Comment 13 Mark Thomas 2019-01-23 21:19:05 UTC
Moving this to an enhancement. The bug is in the user agent, not Tomcat.

Given that:
- we have examples of broken clients that fail with a space and of others that fail without one
- both with a space and without a space are spec compliant

If we pick one option and hard-code it, experience tells us that it will break for someone at some point. While we could simply respond "Fix the broken user agent" I think Chris's idea is worth exploring.

Assuming the code is minimal, it should be doable for Tomcat 10. The bigger question is whether the fix is considered safe enough for back-port.

I'll put together a patch and start a discussion on the dev list.
Comment 14 Mark Thomas 2019-01-23 21:48:52 UTC
Created attachment 36389 [details]
Tomcat 9 patch to retain app provided content-type

The application provided content-type is only retained if no charset is present.
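The behaviour described for the patch can be sketched as follows, assuming the charset is tracked separately and appended later when the header is written. This is not the attached patch: `RetainContentTypeSketch`, its methods and the simplistic ';'-splitting are illustrative assumptions.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the policy "retain the application-provided
 * content-type unless it carries a charset". Not the actual patch.
 */
public class RetainContentTypeSketch {

    /** True if any ';'-separated parameter is named "charset" (case-insensitive). */
    static boolean hasCharset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].trim().toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return true;
            }
        }
        return false;
    }

    /** Rebuilds "type/subtype; name=value..." without the charset, using "; " between parts. */
    static String rebuildWithoutCharset(String contentType) {
        String[] parts = contentType.split(";");
        StringBuilder sb = new StringBuilder(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                sb.append("; ").append(p);
            }
        }
        return sb.toString();
    }

    /** Content type stored for the response; the charset (if any) is assumed to be kept elsewhere. */
    static String storedContentType(String appContentType) {
        return hasCharset(appContentType)
                ? rebuildWithoutCharset(appContentType)
                : appContentType; // no charset: keep the application's value untouched
    }

    public static void main(String[] args) {
        System.out.println(storedContentType("application/xxx.yyy-data;version=1.0"));
        // application/xxx.yyy-data;version=1.0
        System.out.println(storedContentType("text/plain;version=2;charset=UTF-8"));
        // text/plain; version=2
    }
}
```

Under this policy only values that carry a charset are normalized, so an application that never sets a charset gets its Content-Type sent exactly as written.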
Comment 15 Mark Thomas 2020-04-03 18:24:43 UTC
Applied to master for 10.0.0-M5