Bug 42119 - request.getCharacterEncoding misparses charset=UTF-8; xyz=3
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Connector:Coyote
Version: 5.5.23
Hardware: All other
Importance: P3 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Duplicates: 49960
Depends on:
Reported: 2007-04-13 12:01 UTC by Leigh L. Klotz, Jr.
Modified: 2010-09-22 05:00 UTC


Description Leigh L. Klotz, Jr. 2007-04-13 12:01:02 UTC
(This bug is also present in Coyote source 6.0.10.)

If there is an HTTP header
 Content-Type: text/abc; charset=UTF-8; xyz=3
request.getCharacterEncoding() returns "UTF-8; xyz=3" but Tomcat 4.1.24 returns

In Tomcat 4.1.24, request.getCharacterEncoding uses parseCharacterEncoding
defined in

and it correctly handles the case of other Content-Type parameters.

In Tomcat 5.5.23, however, request.getCharacterEncoding uses
getCharsetFromContentType defined in 

which does not search for a possible terminating semicolon in the charset, thus
erroneously including additional characters in the charset.
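
For illustration, here is a minimal sketch of the parsing behaviour described
above: the charset value is cut at the next semicolon, if there is one. It is
not the Tomcat 4.1.24 or 5.5.23 source. The class CharsetSketch and the method
parseCharset are hypothetical names, and the sketch assumes a lower-case
"charset=" parameter name for simplicity.

```java
public class CharsetSketch {

    /**
     * Hypothetical helper, not Tomcat's code: extracts the charset
     * parameter from a Content-Type value, stopping at the next ';'.
     * Assumes the parameter name is written as lower-case "charset=".
     */
    static String parseCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        int start = contentType.indexOf("charset=");
        if (start < 0) {
            return null; // no charset parameter present
        }
        String encoding = contentType.substring(start + "charset=".length());
        // The step the report says is missing in 5.5.23:
        // look for a terminating semicolon and drop what follows it.
        int end = encoding.indexOf(';');
        if (end >= 0) {
            encoding = encoding.substring(0, end);
        }
        encoding = encoding.trim();
        // Strip optional surrounding double quotes, e.g. charset="UTF-8".
        if (encoding.length() > 2
                && encoding.startsWith("\"") && encoding.endsWith("\"")) {
            encoding = encoding.substring(1, encoding.length() - 1);
        }
        return encoding.trim();
    }

    public static void main(String[] args) {
        // charset followed by another parameter, as in this report
        System.out.println(parseCharset("text/abc; charset=UTF-8; xyz=3"));
        // charset as the last parameter
        System.out.println(parseCharset("text/abc; charset=UTF-8"));
    }
}
```

With this approach both header values in main yield the same charset, whether
or not another parameter follows it.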

The code in 5.5.23 has a comment that begins
     // Basically return everything after ";charset="

Please consider using the code from 4.1.24

This problem showed up when Content-Type was multipart/mixed and a client
specified a charset parameter to Content-Type; however, it will occur in any
Content-Type where charset is specified and is not the last parameter.
Comment 1 Mark Thomas 2007-04-14 17:26:32 UTC
This has been fixed in svn for 5.5.x and 6.0.x and will be included in the next
releases of each.

Thanks for the report.
Comment 2 Tim Whittington 2010-09-22 05:00:54 UTC
*** Bug 49960 has been marked as a duplicate of this bug. ***