Bug 42119

Summary: request.getCharacterEncoding misparses charset=UTF-8; xyz=3
Product: Tomcat 5
Reporter: Leigh L. Klotz, Jr. <leigh.klotz>
Component: Connector:Coyote
Assignee: Tomcat Developers Mailing List <dev>
Severity: normal
CC: nigelw
Priority: P3
Version: 5.5.23
Target Milestone: ---
Hardware: All
OS: other

Description Leigh L. Klotz, Jr. 2007-04-13 12:01:02 UTC
(This bug is also present in Coyote source 6.0.10.)

If the request has the HTTP header
 Content-Type: text/abc; charset=UTF-8; xyz=3
then request.getCharacterEncoding() returns "UTF-8; xyz=3", whereas Tomcat 4.1.24 returns "UTF-8".

In Tomcat 4.1.24, request.getCharacterEncoding uses parseCharacterEncoding
defined in

and it correctly handles Content-Type values that carry other parameters.

In Tomcat 5.5.23, however, request.getCharacterEncoding uses
getCharsetFromContentType defined in 

which does not look for a semicolon terminating the charset value and therefore
erroneously includes any following parameters in the charset.

The code in 5.5.23 has a comment that begins
     // Basically return everything after ";charset="
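The behavior that comment describes can be reproduced with a short Java sketch. It is not Tomcat's source; the class NaiveCharsetParser and its method charsetFromContentType are hypothetical names, and the code only mimics "return everything after charset=" to show why the extra parameter leaks into the result.

```java
import java.util.Locale;

// Hypothetical reproduction of the reported behavior; not Tomcat's actual code.
class NaiveCharsetParser {

    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Case-insensitive search for the parameter name.
        int start = contentType.toLowerCase(Locale.ENGLISH).indexOf("charset=");
        if (start < 0) {
            return null;
        }
        // The flaw: everything after "charset=" is returned, including
        // any parameters that follow the charset value.
        return contentType.substring(start + "charset=".length()).trim();
    }

    public static void main(String[] args) {
        // Prints UTF-8; xyz=3 rather than UTF-8
        System.out.println(charsetFromContentType("text/abc; charset=UTF-8; xyz=3"));
    }
}
```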

Please consider using the code from 4.1.24.
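A minimal sketch of a parser that does stop at the terminating semicolon might look like the following, assuming the charset value is unquoted. CharsetParser and charsetFromContentType are hypothetical names, and this is not the 4.1.24 parseCharacterEncoding code itself; it only illustrates the missing step.

```java
import java.util.Locale;

// Hypothetical sketch; assumes an unquoted charset value.
class CharsetParser {

    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Case-insensitive search for the parameter name.
        int start = contentType.toLowerCase(Locale.ENGLISH).indexOf("charset=");
        if (start < 0) {
            return null;
        }
        String value = contentType.substring(start + "charset=".length());
        // Cut the value at the semicolon that ends this parameter, if any.
        int end = value.indexOf(';');
        if (end >= 0) {
            value = value.substring(0, end);
        }
        value = value.trim();
        return value.isEmpty() ? null : value;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/abc; charset=UTF-8; xyz=3"));
        System.out.println(charsetFromContentType("multipart/mixed; charset=UTF-8; boundary=abc"));
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1"));
    }
}
```

The sketch does not handle quoted values such as charset="UTF-8", nor a parameter whose name merely ends in "charset"; a production parser would need to split the header into parameters first.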

This problem showed up when Content-Type was multipart/mixed and a client
specified a charset parameter to Content-Type; however, it will occur in any
Content-Type where charset is specified and is not the last parameter.

Comment 1 Mark Thomas 2007-04-14 17:26:32 UTC
This has been fixed in svn for 5.5.x and 6.0.x and will be included in the next
releases of each.

Thanks for the report.

Comment 2 Tim Whittington 2010-09-22 05:00:54 UTC
*** Bug 49960 has been marked as a duplicate of this bug. ***