Bug 49960

Summary: HttpServletRequest.getCharacterEncoding does not split the Content-Type header correctly.
Product: Tomcat 5
Reporter: Nige <nigelw>
Component: Catalina
Assignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P2    
Version: 5.5.23   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Nige 2010-09-20 06:07:58 UTC
This probably affects all versions, not just 5.5.23.

It comes down to the utility class org.apache.tomcat.util.http.ContentType

It has the code:

    // Basically return everything after ";charset="
    // If no charset specified, use the HTTP default (ASCII) character set.
    public static String getCharsetFromContentType(String type) {

This is too lax. It cannot simply use everything after ";charset=". It *must* parse the value and stop at the next ";", if there is one.

If someone sets the Content-Type header in an XHR, it might, BY NO FAULT OF THAT DEVELOPER, end up as:

Content-Type: multipart/form-data; charset=UTF-8; boundary=--------------------ext-ux-upload-boundary

That's taken directly from the Fiddler debugging tool. The browser (Firefox) INSERTED its encoding name (UTF-8 is mandated for XHRs) within the header which I specified. It did not append it, it INSERTED it.

Now, the character set ends up as "UTF-8; boundary=--------------------ext-ux-upload-boundary"
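
For illustration, a minimal sketch of the stricter parsing described above: split the header on ";" and return only the value of the charset parameter. The class name ContentTypeCharset is hypothetical, and this is not Tomcat's actual code or the fix that was applied there. It assumes no ";" appears inside a quoted parameter value.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not Tomcat's implementation.
final class ContentTypeCharset {

    private ContentTypeCharset() {
    }

    // Returns the value of the charset parameter, or null if the header
    // is null or carries no charset parameter.
    static String getCharsetFromContentType(String type) {
        if (type == null) {
            return null;
        }
        // parts[0] is the media type; the rest are "name=value" parameters.
        // Assumption: a naive split is enough, i.e. no ';' inside quotes.
        String[] parts = type.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue; // e.g. boundary=...
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```

With the header captured above, this returns "UTF-8" rather than "UTF-8; boundary=--------------------ext-ux-upload-boundary", because the boundary parameter is treated as a separate segment.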

Comment 1 Tim Whittington 2010-09-22 05:00:53 UTC
This was fixed in bug 42119 (in 2007) - updating to a recent 5.5.x (or preferably to a 6.0.x) will resolve the issue.

*** This bug has been marked as a duplicate of bug 42119 ***