Summary: | DefaultServlet and CharacterEncoding | ||
---|---|---|---|
Product: | Tomcat 6 | Reporter: | Felix Schumacher <felix.schumacher> |
Component: | Catalina | Assignee: | Tomcat Developers Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | zhouyanming |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | default | ||
Hardware: | All | ||
OS: | All | ||
Description
Felix Schumacher
2010-06-18 08:13:17 UTC
I've been digging into this and I think the situation is a little more complicated.

There are three scenarios to consider:
a) directly returning a file
b) including a file into an output stream
c) including a file into a writer

a) is the simple case. We can set the character encoding to be the effective value of fileEncoding (i.e. the configured value, or the system default if not set).

b) and c) are trickier. In both cases we need to read the input as characters (conversion from bytes via fileEncoding). Then for b) we need to write it out again using whatever output encoding has been set on the response. For c) we can just write the characters and let the writer handle it.

I think that covers all the cases although some edge cases may emerge as I dig into this.

As far as I can see this can all be done without any additional configuration options. I'm not so sure it can be done without changing some method signatures. While those methods are protected and internal to Tomcat, the default servlet is something that tends to get 'tweaked' by users so we'll need to tread carefully if we back-port any of this.

(In reply to Mark Thomas from comment #1)
> I've been digging into this and I think the situation is a little more
> complicated.
>
> There are three scenarios to consider:
> a) directly returning a file
> b) including a file into an output stream
> c) including a file into a writer
>
> a) is the simple case. We can set the character encoding to be the effective
> value of fileEncoding (i.e. the configured value, or the system default if
> not set).

What if web.xml contains a <mime-type> which includes a charset parameter? I think respecting that parameter would be good if possible.

> b) and c) are trickier. In both cases we need to read the input as
> characters (conversion from bytes via fileEncoding). Then for b) we need to
> write it out again using whatever output encoding has been set on the
> response. For c) we can just write the characters and let the writer handle
> it.

I'm assuming that binary file types are basically out-of-scope here, right?

> I think that covers all the cases although some edge cases may emerge as I
> dig into this.
>
> As far as I can see this can all be done without any additional
> configuration options. I'm not so sure it can be done without changing some
> method signatures. While those methods are protected and internal to Tomcat,
> the default servlet is something that tends to get 'tweaked' by users so
> we'll need to tread carefully if we back-port any of this.

+1

If the response character encoding is set (via any of the available means to do so) then the patch will respect that.

Correct, binary files are out of scope. I'll double check the patch doesn't impact them.

Fixed in:
- trunk for 9.0.0.M23 onwards
- 8.5.x for 8.5.17 onwards
- 8.0.x for 8.0.46 onwards
- 7.0.x for 7.0.80 onwards

Whoops. Binary files are caught in this. That needs fixing. Thanks for the hint.

And fixed. Same versions as above.

Re-opening. The first attempt at fixing this triggered a series of regressions. The fix has therefore been reverted in 7.0.x, 8.0.x and 8.5.x. This needs more careful consideration. The end result may be that it is only fixed for 9.0.x.

With a significant increase in the number of unit tests and a number of additional regressions fixed, this is now fixed again for 9.0.x. Given the history of regressions, I do not propose back-porting this to earlier versions at this time.

*** Bug 62971 has been marked as a duplicate of this bug. ***

It has been almost 2 years without further regressions so I have back-ported this to 8.5.x for 8.5.43 onwards.
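For anyone reading this later, here is a rough sketch of the transcoding that scenarios a) to c) from comment #1 describe. It is not the code that was committed to the DefaultServlet. It uses only java.io and java.nio so it can be run standalone, and the class name `EncodingSketch` and all method names are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tomcat.
final class EncodingSketch {

    // Scenario a): the effective file encoding is the configured value,
    // or the platform default when nothing is configured.
    static Charset effectiveFileEncoding(String configured) {
        return configured == null
                ? Charset.defaultCharset()
                : Charset.forName(configured);
    }

    // Scenario b): decode the file bytes with the file encoding, then
    // re-encode them with whatever encoding the response uses.
    static void includeIntoStream(InputStream file, Charset fileEncoding,
            OutputStream out, Charset responseEncoding) throws IOException {
        Writer writer = new OutputStreamWriter(out, responseEncoding);
        copy(new InputStreamReader(file, fileEncoding), writer);
        writer.flush(); // push any buffered characters to the stream
    }

    // Scenario c): decode the file bytes and hand the characters to the
    // writer, which applies its own output encoding.
    static void includeIntoWriter(InputStream file, Charset fileEncoding,
            Writer responseWriter) throws IOException {
        copy(new InputStreamReader(file, fileEncoding), responseWriter);
    }

    private static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static void main(String[] args) throws IOException {
        // "café" stored on disk as ISO-8859-1 (4 bytes).
        byte[] onDisk = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        includeIntoStream(new ByteArrayInputStream(onDisk),
                StandardCharsets.ISO_8859_1, out, StandardCharsets.UTF_8);
        System.out.println(out.size()); // 5: the é takes two bytes in UTF-8

        StringWriter writer = new StringWriter();
        includeIntoWriter(new ByteArrayInputStream(onDisk),
                StandardCharsets.ISO_8859_1, writer);
        System.out.println(writer.toString().equals("caf\u00e9")); // true
    }
}
```

The sketch assumes the content is text. As noted above, binary files are out of scope and would have to bypass this decode/re-encode path entirely.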