Summary: SSI-Servlet produces invalid character encoding information
Product: Tomcat 4
Reporter: Jos <jmontiel>
Component: Servlets:SSI
Assignee: Tomcat Developers Mailing List <dev>
Attachments: tgzed org.apache.catalina.ssi src-package with quick fix; servlets-ssi.jar bin-package with quick fix
Description Jos 2002-07-01 20:54:06 UTC
On 2001-11-06, Bug 4674 was submitted: "... the 'Content-Encoding' header SSI-Servlet generates has always the value 'UTF-8'. ..." It was fixed on 2001-11-12 (Additional Comments From Amy Roh 2001-11-12 16:36: "Fixed. Should be available in the next nightly build."). However, the source code of this servlet in releases 4.0.3 and 4.0.4 still contains, at line 251: res.setContentType("text/html;charset=UTF-8");
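The underlying rule is that the charset named in the Content-Type header must be the charset that actually encoded the response bytes; a literal "charset=UTF-8" is only correct if the body really is UTF-8. A minimal sketch, assuming the output charset comes from configuration instead of being hardcoded. The class and method names (SsiResponseSketch, encodeBody) are hypothetical and are not Tomcat code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Tomcat code: derives the Content-Type label from
// the same Charset that encodes the body, so the two cannot disagree.
public final class SsiResponseSketch {

    /** Header value and body bytes produced with one and the same charset. */
    public static final class EncodedBody {
        public final String contentType;
        public final byte[] bytes;

        EncodedBody(String contentType, byte[] bytes) {
            this.contentType = contentType;
            this.bytes = bytes;
        }
    }

    // Builds the header from the charset actually used to encode the body,
    // instead of a fixed "text/html;charset=UTF-8".
    public static EncodedBody encodeBody(String html, Charset outputCharset) {
        String contentType = "text/html;charset=" + outputCharset.name();
        return new EncodedBody(contentType, html.getBytes(outputCharset));
    }

    public static void main(String[] args) {
        String text = "g\u0142\u00f3wna"; // "główna"
        EncodedBody utf8 = encodeBody(text, StandardCharsets.UTF_8);
        EncodedBody latin2 = encodeBody(text, Charset.forName("ISO-8859-2"));
        System.out.println(utf8.contentType + " -> " + utf8.bytes.length + " bytes");
        System.out.println(latin2.contentType + " -> " + latin2.bytes.length + " bytes");
    }
}
```

In a servlet, the header value would be passed to HttpServletResponse.setContentType before getWriter() is called, so that the container creates its writer with the same charset the header announces.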
Comment 1 Remy Maucherat 2002-07-01 21:19:13 UTC
Nightlies != 4.0.x. This bug won't be addressed in the 4.0.x releases, as they do not include the refactored SSI code. Please try the 4.1.6 test release instead to check the progress of the SSI code.
Comment 2 Adam W. 2003-05-27 07:09:33 UTC
I tried to use Polish character encoding on the page http://pa22.katowice.sdi.tpnet.pl:23/fortune/example.shtml, and the characters in "Strona główna" are displayed incorrectly; it looks like they are being converted to Unicode. The header is set properly (<meta http-equiv="content-type" content="text/html; charset=iso-8859-2">) and the characters in the file are also correct; you can see the source of the file at http://pa22.katowice.sdi.tpnet.pl:8101/fortune/example.shtml.src. The line res.setContentType("text/html;charset=UTF-8"); is still present in the code, and this is probably the reason, but simply replacing it with my desired encoding doesn't solve the problem completely. What should I do to make it work, at least temporarily?
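This kind of garbling is what happens when bytes written in one charset are decoded with another. For example, "ł" is byte 0xB3 in ISO-8859-2, but 0xB3 is "³" in ISO-8859-1, while "ó" (0xF3) is the same in both, so "główna" turns into "g³ówna". A minimal sketch that reproduces the mismatch; the class name MojibakeDemo is hypothetical and this is not the SSI servlet's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces the mis-decoding: Polish text stored as ISO-8859-2
// but decoded with a different charset.
public final class MojibakeDemo {

    // Encodes text with the charset it was written in, then decodes the
    // resulting bytes with the charset the reader assumes.
    public static String roundTrip(String text, Charset written, Charset assumed) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        String original = "Strona g\u0142\u00f3wna"; // "Strona główna"

        // Correct: decode with the charset the file was written in.
        System.out.println(roundTrip(original, latin2, latin2)); // prints "Strona główna"
        // Wrong: 0xB3 is 'ł' in ISO-8859-2 but '³' in ISO-8859-1.
        System.out.println(roundTrip(original, latin2, StandardCharsets.ISO_8859_1)); // prints "Strona g³ówna"
    }
}
```

Changing only the header charset therefore cannot fix the page on its own: the servlet must also read the .shtml file with the charset it was written in.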
Comment 3 Tomislaw Kitynski 2003-06-26 15:57:06 UTC
I haven't found an existing solution to this problem, so I played a bit with the source and now have a working fix. I am not very familiar with the procedure for applying patches to CVS (I don't know whether I should report it before committing anything, ask for permission, or something else), so I didn't put it into the repository. Instead, I will provide the source and/or binaries to anybody who asks. I'll be happy if the patches make it into the repository anyway.

Okay, here's the trick: SSIServlet now handles two more init-parameters, i.e. defaultInputEncoding and defaultOutputEncoding. The first tells the SSIInclude command to treat all processed (and included) files as if they were written in this charset (by creating appropriate readers). The second sets the charset attribute of the Content-Type header to the given value, which allows a proper writer to be created. This forced me to add two methods to the SSIExternalResolver interface: getDefaultInputEncoding and getDefaultOutputEncoding. Both return objects of type java.nio.charset.Charset that hold the appropriate charsets.

If a particular included file is in a different charset than the rest, its charset can be given after the file name. I considered using a separate parameter, but that would break the NCSA standard; besides, the <!--#include> command allows any number of file/virtual parameters, so it would have to be written like this: <!--#include file="foo.txt" charset="iso-8859-2" file="bar.txt" charset="iso-8859-1"--> and so on. Maybe that's not bad, but as I've written, it breaks the NCSA standard. So instead I've used the same syntax as in mail headers, and now we write: <!--#include file="foo.txt;charset=iso-8859-2" file="bar.txt; charset = iso-8859-1"--> and so on. I hope this does not break any rule, but I know it's questionable.
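The two init-parameters described here could be turned into Charset objects roughly as follows. This is a minimal sketch, not the attached patch: the init-parameter names and the getter names come from the comment, but the class name EncodingConfigSketch, the readResource helper, and the fallback values (platform default for input, UTF-8 for output) are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Map;

// Hypothetical sketch: resolves defaultInputEncoding / defaultOutputEncoding
// init-parameters into Charsets. Fallback values are assumptions.
public final class EncodingConfigSketch {

    private final Charset defaultInputEncoding;
    private final Charset defaultOutputEncoding;

    public EncodingConfigSketch(Map<String, String> initParams) {
        this.defaultInputEncoding =
            toCharset(initParams.get("defaultInputEncoding"), Charset.defaultCharset());
        this.defaultOutputEncoding =
            toCharset(initParams.get("defaultOutputEncoding"), StandardCharsets.UTF_8);
    }

    // Same names as the methods the comment adds to SSIExternalResolver.
    public Charset getDefaultInputEncoding() { return defaultInputEncoding; }
    public Charset getDefaultOutputEncoding() { return defaultOutputEncoding; }

    // Reads an included resource through a Reader built for the input charset.
    public String readResource(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, defaultInputEncoding)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Returns the named charset, or the fallback if the name is absent or unknown.
    static Charset toCharset(String name, Charset fallback) {
        if (name == null || name.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }
}
```

Silently falling back on an unknown charset name keeps the servlet running but hides typos in the configuration; failing during servlet initialization would be the stricter alternative.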
This, however, solves my problems with incorrect output, and if all the files are in the same charset, we do not have to use the "...;charset=X" construction at all (to be honest, I haven't tested the per-file charset feature just mentioned). The default encodings, however, work flawlessly. If anyone is interested in this patch, please contact me. If the Tomcat developers find this patch useful, or at least not too dirty/nasty, I will gladly add my two cents to the contribution.
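For reference, the proposed mail-header style suffix ("foo.txt;charset=iso-8859-2") could be split off an include path as sketched below. The class IncludeSpec is hypothetical and is not taken from the attached patch; it only illustrates the parsing the syntax would require.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical parser for the "path;charset=name" attribute form,
// e.g. "foo.txt;charset=iso-8859-2" or "bar.txt; charset = iso-8859-1".
public final class IncludeSpec {

    public final String path;
    public final Charset charset; // null means "use the default input encoding"

    private IncludeSpec(String path, Charset charset) {
        this.path = path;
        this.charset = charset;
    }

    public static IncludeSpec parse(String attributeValue) {
        int semi = attributeValue.indexOf(';');
        if (semi < 0) {
            // No suffix: plain NCSA-style path.
            return new IncludeSpec(attributeValue.trim(), null);
        }
        String path = attributeValue.substring(0, semi).trim();
        String param = attributeValue.substring(semi + 1);
        int eq = param.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Missing '=' in: " + attributeValue);
        }
        String key = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
        if (!key.equals("charset")) {
            throw new IllegalArgumentException("Unknown parameter: " + key);
        }
        // Charset.forName throws if the name is malformed or unsupported.
        Charset cs = Charset.forName(param.substring(eq + 1).trim());
        return new IncludeSpec(path, cs);
    }
}
```

One drawback is visible in the code: any path that legitimately contains a ';' would be misread as carrying a charset, which is one concrete way such an extension departs from plain NCSA include syntax.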
Comment 4 Tomislaw Kitynski 2003-06-27 08:55:04 UTC
I misused the WORKSFORME resolution (ehh, I should have read the FM ;-), so I am setting the status back to REOPENED. I'm sorry, guys!
Comment 5 Tomislaw Kitynski 2003-06-27 09:15:28 UTC
Created attachment 7011: tgzed org.apache.catalina.ssi src-package with quick fix
Comment 6 Tomislaw Kitynski 2003-06-27 09:22:31 UTC
Created attachment 7012: servlets-ssi.jar bin-package with quick fix
Comment 7 Mark Thomas 2005-04-03 21:07:01 UTC
I have committed a partial fix for TC4.1.x that should resolve the original issue. I am still looking at the attached patches.
Comment 8 Mark Thomas 2005-04-04 22:59:37 UTC
I have not committed the proposed patch, as it would introduce Tomcat-specific SSI syntax. Whilst there is no official SSI spec, I do not believe that Tomcat should differ from the SSI syntax supported by the Apache Web Server. I have committed an alternative patch that introduces two new servlet parameters; see the docs/source for details. I have also ported the changes to TC5.5.x.