|Summary:||jkmanager generates non-well-formed XML for certain server timezones|
|Product:||Tomcat Connectors||Reporter:||Jochen Schwarze <Jochen.Schwarze>|
|Component:||mod_jk||Assignee:||Tomcat Developers Mailing List <dev>|
Description Jochen Schwarze 2012-11-20 15:53:35 UTC
When you call http://server.example/jkmanager?mime=xml on a server with a German locale setting and timezone MET (for "Mitteleuropäische Zeit", note the umlaut "ä"), the resulting XML is not well-formed because of that umlaut character:

    <?xml version="1.0" encoding="UTF-8" ?>
    ...
    <jk:time datetime="20121120164838" tz="Mitteleurop�ische Zeit" unix="1353426518" />
    ...

The umlaut character in the tz attribute is not sent in UTF-8 encoding, as declared, but in ISO-8859-1 encoding. Conforming XML processors (that is, all of them) refuse to parse this.
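To see why parsers reject the document: in ISO-8859-1 "ä" is the single byte 0xE4, while in UTF-8 it is the two-byte sequence 0xC3 0xA4. In UTF-8 a byte 0xE4 announces a three-byte sequence, so the following "i" (0x69) is an illegal continuation byte. The minimal sketch below checks a buffer for well-formed UTF-8 (RFC 3629 rules), which is effectively what an XML parser does before accepting the document. `is_valid_utf8()` and `is_valid_utf8_str()` are hypothetical names, not part of mod_jk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mod_jk): returns true if s[0..len)
 * is well-formed UTF-8 per RFC 3629, i.e. no overlong encodings, no
 * UTF-16 surrogates and nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */
        if (c < 0x80) {    /* plain US-ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;  /* 0x80-0xC1, 0xF5-0xFF never start a sequence */
        }
        if (len - i - 1 < n)
            return false;  /* sequence truncated at end of input */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;  /* e.g. 0xE4 followed by 'i' fails here */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  /* overlong, out of range or surrogate */
        i += n + 1;
    }
    return true;
}

/* Convenience wrapper for NUL-terminated strings. */
bool is_valid_utf8_str(const char *s)
{
    return is_valid_utf8((const unsigned char *)s, strlen(s));
}
```

With these helpers, the tz value from the report fails the check, while the same text re-encoded as UTF-8 passes.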
Comment 1 Christopher Schultz 2012-11-20 22:29:18 UTC
Yuck. Looks like

    static int status_strftime(time_t clock, int mime, char *buf_time, char *buf_tz, jk_logger_t *l)

needs to be

    static int status_strftime(time_t clock, int mime, char *buf_time, wchar_t *buf_tz, jk_logger_t *l)

...and associated ripple effects. Any other cases where non-US-ASCII characters are used in jk-status? Wide characters are such a pain in C...
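Switching buf_tz to wchar_t (filled, for example, by the standard wcsftime()) would not by itself fix the output: the wide string still has to be encoded as UTF-8 before it is written into a document that declares encoding="UTF-8". A minimal sketch of that encoding step, assuming each wchar_t holds one Unicode code point (true on glibc, not on Windows, where wchar_t is UTF-16); `wide_to_utf8()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: encode a NUL-terminated wide string as UTF-8.
 * ASSUMPTION: each wchar_t holds one Unicode code point, which holds
 * where the compiler defines __STDC_ISO_10646__ (e.g. glibc). On
 * Windows wchar_t is UTF-16 and surrogate pairs would need combining.
 * Returns the byte count written (excluding the NUL) or -1 if out is
 * too small or an invalid code point is found. */
long wide_to_utf8(const wchar_t *in, char *out, size_t out_size)
{
    size_t pos = 0;
    assert(in != NULL && out != NULL && out_size > 0);
    for (; *in != L'\0'; in++) {
        unsigned long cp = (unsigned long)*in;
        unsigned char b[4];
        size_t n;
        if (cp < 0x80) {
            b[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            b[0] = (unsigned char)(0xC0 | (cp >> 6));
            b[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            if (cp >= 0xD800 && cp <= 0xDFFF)
                return -1;            /* lone UTF-16 surrogate */
            b[0] = (unsigned char)(0xE0 | (cp >> 12));
            b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else if (cp <= 0x10FFFF) {
            b[0] = (unsigned char)(0xF0 | (cp >> 18));
            b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        } else {
            return -1;                /* not a Unicode code point */
        }
        if (out_size - pos <= n)
            return -1;                /* no room for n bytes plus NUL */
        for (size_t k = 0; k < n; k++)
            out[pos++] = (char)b[k];
    }
    out[pos] = '\0';
    return (long)pos;
}
```

For L"Mitteleuropäische Zeit" (22 characters) this writes 23 bytes, with the umlaut emitted as 0xC3 0xA4.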
Comment 2 Jochen Schwarze 2012-11-21 08:14:39 UTC
I haven't seen other cases so far. (Using Non-US-ASCII characters in worker names might be regarded unwise and can be avoided easily ... ;-)
Comment 3 Rainer Jung 2014-12-29 14:51:04 UTC
The time formatting is done using strftime(). That in turn uses LC_TIME (or LC_ALL, LANG) to determine the locale and charset to use. The umlaut comes from the fact that your system probably has a German locale set in one of those variables. The fact that the umlaut is not UTF-8 encoded, as declared by the jk-status XML header, indicates that your LC_TIME (or LC_ALL or LANG) points to an ISO-8859 German locale.

Unfortunately it is not safe to set LC_TIME in the jk status worker before calling strftime(): the locale is process-wide, and the surrounding web server typically runs multi-threaded today, so the change would affect other threads as well.

Instead of trying to guess the locale and transform the characters from that locale to UTF-8, I have changed the formatting to use pure numeric date strings in r1648352. This will be part of version 1.2.41.

As you said: non-ASCII data coming from configuration can and should be avoided. If you find other places where the status worker produces non-well-formed output, please let us know.

Thanks and regards,
Rainer
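A minimal sketch of the numeric approach, reusing the attribute names from the sample output in the report. It is an illustration only, not the code committed in r1648352, and `format_jk_time()` is a hypothetical name. Plain %Y %m %d %H %M %S expand to decimal digits and %z to a numeric offset such as "+0100", so no localized zone name (as %Z may produce) can reach the XML.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, NOT the actual r1648352 change: renders a
 * jk:time element using only numeric strftime() conversions.
 * Returns the length written, or -1 on error or truncation. */
int format_jk_time(time_t t, const struct tm *tm, char *out,
                   size_t out_size)
{
    char dt[16];  /* "YYYYmmddHHMMSS" plus NUL */
    char tz[8];   /* "+hhmm" plus NUL */
    int n;
    assert(tm != NULL && out != NULL);
    if (strftime(dt, sizeof dt, "%Y%m%d%H%M%S", tm) == 0)
        return -1;
    /* %z may expand to nothing when no zone is known; strftime() then
     * returns 0, so fall back to an empty attribute value. */
    if (strftime(tz, sizeof tz, "%z", tm) == 0)
        tz[0] = '\0';
    n = snprintf(out, out_size,
                 "<jk:time datetime=\"%s\" tz=\"%s\" unix=\"%lld\" />",
                 dt, tz, (long long)t);
    if (n < 0 || (size_t)n >= out_size)
        return -1;  /* encoding error or truncated output */
    return n;
}
```

Called with gmtime() on the timestamp from the report (1353426518), this yields datetime="20121120154838", i.e. 15:48:38 UTC, one hour behind the MET value shown in the original output.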
Comment 4 Rainer Jung 2014-12-29 16:27:40 UTC
For the sake of completeness: Chris has found an API, strftime_l(), which allows formatting with a call-specific locale. This API is fairly recent, though; it is only part of the Open Group Unix Specification Version 4, which seems to have been published in 2013. We could detect the presence of strftime_l() during "configure" and then use this modern API to produce a more human-friendly time string in the "C" locale, thus ending up with an ASCII representation. I'm not planning to do that work currently, but patches are welcome.
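A minimal sketch of the strftime_l() idea, assuming a POSIX.1-2008 system (hence the feature-test macro). In mod_jk the call would presumably sit behind a "configure" check, for example a hypothetical HAVE_STRFTIME_L, with the numeric format as fallback. `format_time_c_locale()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format tm with an arbitrary pattern in the "C"
 * locale, regardless of the process-wide LC_TIME/LC_ALL/LANG setting.
 * The locale object is passed per call, so no global setlocale() is
 * needed. Returns strftime_l()'s result (0 on failure or overflow). */
size_t format_time_c_locale(const struct tm *tm, const char *fmt,
                            char *buf, size_t size)
{
    size_t n;
    locale_t c_loc;
    assert(tm != NULL && fmt != NULL && buf != NULL);
    c_loc = newlocale(LC_TIME_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return 0;
    n = strftime_l(buf, size, fmt, tm, c_loc);
    freelocale(c_loc);
    return n;
}
```

Because nothing process-wide changes, this avoids the threading problem with setting LC_TIME. With the pattern "%a, %d %b %Y %H:%M:%S" and gmtime_r() on 1353426518, the result is "Tue, 20 Nov 2012 15:48:38" in pure ASCII. Creating and freeing the "C" locale on every call keeps the sketch short; a real implementation would more likely create it once at startup.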