Bug 47686 - Setting SSI variables appear to assume ISO-8859-1 character encoding
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_include
Version: 2.2.13
Hardware: All All
Importance: P2 regression with 6 votes
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
Reported: 2009-08-12 05:34 UTC by Björn Wiberg
Modified: 2010-09-19 09:03 UTC



Attachments
Testcase that shows the problem, SSI has to be enabled! (214 bytes, text/plain)
2009-11-02 11:34 UTC, Rafael Gattringer

Description Björn Wiberg 2009-08-12 05:34:34 UTC
Setting an SSI variable to a value containing UTF-8 characters and then echoing it yields output with HTML entities for each of the two bytes of each UTF-8 character.

E.g.:

<!--#set var="DEFAULT_APPLICATION_DESCRIPTION" value="den CAS-tjänst du besökte" -->
<!--#set var="APPLICATION_DESCRIPTION" value="${DEFAULT_APPLICATION_DESCRIPTION}" -->
Du &auml;r nu utloggad ur <!--#echo var="APPLICATION_DESCRIPTION" -->.

...yields (in browser):

Du är nu utloggad ur den CAS-tjÃ¤nst du besÃ¶kte.

...instead of (in browser):

Du är nu utloggad ur den CAS-tjänst du besökte.

...and in HTML:

Du &auml;r nu utloggad ur den CAS-tj&#195;&#164;nst du bes&#195;&#182;kte.

...instead of in HTML:

Du &auml;r nu utloggad ur den CAS-tjänst du besökte.


This was not so in Apache 2.2.11.

The .shtml page is in UTF-8 (with BOM) and has:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

In httpd.conf, AddDefaultCharset is NOT set (so the default value is used).
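
The entity output above is consistent with each byte of the UTF-8 value being escaped on its own, as if the text were ISO-8859-1. A minimal Python sketch of that behaviour follows; `entity_escape_bytewise` is a hypothetical helper written for illustration and is not mod_include's actual code.

```python
import html


def entity_escape_bytewise(raw: bytes) -> str:
    """Hypothetical model: emit every byte >= 0x80 as its own numeric entity."""
    out = []
    for b in raw:
        if b >= 0x80:
            out.append(f"&#{b};")  # treats the byte as an ISO-8859-1 code point
        else:
            out.append(chr(b))
    return "".join(out)


# The value from the report, as the UTF-8 bytes stored in the .shtml file.
value = "den CAS-tjänst du besökte".encode("utf-8")

escaped = entity_escape_bytewise(value)
print(escaped)                 # what ends up in the HTML source
print(html.unescape(escaped))  # what a browser would then display
```

Under this assumption, "ä" (UTF-8 bytes 0xC3 0xA4) becomes `&#195;&#164;` and "ö" (0xC3 0xB6) becomes `&#195;&#182;`, matching the HTML shown in the report.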
Comment 1 Ruediger Pluem 2009-08-12 11:51:56 UTC
I guess this is caused by the fix for PR25202 (r730296, r732583). This might be tricky to fix.
Comment 2 Joe Orton 2009-08-13 03:34:59 UTC
The mod_include docs do clearly state that, for echo,

a) encoding=entity is the default, and
b) encoding=entity will not work correctly if a character encoding other than ISO-8859-1 is in use

For the example given here, at least, using 

<!--#echo encoding="none" var="..."

would be the correct use, surely?

But the potential for breaking backwards compat in bug 25202 does seem to have been clearly called.  Iwan Stanley's approach in bug 25202 comment 7 seems vastly preferable to the change which got committed.
Comment 3 Anders Kaseorg 2009-08-17 07:48:05 UTC
(In reply to comment #2)
> For the example given here, at least, using 
> <!--#echo encoding="none" var="..."
> would be the correct use, surely?

That doesn’t work, though, when the variable may contain <>&".
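
To illustrate the trade-off being discussed: with encoding="none" the value is emitted verbatim, so markup characters pass through unescaped. A rough Python sketch follows, assuming a hypothetical alternative that escapes only the markup-significant characters and leaves non-ASCII text alone. The function names are made up for illustration, and this is not necessarily how mod_include behaves or how any eventual fix works.

```python
import html


def echo_none(value: str) -> str:
    """Model of encoding="none": the value is emitted unchanged."""
    return value


def echo_markup_only(value: str) -> str:
    """Hypothetical alternative: escape only &, <, > and quotes,
    passing non-ASCII characters through untouched."""
    return html.escape(value, quote=True)


value = 'tjänst <b> & "x"'
print(echo_none(value))         # markup characters reach the page as-is
print(echo_markup_only(value))  # markup escaped, "ä" left intact
```

The second function shows that escaping `<>&"` does not require any assumption about the character encoding of the remaining bytes.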
Comment 4 Rafael Gattringer 2009-11-02 11:34:05 UTC
Created attachment 24464
Testcase that shows the problem, SSI has to be enabled!
Comment 5 Rafael Gattringer 2009-11-02 11:41:24 UTC
I would like to confirm this problem. See my attached SSI file for an example.
Comment 6 Graham Leggett 2010-09-19 09:03:34 UTC
Fixed on trunk in r998651.