Bug 14513 - AddDefaultCharset should apply to application/xhtml+xml
Summary: AddDefaultCharset should apply to application/xhtml+xml
Status: RESOLVED WONTFIX
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core
Version: 2.0.43
Hardware: Other other
Importance: P3 minor
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-11-13 17:03 UTC by Joseph Walton
Modified: 2004-12-10 22:31 UTC



Attachments
A XHTML document which use UTF-8 (567 bytes, text/html)
2003-09-05 12:43 UTC, Greg
A XHTML document which use UTF-8 (the previous was not XHTML, sorry!) (553 bytes, text/html)
2003-09-05 12:54 UTC, Greg

Description Joseph Walton 2002-11-13 17:03:09 UTC
AddDefaultCharset is currently only applied to a small list of MIME types. XHTML
should also be included.

As far as I can see, this simply involves adding "application/xhtml+xml" to the
'needcset' array in server/protocol.c; ideally this array would be replaced with
configurable settings, as a number of other XML-based MIME types would also
qualify for this treatment (text/xml and image/svg, for example).

(Whilst XML may well include its charset in the header, I believe it should also
be sent as part of the MIME type; perhaps the file could be checked for the
correct encoding to declare before imposing the site's default?)
Comment 1 André Malo 2003-03-20 16:40:53 UTC
Yes we need some more generic solution for this, such as AddCharsetByType or
something. Changing to assigned until someone feels to code it up.

Thanks for your report and thanks for using Apache.
Comment 2 Greg 2003-09-05 12:27:58 UTC
Hello!

Here is some more information related to this problem. I'll try to make it easy
to understand.

* Context : I have a website running Apache 2.0.47. My pages are all valid
according to XHTML 1.1 strict.

* What I did : I specified the
content-type="application/xhtml+xml; charset=utf-8"
in the meta tag of my pages.
I also indicated the optional XML prolog :
<?xml version="1.0" encoding="UTF-8"?>
So, as an XHTML author, I did my best to use UTF-8.

* What happens : my UTF-8 characters are displayed incorrectly in Mozilla 1.4 and
Opera 7.11 because the ISO-8859-1 charset is used! Only Konqueror 3.1 displays
UTF-8 correctly.

* Why I suspect Apache : the problem only happens if the line "AddDefaultCharset
ISO-8859-1" is active in "commonhttpd.conf". If I remove this line, then the
files encoded with UTF-8 appear correctly in all three browsers.

* My explanation : by default, Konqueror 3.1 may ignore Apache's charset
information if it later finds charset information in the XHTML meta tag.
Opera and Mozilla, on the other hand, trust Apache and do not check the XHTML
code, which leads to pages being displayed as ISO-8859-1.

* What can we do :
1st solution : comment out the "AddDefaultCharset" directive in
"commonhttpd.conf" by default, so that the browsers are forced to use the XHTML
meta tag.

2nd solution : replace the AddDefaultCharset value, ISO-8859-1, with UTF-8.
This could cause a similar problem for content in other languages that uses
a charset other than UTF-8. In the long term, however, everything will be
encoded as UTF-8 rather than ISO-8859-1, so it would still be more acceptable
than the current setting.

3rd solution : make Apache read the XHTML document so that it sets the
Content-Type charset correctly, depending on the web page and not on the server.
That is the whole problem : Apache does not check the character set specified
inside the XHTML code at all, and the browsers trust Apache way too much.

4th solution : leave things as they are, and tell the browser vendors that their
software should prefer the XHTML meta tag over Apache's information,
which can be wrong.

Greg
Comment 3 Greg 2003-09-05 12:43:39 UTC
Created attachment 8074
A XHTML document which use UTF-8
Comment 4 Greg 2003-09-05 12:54:07 UTC
Created attachment 8075
A XHTML document which use UTF-8 (the previous was not XHTML, sorry!)
Comment 5 Greg 2003-09-05 18:55:09 UTC
The last line of this attachment should display :
    "Tiu pag^o, la mas^inlingvo."
with the "^" sign above the "g" and the "s".

NOTE : if you click this attachment in Mozilla or Opera, it will appear
correctly. This does not mean that my comment is wrong, it simply means that
"nagoya.apache.org" has been reconfigured. You should copy this attachment to a
new Apache server, and access it with Mozilla or Opera, in order to reproduce
the problem.

Thanks.
Comment 6 Greg 2003-09-05 22:03:07 UTC
I finally found the light :
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4.1
That's HTTP 1.1 about missing charsets. Quote:

<q>Some HTTP/1.0 software has interpreted a Content-Type header without charset
parameter incorrectly to mean "recipient should guess." Senders wishing to
defeat this behavior MAY include a charset parameter even when the charset is
ISO-8859-1 and SHOULD do so when it is known that it will not confuse the recipient.

Unfortunately, some older HTTP/1.0 clients did not deal properly with an
explicit charset parameter. HTTP/1.1 recipients MUST respect the charset label
provided by the sender; and those user agents that have a provision to "guess" a
charset MUST use the charset from the content-type field if they support that
charset, rather than the recipient's preference, when initially displaying a
document. See section 3.7.1.</q>

The second paragraph clearly says that recipients MUST respect the charset label
provided by Apache. So Mozilla and Opera do their job. On the other hand, Apache
should NOT force a default charset if it cannot be sure that it will not
"confuse the recipient". Here, the default character set is definitely confusing
for the recipient. So Apache should not use any charset.

I correct my first comment :
- Solution #1 cannot be applied because Apache must provide a default charset
for some recipients who would not be able to detect the charset by themselves.
- Solution #2 cannot be applied because ISO-8859-1 is the HTTP 1.1 default charset;
UTF-8 is not (maybe a future version of HTTP will correct that).
- Solution #4 cannot be applied because the browsers are fully HTTP 1.1 compliant.

So, only the third solution can solve this problem: Apache should look at the
XHTML header and set the charset accordingly. But I guess it would be too slow
and people would complain.

Well, I guess I will have to use the "header" PHP function:
  header("Content-type: application/xhtml+xml; charset=utf-8");
That's the "do it yourself" solution.

Greg
Comment 7 Martin Dürst 2003-09-25 18:53:02 UTC
The right solution is to comment out AddDefaultCharset from httpd.conf,
because sending no charset info is still better than sending the wrong
info. I do not know ANY http client that follows the iso-8859-1 default
for HTTP (although some personal configurations of clients may do so).
Adding application/xhtml+xml into the influence of AddDefaultCharset
without further configurability would be a step in the wrong direction.
Doing it without removing AddDefaultCharset from httpd.conf as shipped
would be a disaster.
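
As a minimal sketch of the change described above, assuming a stock httpd.conf in which the directive is currently set to ISO-8859-1 (the exact file name and surrounding lines vary by distribution), either form below stops the server from appending a default charset to responses:

```apache
# Sketch only: two equivalent ways to stop sending a default charset.

# Option 1: comment out the shipped directive.
#AddDefaultCharset ISO-8859-1

# Option 2: disable it explicitly.
AddDefaultCharset Off
```

With no charset parameter in the Content-Type header, the client falls back to whatever the document declares internally.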
Comment 8 Martin Dürst 2003-09-25 20:21:34 UTC
see also bug 23421
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421
Comment 9 Roy T. Fielding 2004-12-11 07:31:34 UTC
The original report was to add application/xhtml+xml to the list
of types to which AddDefaultCharset applies.  That request has been considered
and rejected because AddDefaultCharset is only intended for adding a charset
to legacy content.  The charset for XHTML can be set internally, as
preferred, or by use of the AddCharset directive and a filename
extension specific to the actual charset in use.

The rest of the report is people adding comments about issue 23421
which has been resolved.  AddDefaultCharset will not be in the default
configuration for future releases.
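
The AddCharset approach mentioned above might look like the following fragment. This is a sketch rather than a recommended configuration: it assumes mod_mime is loaded, and the extensions (.xhtml, .utf8, .latin1) and the file name in the comment are illustrative choices, not taken from this report.

```apache
# Sketch only: extensions and file names here are hypothetical examples.

# Map the .xhtml extension to the XHTML media type.
AddType application/xhtml+xml .xhtml

# Map charset-specific extensions to charsets. A file named
# page.xhtml.utf8 would then pick up its type from .xhtml
# and its charset from .utf8.
AddCharset UTF-8      .utf8
AddCharset ISO-8859-1 .latin1
```

The charset is then tied to each file's actual encoding through its name, instead of to a single server-wide default.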