Bug 57878 - Using UTF-8 for all languages, and avoiding html-entities.
Summary: Using UTF-8 for all languages, and avoiding html-entities.
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Documentation
Version: 2.4-HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: HTTP Server Documentation List
Depends on: 57879
Reported: 2015-04-30 22:19 UTC by Tom Fredrik Blenning
Modified: 2018-08-06 13:51 UTC

Removes all HTML-entities and uses UTF-8 for all languages (88.64 KB, patch)
2015-04-30 22:26 UTC, Tom Fredrik Blenning
Script to fix this issue (879 bytes, application/x-sh)
2015-04-30 22:30 UTC, Tom Fredrik Blenning

Description Tom Fredrik Blenning 2015-04-30 22:19:04 UTC
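The error documents use UTF-8 for every language except Spanish and Portuguese; those two should use UTF-8 as well. HTML entities should also be avoided in favor of the literal characters.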
Comment 1 Tom Fredrik Blenning 2015-04-30 22:26:07 UTC
Created attachment 32707
Removes all HTML-entities and uses UTF-8 for all languages
Comment 2 Tom Fredrik Blenning 2015-04-30 22:30:12 UTC
Created attachment 32708
Script to fix this issue

This is the script used to generate the previous patch. Run it in docs/error.
Comment 3 Tom Fredrik Blenning 2015-04-30 22:31:21 UTC
This is a rather invasive patch, and I did the whole process with a script, so I would advise that someone proficient in each of the languages review the changes.
Comment 4 Takashi Sato 2015-06-08 08:58:40 UTC
+1 for concept.
I don't like HTML entities.
Comment 5 Sierk Bornemann 2018-02-26 11:55:23 UTC
Any progress on this issue?

On Tue Dec 5 11:21:21 2017 UTC (Revision 1817175), the error docs were touched for the first time in years, for the Russian translations.

Why not fix this issue (Bug 57878), which has been open and untouched for several years, at the same time or shortly after?
Please fix.
Comment 6 William A. Rowe Jr. 2018-08-06 13:51:24 UTC
Entirely agree on HTML entities for native alpha characters; they can be represented in any applicable ISO set.

Entirely agree on UTF-8 for editing.

Two different charsets can coexist for one language. If we feel any desire to retain 8-bit charsets at the user agent's priority/preference, then the 8-bit variants should be generated and appended based on some map list of languages.
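
A rough Python sketch of what generating the 8-bit variants from a language map could look like. The map contents, the name `LEGACY_CHARSETS`, and the function `legacy_variant` are all hypothetical; nothing here reflects an existing httpd build step, and the actual list of languages and charsets would need review.

```python
from typing import Optional, Tuple

# Hypothetical language-to-charset map; entries are illustrative only.
LEGACY_CHARSETS = {
    "es": "iso-8859-1",
    "pt-br": "iso-8859-1",
    "ru": "iso-8859-5",
}


def legacy_variant(utf8_bytes: bytes, lang: str) -> Optional[Tuple[str, bytes]]:
    """Return (charset, encoded bytes) for languages in the map, else None."""
    charset = LEGACY_CHARSETS.get(lang)
    if charset is None:
        return None  # Only the UTF-8 master is served for this language.
    text = utf8_bytes.decode("utf-8")
    # Characters missing from the 8-bit set fall back to numeric character
    # references, so the legacy variant never loses content.
    return charset, text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(legacy_variant("Página no encontrada".encode("utf-8"), "es"))
```

With this approach the UTF-8 file stays the only copy that is edited by hand, and entities appear only in generated output where the target charset cannot represent a character.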

Question - does anyone foresee user agents in active use which have a reason to still prefer an 8859 or 2022 mapping in this day and age?