Issue 19514

Summary: html export: charset utf-8 uses named entities
Product: Writer Reporter: Regina Henschel <rb.henschel>
Component: ui Assignee: AOO issues mailing list <issues>
Status: ACCEPTED --- QA Contact:
Severity: Trivial    
Priority: P5 (lowest) CC: che, issues, mey.wer, pavel, pmladek, xslf
Version: OOo 1.1 RC2 Keywords: oooqa
Target Milestone: ---   
Hardware: PC   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description                                               Flags
example produced HTML                                     none
how the document should be                                none
This patch fixes the problem for ISO8859-1 and MS 1250    none

Description Regina Henschel 2003-09-13 02:02:04 UTC
In "Importing and Exporting in HTML Format" the online-help says,
"When exporting to HTML, the character set selected in Tools - Options -
Load/Save - HTML Compatibility is used. Characters not present there are written
in a substitute form, which is displayed correctly in modern web browsers. When
exporting such characters, you will receive an appropriate warning."

But when I use UTF-8, my "ä", "ü" and so on are turned into named entities, although
they can be represented in UTF-8, and no warning appears.
I would prefer that the export be changed rather than the help text. In UTF-8 you
need no named entities and no &#xnn; references. Even in ISO-8859-1 they are not necessary for
"ä" and "ü". Anyone who wants such substitutes can choose ASCII/US.

kind regards
Regina
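The export rule requested above (write a character literally when the selected character set can encode it, and fall back to a named or numeric reference only when it cannot) can be sketched as follows. This is a minimal illustration, not the OOo HTML filter code: the function name `export_text` is hypothetical, Python codec names stand in for the charsets chosen in Tools - Options - Load/Save - HTML Compatibility, and Python's `html.entities.codepoint2name` (the HTML 4 entity table) stands in for whatever entity table the filter actually uses.

```python
import html
from html.entities import codepoint2name


def export_text(text: str, charset: str) -> str:
    """Hypothetical sketch: escape only what the target charset cannot hold."""
    out = []
    for ch in text:
        # Markup-significant characters must always be escaped.
        if ch in '&<>"':
            out.append(html.escape(ch, quote=True))
            continue
        try:
            ch.encode(charset)  # representable in the target charset?
            out.append(ch)      # yes: write the character literally
        except UnicodeEncodeError:
            # No: prefer an HTML 4 named entity, else a hex numeric reference.
            name = codepoint2name.get(ord(ch))
            out.append(f"&{name};" if name else f"&#x{ord(ch):X};")
    return "".join(out)


print(export_text("Grüße", "utf-8"))       # literal umlauts, no entities
print(export_text("Grüße", "iso-8859-1"))  # still literal: ü and ß are in Latin-1
print(export_text("Grüße", "ascii"))       # substitutes only for ASCII/US
```

With this rule, choosing ASCII/US still yields the maximally compatible output described in the thread, while UTF-8 and ISO-8859-1 keep "ä" and "ü" as plain characters.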
Comment 1 utomo99 2003-10-03 04:23:58 UTC
Please attach the documents that cause this problem, so we can test
and confirm it faster.
(Without the documents, we cannot easily confirm the problem, or it takes more
time.)
Don't forget to cut the other parts of the documents so the file size is
small but we are still able to see the problem.
Comment 2 Regina Henschel 2003-10-03 09:18:56 UTC
Created attachment 9949 [details]
example produced HTML
Comment 3 Regina Henschel 2003-10-03 09:22:38 UTC
The attached file was produced by 'New HTML Document'. The behavior is
independent of the 'Export' field in the 'HTML Compatibility' dialog.
'Character Set' in that dialog was set to 'UTF-8'.
Comment 4 Regina Henschel 2003-10-03 09:29:17 UTC
Created attachment 9950 [details]
how the document should be
Comment 5 Regina Henschel 2003-10-03 09:32:45 UTC
The document testspecialcharacter_correct shows the correct coding of
umlaut 'ü' in UTF-8.
Comment 6 lohmaier 2003-10-03 13:19:37 UTC
confirming. But since this doesn't produce broken HTML, I set this to
Prio5
(changed subject, OS to ALL)
original summary: html export: information in help doesn't fit to behavior
Comment 7 h.ilter 2003-10-08 15:23:39 UTC
Reassigned to ES
Comment 8 eric.savary 2003-12-16 18:04:47 UTC
ES->MIB: Please evaluate
Comment 9 michael.brauer 2004-01-07 09:20:04 UTC
To offer as much compatibility as possible, the HTML export in fact uses (named)
entities for as many characters as possible. One can consider this to be a bug
or a feature ...
Comment 10 pmladek 2004-06-01 14:12:50 UTC
Created attachment 15618 [details]
This patch fixes the problem for ISO8859-1 and MS 1250
Comment 11 pmladek 2004-06-01 14:16:11 UTC
A similar patch which fixes the problem for ISO 8859-7 and MS 1253 is mentioned
in the Issue #28241.
Comment 12 lohmaier 2005-08-18 20:52:28 UTC
*** Issue 53483 has been marked as a duplicate of this issue. ***
Comment 13 meywer 2005-11-30 15:30:31 UTC
The problem still occurs in OOo 2.0.

And there is yet another problem, which maybe should be a new issue:

"To offer as much compatibility as possible, the HTML export in fact uses
(named) entities for as many characters as possible. One can consider this to be
a bug or a feature ..."
That's not true at all!
OO replaces &bdquo;, &ldquo; and others with „ (99 down) and “ (66 up)!

I don't like (I hate) that OOo replaces code that was written manually before.