Apache OpenOffice (AOO) Bugzilla – Issue 971
Save as HTML causes charset problems with Russian
Last modified: 2013-08-07 14:38:26 UTC
If I save a document containing Russian text as HTML, the result is totally unreadable, even in OO itself. First, the charset is set to iso8859-1. Second, the body of the HTML is in a strange encoding. Attempts to set charset=utf-8 or something similar do not succeed.
Created attachment 263 [details] OO file with Russian
Created attachment 264 [details] Wrong encoded HTML with Russian
Reassigned to Eric.
- HTML: currently, OOo only supports UTF-8 encoding. Set this under "Tools - Options - Load/Save - HTML compatibility - Character set".
- Writer 6.0: I can see the Cyrillic text ([etot text na russkom]...). So I assume you don't have Unicode fonts installed on your system to *display* Cyrillic in OOo.
I set UTF-8 in Filters/HTML and can now see Russian. Thank you! But it should set charset=utf-8 in the HTML header by default, since OO uses only Unicode.
The problem described here is the same as in issue #471, which was marked as "resolved" after the "charset" option was introduced for the HTML export filter. In my opinion, three things still have to be changed:

1. What happens to characters in the document that are not part of the character set selected in the HTML export options (e.g., I have a document containing both German and Russian text, and I try to save it as iso-8859-1)? I think those characters have to be saved in the HTML file as numeric HTML entities (e.g. &#1234;).

2. At present, when I save the German-Russian document mentioned above as HTML and the HTML export charset is not set to utf-8, the document is saved quietly without any error message. Only after reopening the saved document do I notice that all non-Latin1 characters are destroyed. I think there should at least be a warning telling the user that the selected export charset cannot represent all characters in the document.

3. In my opinion, "Tools - Options - Load/Save - HTML compatibility - Character set" is not the right place for the character set option of the HTML export filter, because if I have documents in several languages I have to change the setting for every document. I think the charset setting should be available directly in the "Save as.." dialog, as it is for the file type "Text (encoded)". The default value should be the charset that best matches the characters in the actual document (e.g. iso8859-1 if there are only Western characters in the document, KOI8-R if there are only Russian characters, UTF-8 if there are characters from more than one 8-bit charset, and so on...)
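To make the three suggestions above concrete, here is a minimal Python sketch. It is not OpenOffice code: the helper names (encode_with_entities, unencodable_chars, pick_charset) and the candidate list are made up for illustration, and the preference order is only an assumption.

```python
# Hypothetical helpers; none of these names exist in OpenOffice.
# Assumed preference order for the "best matching" charset.
CANDIDATE_CHARSETS = ["iso-8859-1", "koi8-r", "utf-8"]


def encode_with_entities(text, charset):
    """Suggestion 1: encode text, writing characters the charset
    cannot represent as numeric character references (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")


def unencodable_chars(text, charset):
    """Suggestion 2: return the set of characters that the charset
    cannot represent, so the caller can warn before saving."""
    missing = set()
    for ch in set(text):
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing


def pick_charset(text, candidates=CANDIDATE_CHARSETS):
    """Suggestion 3: return the first candidate charset that can
    represent every character, falling back to UTF-8."""
    for charset in candidates:
        if not unencodable_chars(text, charset):
            return charset
    return "utf-8"


if __name__ == "__main__":
    mixed = "Grüße Привет"
    print(encode_with_entities(mixed, "iso-8859-1"))
    lost = unencodable_chars(mixed, "iso-8859-1")
    if lost:
        print("Warning: iso-8859-1 cannot represent:", "".join(sorted(lost)))
    print(pick_charset(mixed))
```

With a list like this, a mixed German-Russian document falls through to UTF-8, while a purely Russian one stops at KOI8-R. That only illustrates the idea; which 8-bit charsets belong in the list is exactly the platform question raised in the next comment.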
>I think the charset setting should be possible directly in the "Save as.."
>dialog as it is for the file type "Text (encoded)". The default value should be
>set to a charset that best matches the characters in the actual document (e.g.
>iso8859-1 if there are only Western characters in the document, KOI8-R if there
>are only Russian characters, UTF-8 if there are characters from more than one
>8-bit charset and so on...)

The trouble is the default charset. For Russian, KOI8-R is used on UNIX, WIN1251 on Windows, and sometimes ISO8859-5 on commercial Unices. If OO is cross-platform, what is the default charset? IMHO the default charset should be UTF-8. Since OO now uses it and only it (see previous comments), the only problem is to set the charset=utf-8 tag properly and maybe to remove the ability to select an encoding, so as not to confuse users. But recoding to 8-bit charsets is a nice feature...
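To illustrate why no single 8-bit default works across platforms, the following Python sketch encodes the same Russian word in the three legacy charsets named above and in UTF-8. The sample word and variable names are chosen only for illustration.

```python
# Sample word chosen for illustration only.
word = "Привет"

# The three legacy 8-bit charsets mentioned in the comment, plus UTF-8.
charsets = ["koi8-r", "cp1251", "iso8859-5", "utf-8"]

# Same text, different byte sequences in every charset.
encoded = {cs: word.encode(cs) for cs in charsets}
for cs, data in encoded.items():
    print(f"{cs:10} {data.hex()}")

# Reading KOI8-R bytes as if they were WIN1251 decodes without an
# error but yields different Cyrillic letters, i.e. unreadable text.
misread = encoded["koi8-r"].decode("cp1251")
print(misread)
```

Each 8-bit charset round-trips correctly on its own, but a file written with one platform's default and read with another's comes out garbled without any error being raised.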
>IMHO default charset should be UTF-8

I agree with you.
OK, so let's rewrite it the way Dimitry puts it: IMHO the default charset should be UTF-8, since OO now uses it and only it. The only problem is to set the charset=utf-8 tag properly and maybe to remove the ability to select an encoding, so as not to confuse users.
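As a rough sketch of what "properly set the charset=utf-8 tag" means, the Python function below writes a page whose declared charset always matches the encoding of the bytes. The function name export_html and the page template are hypothetical and are not taken from the OOo HTML export filter.

```python
import html


def export_html(body_text, charset="utf-8"):
    """Hypothetical exporter: build a minimal HTML page and encode it
    so that the declared charset and the actual bytes always agree."""
    doc = (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head>\n<body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body>\n</html>\n"
    )
    # Characters outside the chosen charset become numeric references
    # instead of being silently destroyed.
    return doc.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    data = export_html("этот текст на русском")
    print(data.decode("utf-8"))
```

Defaulting the parameter to UTF-8 mirrors the proposal here. An 8-bit charset can still be passed explicitly, which keeps the "recoding" feature mentioned earlier without making it the default.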
Falko, please take care of this one. Are there any compatibility issues with old StarOffice versions?
If this is true, this is a bug, not an RFE.
The circle closes... Reassigned to Eric.
And the circle reopens! ;-) Falko: it is an RFE because OOo doesn't save to UTF-8 by default, since that hasn't been *planned*. So it doesn't work because it was never meant to be implemented :).
Will be fixed in 6.0 final.
Falko: which *OOo* build do you mean? Good morning! >;-) For this task we could add the comments of Christoph (#471 - ------- Additional Comments From christoph.singer@heindl.de 2001-06-06 02:29 -------). What do you think about this?
*** Issue 471 has been marked as a duplicate of this issue. ***
changing QA contact from bugs@ to issues@
set to OOo 2.0
*** Issue 18140 has been marked as a duplicate of this issue. ***
*** Issue 17923 has been marked as a duplicate of this issue. ***
We will address this problem in 2.0. But since I have no issue yet I re-assign this issue to Bettina to be set to duplicate once the PCD issue is opened.
Hello Dmitry, this issue is already covered by an internal issue. It will be implemented in OO.o 2.0. For technical reasons it is not possible to mark this issue as a duplicate of an issue in another issue-tracking system. Please check the implementation in the upcoming version OO.o 2.0. Thank you.
Reassign issue to owner of selected subcomponent
re-assigned to ES.
ES->BH: I couldn't find any duplicate of this, neither in Bt+ nor in iBIS. Please find out which task this one duplicates and make it a child of that task. Thanks.
according to http://www.openoffice.org/servlets/ReadMsg?list=releases&msgNo=7690 this issue will be set to OOoLater
*** Issue 62704 has been marked as a duplicate of this issue. ***
To make the issues easier to find via "requirements", I am reassigning the issues I currently own to the owner "requirements".