Issue 971 - Save as HTML causes charset problems with Russian
Summary: Save as HTML causes charset problems with Russian
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: 627
Hardware: PC Windows 98
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL: prevoir
Keywords:
Duplicates: 471 17923 18140 62704
Depends on:
Blocks:
 
Reported: 2001-05-29 05:43 UTC by issues@www
Modified: 2013-08-07 14:38 UTC (History)
CC List: 3 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
OO file with Russian (4.85 KB, application/octet-stream)
2003-12-06 14:52 UTC, issues@www
Wrong encoded HTML with Russian (819 bytes, text/html)
2003-12-06 14:52 UTC, issues@www

Description issues@www 2001-05-29 05:43:32 UTC
If I save a document containing Russian text as HTML, the result is totally 
unreadable, even in OO itself.
First of all, the charset is set to iso8859-1. Second, the body of the HTML is 
in a strange encoding. Attempts to set charset=utf-8 or something similar do 
not succeed.
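
A minimal sketch of the kind of mismatch described above. It assumes the body bytes were written as UTF-8 while the header declared iso8859-1; the report does not confirm which encoding the body actually used, and the sample string is hypothetical rather than taken from the attachment.

```python
# Hypothetical sample text; not the content of the attached file.
text = "Этот текст на русском"

# Assumption: the exporter writes the body as UTF-8 bytes...
body_bytes = text.encode("utf-8")

# ...but a reader trusts the declared charset=iso-8859-1 and decodes with it.
# ISO-8859-1 maps every byte to some character, so this never fails;
# it silently produces unreadable text instead.
misread = body_bytes.decode("iso-8859-1")

# Decoding with the encoding that was really used restores the text.
recovered = body_bytes.decode("utf-8")

print(misread)
print(recovered)
```

Because the wrong decoding raises no error, the damage only becomes visible when the file is opened, which matches the behaviour reported here.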
Comment 1 issues@www 2001-05-29 05:45:30 UTC
Created attachment 263 [details]
OO file with Russian
Comment 2 issues@www 2001-05-29 05:47:19 UTC
Created attachment 264 [details]
Wrong encoded HTML with Russian
Comment 3 stefan.baltzer 2001-05-29 12:36:24 UTC
Reassigned to Eric.
Comment 4 eric.savary 2001-05-29 15:06:59 UTC
- HTML: currently, OOo only supports UTF-8 encoding. Set this under 
"Tools - Options - Load/Save - HTML compatibility - Character set"
- Writer 6.0: I can see the cyrillic text ([etot text na russkom]...). 
So I assume you don't have Unicode fonts installed on your system to 
*display* cyrillic in OOo.
Comment 5 issues@www 2001-05-29 17:47:47 UTC
I set UTF-8 in Filters/HTML and now can see Russian. Thank you!
But it should set charset=utf-8 by default in the HTML header, since OO uses 
only Unicode.
Comment 6 issues@www 2001-06-02 20:57:36 UTC
The problem described here is the same as in issue #471, which was marked 
as "resolved" after introducing the "charset" option for the html export filter.

In my opinion, three things still have to be changed:

1. What happens to characters in the document that are not part of the 
character set which is set in the HTML export options? (E.g., I have a document 
containing both German and Russian text, and I try to save it as iso-8859-1)? 
I think those characters have to be saved in the HTML file as numeric HTML 
entities (e.g. &#1234;).

2. Currently, when I try to save the German-Russian document mentioned above as 
HTML and the HTML export charset is not set to utf-8, the document is saved 
quietly without any error message. Only after reopening the saved document do I 
notice that all non-Latin1 characters are destroyed.
I think, there should be at least a warning displayed to the user, saying that 
the selected export charset does not match all characters in the document.

3. In my opinion, "Tools - Options - Load/Save - HTML compatibility - Character 
set" is not the right place for the character set option of the HTML export 
filter, because if I have documents in several languages I have to change the 
setting for every document.
I think the charset setting should be possible directly in the "Save as.." 
dialog as it is for the file type "Text (encoded)". The default value should be 
set to a charset that best matches the characters in the actual document (e.g. 
iso8859-1 if there are only Western characters in the document, KOI8-R if there 
are only Russian characters, UTF-8 if there are characters from more than one 
8-bit charset, and so on...)
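
The three points above can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not OOo's export filter: the function names and the candidate list are hypothetical, and the standard `xmlcharrefreplace` error handler stands in for whatever the filter would actually do.

```python
def unrepresentable(text, charset):
    """Return the set of characters in text that charset cannot encode."""
    missing = set()
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing


def export_body(text, charset):
    """Point 1: write unencodable characters as numeric entities (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")


def lossy_warning(text, charset):
    """Point 2: return a warning message if charset does not cover the text."""
    missing = unrepresentable(text, charset)
    if missing:
        return "%d character(s) cannot be stored in %s" % (len(missing), charset)
    return None


def pick_charset(text, candidates=("iso-8859-1", "koi8-r", "utf-8")):
    """Point 3: suggest the first candidate that covers every character."""
    for charset in candidates:
        if not unrepresentable(text, charset):
            return charset
    return "utf-8"  # fallback; UTF-8 can encode any Unicode text


print(export_body("ü и", "iso-8859-1"))
print(pick_charset("Grüße"), pick_charset("привет"), pick_charset("Grüße и привет"))
```

The candidate order is itself a design choice: which 8-bit charset to try first for a given language depends on the platform.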
Comment 7 issues@www 2001-06-03 06:15:26 UTC
>I think the charset setting should be possible directly in the "Save as.." 
>dialog as it is for the file type "Text (encoded)". The default value should 
>be set to a charset that best matches the characters in the actual document 
>(e.g. iso8859-1 if there are only Western characters in the document, KOI8-R 
>if there are only Russian characters, UTF-8 if there are characters from more 
>than one 8-bit charset and so on...)
The trouble is the default charset. For Russian, KOI8-R is used on UNIX, 
WIN1251 on Windows, and sometimes ISO8859-5 on commercial Unices. If OO is 
cross-platform, what is the default charset?
IMHO the default charset should be UTF-8. Since OO now uses it and only it (see 
previous comments), the only problem is to set the charset=utf-8 tag properly 
and maybe to disable encoding selection so as not to confuse users.
But recoding to 8-bit charsets is a nice feature...
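
A minimal sketch of the "default to UTF-8 and declare it" idea, assuming a hypothetical `html_document` helper; it is not the OOo filter code. The point is that the declared charset and the byte encoding come from the same parameter, so they cannot disagree.

```python
from html import escape


def html_document(body_text, charset="utf-8"):
    """Build a tiny HTML page whose meta tag matches its actual encoding."""
    meta = '<meta http-equiv="Content-Type" content="text/html; charset=%s">' % charset
    page = "<html><head>%s</head><body><p>%s</p></body></html>" % (
        meta,
        escape(body_text),
    )
    # If an 8-bit charset is requested, characters it lacks become &#NNNN;
    # references instead of being destroyed.
    return page.encode(charset, errors="xmlcharrefreplace")


data = html_document("привет")
print(data.decode("utf-8"))
```

Keeping the optional `charset` argument preserves the recoding to 8-bit charsets mentioned above while making UTF-8 the default.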
Comment 8 issues@www 2001-06-05 12:41:30 UTC
>IMHO default charset should be UTF-8
I agree with you.
Comment 9 eric.savary 2001-06-05 14:13:03 UTC
OK, so let's restate it the way Dmitry does:

IMHO the default charset should be UTF-8, since OO now uses it and only it. The 
only problem is to set the charset=utf-8 tag properly and maybe to disable 
encoding selection so as not to confuse users.
Comment 10 lutz.hoeger 2001-06-05 15:07:14 UTC
Falko, please take care of this one. Are there any compatibility issues with 
old StarOffice versions?
Comment 11 falko.tesch 2001-06-13 11:03:20 UTC
If this is true, this is a bug, not an RFE.
Comment 12 stefan.baltzer 2001-06-18 16:09:12 UTC
The circle closes... Reassigned to Eric.
Comment 13 eric.savary 2001-06-19 09:45:10 UTC
And the circle reopens! ;-)

Falko: it is an RFE because OOo doesn't save as UTF-8 by default, since that 
was never *planned*. So it doesn't work because it was never meant to be 
implemented :).
Comment 14 falko.tesch 2001-07-02 07:57:55 UTC
Will be fixed in 6.0 final.
Comment 15 eric.savary 2001-07-09 09:14:36 UTC
Falko: which *OOo* build do you mean? Good morning! >;-)
For this task we could add the comments of Christoph (#471 - ------- Additional 
Comments From christoph.singer@heindl.de 2001-06-06 02:29 -------).
What do you think about this?
Comment 16 eric.savary 2001-07-09 09:20:50 UTC
*** Issue 471 has been marked as a duplicate of this issue. ***
Comment 17 Unknown 2001-11-08 23:11:48 UTC
changing QA contact from bugs@ to issues@
Comment 18 eric.savary 2003-06-27 22:59:24 UTC
set to OOo 2.0
Comment 19 tamblyne 2003-08-12 04:10:18 UTC
*** Issue 18140 has been marked as a duplicate of this issue. ***
Comment 20 tamblyne 2003-08-19 05:09:21 UTC
*** Issue 17923 has been marked as a duplicate of this issue. ***
Comment 21 falko.tesch 2003-09-11 16:24:08 UTC
We will address this problem in 2.0. But since I have no issue yet I
re-assign this issue to Bettina to be set to duplicate once the PCD
issue is opened.
Comment 22 bettina.haberer 2003-11-11 15:52:39 UTC
Hello Dmitry, this issue is already covered by an internal issue. It
will be implemented in OO.o 2.0. For technical reasons it is not
possible to mark this issue as a duplicate of an issue in another
issue-tracking system. Please check the implementation in the upcoming
version OO.o 2.0. Thank you.
Comment 23 stx123 2004-03-22 08:54:36 UTC
Reassign issue to owner of selected subcomponent
Comment 24 michael.ruess 2004-03-22 12:16:43 UTC
re-assigned to ES.
Comment 25 eric.savary 2004-04-15 16:58:09 UTC
ES->BH: I couldn't find any duplicate of this, neither in Bt+ nor in iBIS.
Please find out which task is a duplicate of this one and make a child of it.
Thanks
Comment 26 Martin Hollmichel 2004-08-09 14:02:40 UTC
according to http://www.openoffice.org/servlets/ReadMsg?list=releases&msgNo=7690
this issue will be set to OOoLater
Comment 27 eric.savary 2006-03-02 10:38:51 UTC
*** Issue 62704 has been marked as a duplicate of this issue. ***
Comment 28 bettina.haberer 2010-05-21 14:46:47 UTC
To make the issues easier to grep via "requirements", I am reassigning the
issues I currently own to the owner "requirements".