Issue 37610 - Automatic charset recognition in HTML.
Summary: Automatic charset recognition in HTML.
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOo 1.1.3
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-11-22 11:47 UTC by ilya_evseev
Modified: 2013-08-07 14:38 UTC
2 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description ilya_evseev 2004-11-22 11:47:04 UTC
When Writer opens an HTML file with Russian characters (in the default Windows-1251
encoding) that contains a <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html;
charset=windows-1251"> header tag, everything is displayed correctly.

But when this META tag is omitted, the HTML document is not detected as Cyrillic,
i.e., it is displayed like "ñèñòåìíûé" (Western European characters that have the
same byte values as the Russian characters in CP1251), and the "Cyrillic Document"
menu (provided by the CyrillicTools 1.2 library) is hidden from the main menu bar.
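
The garbling described above can be reproduced outside Writer. The following is a minimal sketch, assuming the fallback code page was Windows-1252 (the report does not name it; this is inferred from the sample string). It encodes a Russian word as Windows-1251 and then decodes the same bytes with the wrong code page.

```python
# The Russian word behind the garbled sample, stored as Windows-1251 bytes,
# as it would appear in an HTML file without a META charset tag.
raw = "системный".encode("cp1251")

# Assumption: the reader falls back to Windows-1252 (Western European).
# Every byte is reinterpreted as an accented Latin letter.
garbled = raw.decode("cp1252")
print(garbled)

# Decoding with the correct code page restores the original word.
restored = raw.decode("cp1251")
print(restored)
```

No bytes are lost in the wrong decoding, so the text can be recovered by re-encoding the garbled string as Windows-1252 and decoding it as Windows-1251.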

My default language in Windows XP is Russian, and my OO build is Russian too
(1.1.3_ru_msi, downloaded via http://www.openoffice.ru).
Comment 1 ilya_evseev 2004-11-22 11:54:12 UTC
Ctrl+A, Format, Characters, Language = Russian doesn't help...
Comment 2 michael.ruess 2004-11-22 12:05:57 UTC
Reassigned to ES.
Comment 3 eric.savary 2004-11-22 13:16:56 UTC
There is no automatic charset recognition in OOo.
HTML documents usually contain encoding information which OOo reads.
One could think about a mechanism to "guess" the encoding from the file content.

ES->BH: see 102270 for an internal version of this enhancement.
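
One way such a "guess" could work is sketched below. This is only an illustration, not how OOo or the internal enhancement works: the function name `guess_cyrillic_encoding`, the candidate list, and the scoring rule (count lowercase Cyrillic letters, since running text is mostly lowercase) are all assumptions made for this example.

```python
def count_lowercase_cyrillic(text):
    """Count the letters in the range а..я (U+0430 to U+044F)."""
    return sum(1 for ch in text if "а" <= ch <= "я")


def guess_cyrillic_encoding(data, candidates=("cp1251", "koi8_r")):
    """Return a guessed encoding name for `data`, or None if nothing fits.

    Hypothetical heuristic: valid UTF-8 wins outright; otherwise the
    candidate whose decoding yields the most lowercase Cyrillic letters
    is chosen.
    """
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    best_name, best_score = None, 0
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # bytes that are undefined in this code page
        score = count_lowercase_cyrillic(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


print(guess_cyrillic_encoding("системный".encode("cp1251")))
print(guess_cyrillic_encoding("системный".encode("koi8_r")))
```

The heuristic relies on Windows-1251 and KOI8-R placing lowercase and uppercase letters in opposite byte ranges, so the wrong candidate decodes mostly to uppercase. It cannot tell Cyrillic from Western European text and is unreliable on short input.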
Comment 4 ilya_evseev 2004-11-22 13:32:52 UTC
I don't need automatic charset recognition, but I strongly need two features:

First, when a web page contains no META CHARSET tag, assign the default OS charset
($LANG under Linux, HKLM\System\CurrentControlSet\Control\Nls\CodePage under
Windows?), or the default OO charset (Service-Settings-Languages-?what's?more?).

Second, the character set for the entire document or a selected block should be
easy to switch, as in Word (Ctrl+A, Font: 'Arial'->'Arial Cyrillic' or
something like that).
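
The first request (use the META charset when present, otherwise fall back to the system default) can be sketched as follows. This is a rough illustration under stated assumptions, not Writer's import filter: `choose_html_encoding`, the regular expression, and the 4096-byte scan limit are hypothetical, and `locale.getpreferredencoding` stands in for the OS default charset.

```python
import locale
import re

# Hypothetical pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def choose_html_encoding(data, fallback=None):
    """Pick an encoding for raw HTML bytes.

    1. Use the charset declared in a META tag near the start of the file.
    2. Otherwise use `fallback` (e.g. a user-configured default), if given.
    3. Otherwise use the operating system's preferred encoding.
    """
    match = META_CHARSET.search(data[:4096])  # scan limit is an assumption
    if match:
        return match.group(1).decode("ascii")
    if fallback is not None:
        return fallback
    return locale.getpreferredencoding(False)


with_meta = (
    b'<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html;\n'
    b'charset=windows-1251">'
)
print(choose_html_encoding(with_meta))
print(choose_html_encoding(b"<html><body>...</body></html>", fallback="cp1251"))
```

The optional `fallback` argument corresponds to the "default OO charset" alternative, so a user setting would take precedence over the OS locale.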
Comment 5 bettina.haberer 2010-05-21 14:54:44 UTC
To make these issues easier to find via "requirements", I am reassigning the
issues I currently own to the owner "requirements".