Apache OpenOffice (AOO) Bugzilla – Issue 37610
Automatic charset recognition in HTML.
Last modified: 2013-08-07 14:38:26 UTC
When Writer opens an HTML file with Russian characters (in the default Windows-1251 encoding) that contains a <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1251"> header tag, everything is displayed fine. But when this META tag is omitted, the HTML document is not detected as Cyrillic, i.e., it is displayed like "ñèñòåìíûé" (Western European characters that share the same byte values as Russian characters in CP1251) and the "Cyrillic Document" menu (provided by the CyrillicTools 1.2 library) is hidden from the main menu bar. My default language in Windows XP is Russian, and my OO build is Russian too (1.1.3_ru_msi, downloaded via http://www.openoffice.ru)
Ctrl+A, Format, Characters, Language = Russian doesn't help...
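The garbling described above can be reproduced outside of Writer. The following is a minimal sketch, assuming the bytes were interpreted as the Western European code page cp1252 (that choice is an assumption; the report does not say which fallback encoding Writer used). The variable names are illustrative only.

```python
# Sketch: reproduce the reported mojibake by decoding CP1251 bytes
# with the wrong single-byte code page.
original = "системный"

# Bytes as they would be stored in a Windows-1251 HTML file.
raw = original.encode("cp1251")

# Assumption: the reader falls back to cp1252 when no charset is declared.
garbled = raw.decode("cp1252")
print(garbled)

# Both code pages are single-byte, so the mistake is reversible
# as long as the bytes themselves were not altered.
restored = garbled.encode("cp1252").decode("cp1251")
print(restored)
```

Because no bytes are lost, re-decoding with the right code page recovers the text, which is why changing only the character language attribute has no effect on the display.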
Reassigned to ES.
There is no automatic charset recognition in OOo. HTML documents usually contain encoding information which OOo reads. One could think about a mechanism to "guess" the encoding from the file content. ES->BH: see 102270 for an internal version of this enhancement.
I don't need automatic charset recognition, but I strongly need two features. First, when a web page contains no META CHARSET tag, assign the default OS charset ($LANG under Linux, HKLM\System\CurrentControlSet\Control\Nls\CodePage under Windows?) or the default OO charset (Service-Settings-Languages-?what's?more?). Second, the charset for the entire document or a selected block should be easy to switch, as in Word (Ctrl+A, Font: 'Arial'->'Arial Cyrillic' or something similar).
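The first requested feature (use the declared charset if present, otherwise fall back to a default) could look roughly like the sketch below. This is not how OOo's HTML import works; `pick_charset` and `decode_html` are hypothetical helper names, the regular expression is deliberately simplistic, and `locale.getpreferredencoding` stands in for "the default OS charset".

```python
import codecs
import locale
import re

# Simplistic pattern for charset=... inside a <meta> tag (illustrative only).
_META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def pick_charset(html_bytes, default=None):
    """Return the declared charset, or a default when none is declared."""
    fallback = default or locale.getpreferredencoding(False)
    match = _META_CHARSET.search(html_bytes[:4096])
    if match:
        declared = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(declared)  # reject names Python does not know
            return declared
        except LookupError:
            pass
    return fallback.lower()


def decode_html(html_bytes, default=None):
    """Decode HTML bytes with the declared or default charset."""
    charset = pick_charset(html_bytes, default)
    return html_bytes.decode(charset, errors="replace")
```

With `default="cp1251"`, a page without a META tag decodes as Cyrillic, while a page that declares its charset is unaffected by the default.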
To make the issues easier to grep via "requirements", I have reassigned the issues that I currently own to the owner "requirements".