Issue 21019 - Read-out and assign language of IME to CJK+CTL input text
Summary: Read-out and assign language of IME to CJK+CTL input text
Alias: None
Product: ui
Classification: Code
Component: code
Version: OOo 1.0.0
Hardware: Other Windows XP
Importance: P3 Trivial with 2 votes
Target Milestone: OOo 2.0
Assignee: stefan.baltzer
QA Contact: issues@ui
Duplicates: 5966 19848
Depends on:
Blocks: 16354
Reported: 2003-10-10 15:07 UTC by falko.tesch
Modified: 2007-06-20 17:17 UTC

See Also:
Latest Confirmation in: ---
Developer Difficulty: ---


Description falko.tesch 2003-10-10 15:07:12 UTC
The MS Windows IME provides extensive information about the current text input,
which OO.o can already read out.
OO.o should make use of this extended information by reading out the language
(and the direction, if possible) and assigning it to the input text.
Example 1: 
A user chose Thai as the default language for CTL languages but types in Arabic.
Currently the Thai locale is assigned to this Arabic text. This leads to problems
in cases where Thai uses different formatting than Arabic (such as last-line
text distribution in justified text).
With this new feature OO.o would auto-detect that Arabic text is being input (as
opposed to the defined Thai locale) and would override this setting by
formatting the input text as Arabic instead.
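The override described in Example 1 can be sketched as follows. This is a minimal illustration in Python, not OO.o code. It assumes the input language arrives as a Windows keyboard layout handle (HKL), whose low word is a LANGID whose low 10 bits identify the primary language. The helper names `primary_language` and `ctl_language_for_input` and the small lookup table are hypothetical.

```python
# Primary language identifiers as defined in the Windows SDK (winnt.h).
LANG_ARABIC = 0x01
LANG_HEBREW = 0x0D
LANG_THAI = 0x1E

# Illustrative subset of CTL languages; a real table would be much larger.
CTL_LANGUAGES = {
    LANG_ARABIC: "Arabic",
    LANG_HEBREW: "Hebrew",
    LANG_THAI: "Thai",
}


def primary_language(hkl: int) -> int:
    """Extract the primary language ID from a keyboard layout handle."""
    langid = hkl & 0xFFFF   # the low word of an HKL is the LANGID
    return langid & 0x3FF   # the low 10 bits are the primary language


def ctl_language_for_input(hkl: int, default_ctl: str) -> str:
    """Return the reported CTL language, or the default if none is reported."""
    return CTL_LANGUAGES.get(primary_language(hkl), default_ctl)


# Default CTL language is Thai, but the active layout is Arabic (Saudi Arabia).
print(ctl_language_for_input(0x04010401, "Thai"))
```

With a non-CTL layout such as German (0x04070407) the function falls back to the configured default, so the default locale is only overridden when the layout actually reports a CTL language.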
Example 2:
When typing combinations of RTL, LTR and weak characters (like numbers,
hyphens and Arabic) OO.o currently often assigns the wrong text direction to the
weak character, resulting in constructs like '-5CIBARA' instead of '5-CIBARA'.
Again, this erroneous behaviour could be avoided if OO.o "knew" which language
(-> text direction) is used for weak characters.
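The weak-character problem in Example 2 can be inspected by listing the Unicode bidirectional class of each character. This is a minimal sketch using Python's standard `unicodedata` module. The helper name `bidi_classes` is made up for illustration, and the classes reported depend on the Unicode version bundled with the Python build, not on the tables OO.o used at the time.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Logical order: Arabic letters, then HYPHEN-MINUS, then a European digit.
sample = "\u0639\u0631\u0628\u064a-5"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Strong classes such as `AL` carry their own direction, while weak classes such as `EN` (European number) and `ES` (European separator) take theirs from context, which is why the hyphen and digit can end up on the wrong side.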

For the moment it only makes sense to support CTL languages.
It still has to be clarified whether this should also apply to CJK languages.
(It does not make sense for Western languages since, for example, we cannot
differentiate between German/German and German/Austrian input.)

Note: Since Unix IMEs do not report any language, this feature can only be
implemented under Windows.
Comment 1 falko.tesch 2003-10-15 09:02:08 UTC
*** Issue 5966 has been marked as a duplicate of this issue. ***
Comment 2 falko.tesch 2003-10-15 09:03:12 UTC
*** Issue 19848 has been marked as a duplicate of this issue. ***
Comment 3 insount 2003-10-18 16:10:06 UTC
Issue 19848 is NOT a duplicate of this issue. 
Issue 19848 discusses internal representation and imported/legacy
texts, while this issue discusses a particular input method -- these
are mostly orthogonal.
To witness, issue 18024 gives an alternative resolution to this issue
(namely, manual insertion of RLM and LRM characters).
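The manual workaround mentioned here can be sketched in Python. This is an illustration under assumptions, not the procedure from issue 18024: the helper `add_rlm_after_hyphen` is hypothetical, and placing the RIGHT-TO-LEFT MARK (U+200F) directly after the hyphen is only one possible placement, chosen so that the hyphen is no longer adjacent to the digit.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong RTL character


def add_rlm_after_hyphen(text: str) -> str:
    """Insert RLM after each hyphen-minus that directly follows an RTL letter."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        follows_rtl = i > 0 and unicodedata.bidirectional(text[i - 1]) in ("R", "AL")
        if ch == "-" and follows_rtl:
            out.append(RLM)
    return "".join(out)


# Hebrew letter, hyphen, digit: the mark is added between hyphen and digit.
marked = add_rlm_after_hyphen("\u05d0-5")
print([f"U+{ord(c):04X}" for c in marked])
```

Text without RTL letters before the hyphen, such as "a-5", passes through unchanged.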
Comment 4 falko.tesch 2003-10-28 09:38:51 UTC
*** Issue 1035 has been marked as a duplicate of this issue. ***
Comment 5 prognathous 2004-05-03 16:56:35 UTC
Unicode 4.0.1 has recently been released with changes to the properties of
several characters. Once OO (and some other projects) are updated to comply
with these changes, the HebrewLetter+Hyphen+Number issue will finally be solved.
See for Mozilla's take on the

Note that bug 19848 has wrongly been marked as a duplicate of this bug
(which has nothing to do with the hyphenation issue), so just to make sure that
this important update isn't missed, I'm posting it in both bugs. Sorry for the spam.

Please consider reopening bug 19848, or post a new one specifically for
compliance with the aforementioned changes in Unicode.

Comment 6 Oliver Specht 2004-05-19 14:17:43 UTC
Fixed in cws os30
Comment 7 Oliver Specht 2004-05-19 14:18:01 UTC
Comment 8 Oliver Specht 2004-06-03 12:16:44 UTC
Comment 9 Oliver Specht 2004-06-03 12:17:03 UTC
Comment 10 Oliver Specht 2004-06-03 12:17:22 UTC
Comment 11 michael.ruess 2004-06-03 13:46:57 UTC
Comment 12 michael.ruess 2004-06-03 13:47:34 UTC
reassigned to SBA.
Comment 13 michael.ruess 2004-06-03 13:48:20 UTC
Comment 14 stefan.baltzer 2004-06-16 15:42:55 UTC
SBA: "Example 1" works now: The language for CJK and CTL input now gets set
according to the chosen input method. This can be verified as follows:

 - Enable CJK and CTL support
 - Switch Keyboard to Hebrew, type something
 - Switch Keyboard to Arabic, type something
 - Select some Hebrew letters 
 - Format-Character, tab page "Font" -> The CTL language for the Hebrew text is
set to Hebrew.

Note: This also works with CJK languages.

This enables (for example) the linguistic components to check only the
respective language in CJK-CJK or CTL-CTL mixed text without the user having to
set the language that is NOT the default CJK/CTL language.
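How a linguistic component might use the per-character language can be sketched in Python. The representation below, a list of (character, language) pairs, and the helper `runs_by_language` are assumptions made for illustration. The issue does not describe how OO.o stores the attribute.

```python
from itertools import groupby


def runs_by_language(tagged_chars):
    """Collapse (character, language) pairs into (language, text) runs."""
    return [
        (lang, "".join(ch for ch, _ in group))
        for lang, group in groupby(tagged_chars, key=lambda pair: pair[1])
    ]


# Hebrew input followed by Arabic input, each tagged by the active keyboard.
tagged = [(ch, "he") for ch in "\u05e9\u05dc\u05d5\u05dd"]
tagged += [(ch, "ar") for ch in "\u0633\u0644\u0627\u0645"]

for lang, text in runs_by_language(tagged):
    print(lang, len(text))
```

Each run could then be handed to the checker for its own language, so the user never has to set the non-default CTL language by hand.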

It was dropped to do this for different Western languages. Most of these can be
written with the same keyboard layout and this is what most users do: When I
write an English part in a document, I don't change the keyboard layout. I am
sure most "bilingual western writers" do the same. 

If this had been implemented for Western languages as well, the input language
would always override the Western default language, so it would be impossible to
write in the default language of a document without having the keyboard set
accordingly.

A scenario: A multilingual manual with different paragraph styles, each style
having a different Western language set. As soon as I type in one of them, the
respective language is kept WITHOUT having to change the keyboard from German to
English, French, Italian.
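The resulting policy can be summarised in a short Python sketch: CJK and CTL input follows the keyboard, while Western input keeps the language of the paragraph style. The function `language_to_assign`, the language tags and the two sets are hypothetical and deliberately incomplete.

```python
# Illustrative subsets only; real script classification is more involved.
CJK_LANGS = {"ja", "ko", "zh"}
CTL_LANGS = {"ar", "he", "th"}


def language_to_assign(keyboard_lang: str, style_western_lang: str):
    """Return (script group, language) for newly typed text."""
    if keyboard_lang in CTL_LANGS:
        return ("ctl", keyboard_lang)      # keyboard overrides the CTL default
    if keyboard_lang in CJK_LANGS:
        return ("cjk", keyboard_lang)      # keyboard overrides the CJK default
    return ("western", style_western_lang)  # keyboard is ignored


# German keyboard, paragraph style set to French: the text stays French.
print(language_to_assign("de", "fr"))
# Hebrew keyboard: the text is tagged Hebrew regardless of the style.
print(language_to_assign("he", "fr"))
```

This matches the manual scenario: one keyboard layout can serve several Western paragraph styles without the typed text being retagged.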

In cases of "newly written multilingual text" one has to manually set any Western
language other than the default one. Some hints to ease this:
1. Via character styles (tab page "Font")
2. Via paragraph styles (tab page "Font")
3. Via context menu in a misspelled word the online spellchecker has detected as
a word existing in another installed language. Then the context menu offers to
change the language of the word or the entire paragraph (this hard-sets the
character attribute "language").
About Example 2: See issue 18042 for future enhancements of weak characters.

Set to verified.
Comment 15 prognathous 2004-08-05 09:57:54 UTC
The HyphenMinus+Number problem is not fixed. Please re-open this bug, or more
fitting, re-open/undupe bug 19848.

Tested with Writer 1.9.m49

Comment 16 aehrlich 2004-09-12 12:32:07 UTC
SBA, PLEASE think again on "It was dropped to do this for different Western
languages." Look at issues 1035 and 5966, which have been marked as duplicates:
the example there involves _different_ keyboard layouts (Russian and English),
and the same applies to e.g. Greek and English. If such IME-reading
functionality has already been developed, it should be configurable (switch
on/off) to allow users to choose the right way for them (even for Western-only
languages, e.g. to define English and French and assign "United States -
International" to both).
Comment 17 stefan.baltzer 2004-09-13 17:55:30 UTC
SBA: OK in 680m52. Closed.
Note: I reopened issue 1035 because of the unsolved Greek and Russian keyboard
input problem.