Issue 22396

Summary: Can't Use both Hindi and Arabic Numerals
Product: Internationalization
Reporter: aminm <persiantools>
Component: BiDi
Assignee: stefan.baltzer
Status: CLOSED FIXED
QA Contact: issues@l10n <issues>
Severity: Trivial
Priority: P3
CC: frank.meies, hdu, hennerd, hossein.ir, issues, khirano, munzirtaha, pavel
Version: OOo 1.1
Keywords: oooqa
Target Milestone: ---
Hardware: PC
OS: Windows, all
URL: http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:
Issue Blocks: 79434
Attachments:
Description                                         Flags
Mixed numerals test case                            none
Problem with numerals for Farsi language            none
Problem with footnote numerals in Arabic language   none
show case                                           none

Description aminm 2003-11-12 21:54:26 UTC
If I set the Numerals preference to System, the desired behavior is to type Hindi
numerals when the keyboard is switched to Arabic or Farsi AND to type Arabic
numerals when the keyboard is switched back to English. This never happens in Writer,
which means you must go with either Hindi or Arabic numerals, not both! A
serious issue in bilingual texts!
Comment 1 sforbes 2003-11-17 11:54:48 UTC
Is this by design? What do other apps (MS Word, KOffice) do in this
regard?
Comment 2 aminm 2003-11-17 15:40:26 UTC
Oops, I must clarify!
This issue is a restatement of issue #19222.
Arabic/Farsi users expect to type Hindi numerals when they switch
keyboards, EVEN IF their Windows default locale is a Latin language.
The "System Numerals" setting doesn't solve this. You have to implement
and add a "Context" setting as well, to auto-select the
appropriate numerals for the context.
In M$ Word XP you set numerals to "Context" and off you go (i.e. "Hindi"
inside an Arabic block and "Arabic" inside a Latin block).
I also need to point out that Farsi and Arabic differ slightly in their
numeral representation. The Arabic language uses the Unicode range
U+0660...U+0669, while the same digits in Farsi are represented in the
range U+06F0...U+06F9.
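The two ranges above line up digit for digit, so converting between the
numeral sets is plain code point arithmetic. A minimal Python sketch, not
OOo code: the names ZERO, digit_value and convert_digits are made up for
this illustration, and the "latin" key stands for the ASCII digits 0-9.

```python
# Code point of the digit zero in each numeral set discussed in this issue.
ZERO = {
    "latin": 0x0030,  # ASCII digits 0-9 ("Arabic" numerals in the OOo UI)
    "ar": 0x0660,     # Arabic-Indic digits, U+0660..U+0669
    "fa": 0x06F0,     # Extended Arabic-Indic digits, U+06F0..U+06F9
}


def digit_value(ch):
    """Return 0-9 if ch is a digit from one of the three sets, else None."""
    cp = ord(ch)
    for zero in ZERO.values():
        if zero <= cp <= zero + 9:
            return cp - zero
    return None


def convert_digits(text, target):
    """Rewrite every known digit in text using the target numeral set."""
    base = ZERO[target]
    out = []
    for ch in text:
        value = digit_value(ch)
        out.append(ch if value is None else chr(base + value))
    return "".join(out)


if __name__ == "__main__":
    print(convert_digits("2003", "ar"))
    print(convert_digits("2003", "fa"))
```

Characters outside the three ranges pass through unchanged, so the
function can be applied to mixed text.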
Comment 3 Dieter.Loeschky 2003-11-20 16:56:44 UTC
DL->US: Could you please takeover?
Comment 4 Joost Andrae 2004-02-25 18:21:32 UTC
JA->US: Wasn't there a possibility to switch the input locale on the fly?
Comment 5 ulf.stroehler 2004-02-25 19:04:06 UTC
US->JA: you probably refer to setxkbmap, which switches the xkbd module
(keyboard driver) on the fly. But this has nothing to do with the OOo document
locale.

In fact the setting "System" doesn't help here, as you won't switch the desktop
locale for a different numbering scheme. And BTW, OOo wouldn't recognize this
on the fly anyway.

For a workaround see Format/NumberingBullets/Options/Numbering.

Transferring to SBA for further evaluation.
Comment 6 aminm 2004-03-15 03:03:06 UTC
I recently had a chance to use OpenOffice under Linux. The issue described here
is nonexistent in the Linux version because of the way KDE switches the system
locale when you switch keyboards. In Windows, however, the system locale is
independent of the keyboard, no matter how many keyboard layouts you have
installed. Go for the context mechanism and handle numeral input and
presentation in the Office itself.
I also protest the decision to lower the priority of this issue, as it cripples
widespread use of the application on the Windows platform.
Comment 7 stefan.baltzer 2004-03-29 09:11:11 UTC
SBA->Aminm: What exactly is the ONE problem we're focussing on in this issue?
The summary looks like it is about EITHER Arabic OR Hindi (or System) being set
for display and printing, as can be done in Tools - Options - Language
Settings - Complex Text Layout - in the "General Options" list box - "Numerals".
Then you come up with keyboard input being different on Linux (KDE) and on
Windows. This doesn't fit into one issue... Please clarify. Thank you.
Set to invalid until further clarification.
Comment 8 aminm 2004-04-04 12:16:48 UTC
Created attachment 14317 [details]
Mixed numerals test case
Comment 9 aminm 2004-04-04 12:30:06 UTC
Aminm -> SBA
I created an attachment to show you the bug. The first line contains the Hindi
numerals and the 2nd line their corresponding Arabic numerals. I did it in OOo
for Linux. Now please do this in OOo for Windows. You can't. Now play with the
numerals options you told me about and see what happens.
The context-option-related stuff I described earlier is the MS approach to the
mixed numerals problem. You can devise whatever other strategies you wish.
There is also one other issue that needs to be filed separately: changing fonts
is buggy too.
Comment 10 stefan.baltzer 2004-05-13 14:33:59 UTC
SBA: Now I see.
Unlike the Linux IME, the Windows input method does not deliver the Unicode
character that is needed, only the number and the language.

For now, the workaround is to use Insert - Special Character to get a
Hindi number into Arabic text when the setting "Arabic" was chosen (or vice versa).

SBA->BH: As discussed with OS, it looks like the "Context" option is something
we currently lack. To make this similar to MS Word, we need one more list box
entry ("Context") in Tools - Options - Language Settings - Complex Text Layout,
list box "Numerals".
Reopening issue.
Reopening issue.
Comment 11 stefan.baltzer 2004-05-13 14:35:06 UTC
SBA: Reassigned to BH.
Comment 12 aminm 2004-05-15 08:56:31 UTC
Aminm -> SBA
Thanks for the technical clarification. Please take note of my second post when
you actually implement the required feature for Windows. For the Arabic
language (ar) the Unicode numerals run from U+0660 through U+0669. For
Persian (fa) the corresponding range is U+06F0 through U+06F9. I'm not sure
which of the blocks Urdu uses.
Comment 13 munzirtaha 2007-07-01 09:20:50 UTC
Fixed the "OS: " to say "Windows, all".
Comment 14 Joost Andrae 2008-07-09 10:36:59 UTC
Retarget issue to 3.1
Comment 15 bettina.haberer 2008-07-09 10:46:27 UTC
Reassigned to requirements.
Comment 16 Mathias_Bauer 2008-07-17 16:02:29 UTC
taking over for now
requirements need to be discussed
Comment 17 hennerdrewes 2008-07-17 20:11:50 UTC
Please note the following prerequisite, if you want to implement the discussed
"Context" option for digit display.

I posted a patch in issue 89825, which fixes the language for numbers in CTL
context. Until now, numerals in Writer have always been classified as LATIN
script, which makes it very hard to implement a context option. With this patch,
numbers following RTL runs will get the correct CTL script, and the numbers get
the correct language attribute (e.g. Arabic).

So, if we have a future SvtCTLOptions::NUMERALS_CONTEXT active, we can simply
pass the current language to OutputDevice::SetDigitLanguage() in
SwTxtSizeInfo::CtorInitTxtSizeInfo().

Comment 18 Mathias_Bauer 2008-07-21 10:16:34 UTC
Thanks for the hint, I will discuss it with fme
Comment 19 frank.meies 2008-09-18 12:24:51 UTC
I suggest following hennerdrewes' solution. A context mode based on the current
IME is not possible (at least not with a reasonable amount of effort) for the
following reasons:

1. This would be a Windows-only solution, since on *nix we cannot access the
IME information, and
2. We would either have to store the IME information in the file format or
do a right-after-typing conversion of the Unicode characters
according to the current IME.

Let me sum up what we intend to do for this issue: ASCII digits are no longer
automatically associated with the currently active Western language. Instead we
evaluate the context of the digits to determine whether they are in a Western or
CTL context. This is covered by issue 89825. Then, for formatting/painting of
digits, the respective Western or CTL language will be passed to VCL, which then
chooses the glyphs according to that language if the 'Context' mode is set.
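To illustrate the plan summed up above: the document keeps storing ASCII
digits, and the numeral set is chosen only when the text is displayed. The
following Python sketch is not the VCL implementation. The function name
display_digits, the mode strings and the language table are hypothetical,
and treating "hindi" mode as always U+0660 and mapping "fa" to U+06F0 are
assumptions made for the example.

```python
# Hypothetical mapping from a CTL language to the code point of its zero digit.
DIGIT_ZERO_BY_LANG = {
    "ar": 0x0660,  # Arabic-Indic digits
    "fa": 0x06F0,  # Extended Arabic-Indic digits (assumed for Farsi)
}


def display_digits(text, lang, mode="context"):
    """Return the string to paint; the stored text itself is never changed.

    mode "arabic":  always keep ASCII digits
    mode "hindi":   always use U+0660..U+0669 (assumption for this sketch)
    mode "context": pick the digits from the language of the text portion
    """
    if mode == "arabic":
        return text
    if mode == "hindi":
        zero = 0x0660
    else:
        zero = DIGIT_ZERO_BY_LANG.get(lang)
        if zero is None:  # Western language: nothing to substitute
            return text
    return "".join(
        chr(zero + ord(ch) - 0x30) if "0" <= ch <= "9" else ch
        for ch in text
    )


if __name__ == "__main__":
    for lang in ("en", "ar", "fa"):
        print(lang, display_digits("Page 12", lang))
```

Because the substitution happens only at display time, nothing about the
IME or keyboard layout needs to be stored in the file format.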
Comment 20 frank.meies 2008-09-19 14:02:43 UTC
[...] Instead we evaluate the context of the digits to determine whether they
are in a Western or CTL context. This is covered by issue 89825. [...]

One more remark: Issue 89825 basically classifies numbers which are embedded
into an RTL run as scripttype = CTL. So only in these cases can the CTL language
of the numbers be evaluated to find the right digits. A 'full' implementation
would require either performing an expensive post-processing of the text in the
application code or changing the script type of a couple of ASCII characters in
i18n from Western to Weak.
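The classification described above can be approximated as follows. This is
a simplified Python sketch, not the patch from issue 89825: it only looks
at the nearest preceding strong character, assumes a left-to-right
paragraph, and the name classify_digit_runs is invented for the example.

```python
import unicodedata


def classify_digit_runs(text):
    """Return (digits, script) pairs, script being "CTL" or "WESTERN".

    A run of ASCII digits is treated as CTL when the nearest preceding
    strong character is right-to-left (bidi class R or AL).
    """
    result = []
    last_strong = "L"  # assumption: the paragraph direction is left-to-right
    run = ""
    for ch in text:
        if "0" <= ch <= "9":
            run += ch
            continue
        if run:
            script = "CTL" if last_strong in ("R", "AL") else "WESTERN"
            result.append((run, script))
            run = ""
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("L", "R", "AL"):
            last_strong = bidi
    if run:
        script = "CTL" if last_strong in ("R", "AL") else "WESTERN"
        result.append((run, script))
    return result


if __name__ == "__main__":
    print(classify_digit_runs("abc 123"))
    print(classify_digit_runs("\u0645\u0631\u062D\u0628\u0627 123"))
```

Under this heuristic, digits typed after Latin words stay Western no
matter which keyboard layout was active while typing them, since only the
surrounding text is consulted.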
Comment 21 frank.meies 2008-09-23 13:30:02 UTC
I take over.
Comment 22 frank.meies 2008-09-29 13:28:51 UTC
Spec available here:

http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt
Comment 23 frank.meies 2008-11-27 10:50:13 UTC
Ready for QA.
Comment 24 stefan.baltzer 2008-12-10 09:14:32 UTC
Verified in CWS kashidafix.
Comment 25 hossein.ir 2009-01-22 10:40:19 UTC
I've tested with DEV300m39 (build:9378) on Windows XP. The Context option is
there, but it only works when applying the Arabic language to characters. When
using Farsi for character attributes, it does not use Hindi numerals. Please note
that Farsi uses a slightly different set of numerals from Arabic, as stated by aminm.
Comment 26 hossein.ir 2009-01-22 10:45:27 UTC
Created attachment 59590 [details]
Problem with numerals for Farsi language
Comment 27 hossein.ir 2009-01-22 15:59:04 UTC
Even when using Arabic as the characters' CTL language, footnotes do not
obey the rule of the context numerals. They are shown with the numerals used in
English text.
I guess this problem may also exist in other automatic numbering besides
footnotes.
Comment 28 hossein.ir 2009-01-22 16:10:43 UTC
Created attachment 59602 [details]
Problem with footnote numerals in Arabic language
Comment 29 hennerdrewes 2009-01-23 13:23:37 UTC
Farsi problem: posted patch in issue 98399
Comment 30 hdu@apache.org 2009-01-23 14:44:56 UTC
@h15n: thanks for testing the fix in DEV300_m39. The other observations are very important too, but they
deserve separate issues, so this resolved issue doesn't get diluted. hennerdrewes already opened
issue 98399 for the Farsi numerals. Would you open one for the remaining problem with wrong numerals in
footnotes and attach your sample document there?
Comment 31 hossein.ir 2009-01-23 19:58:36 UTC
@hdu: Please see issue 98418 that I've opened.
Comment 32 hdu@apache.org 2009-03-04 13:26:18 UTC
*** Issue 99852 has been marked as a duplicate of this issue. ***
Comment 33 belowsky 2009-04-09 13:50:28 UTC
Using OOo v3.1 RC:
If you write words in English, then switch to Arabic and write only numerals,
then switch back to English and write words, you will find that the numerals you
entered are Arabic and not Indian.
In order for context to work correctly, the numbers have to be preceded by
words from the same context.
Comment 34 belowsky 2009-04-09 14:03:40 UTC
Created attachment 61490 [details]
show case
Comment 35 stefan.baltzer 2009-04-27 12:20:43 UTC
SBA->belowsky: See FME's note from Sep 18, 2008 about IME detection. This is
what you seem to expect in the upper example of your PDF attachment.

But there is "no context" around the digits. So the Western (Arabic) digit within
EN text and the Hindi digit within the Arabic text string are OK. In the lower
example you see the new "Context" option working: digits follow the text around them.

Feel free to re-read the spec:
http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt
Chapter 1.2 describes the behavior.

OK in OOO310_m10. Closed.
Comment 36 hennerdrewes 2009-04-27 13:46:32 UTC
@sba: I am not sure I completely understood belowsky's comment.

But it is actually possible to override the default "context" behaviour by
inserting LRMs or RLMs before digits. Maybe this little trick should be
described in some documentation?
Comment 37 aminm 2009-05-08 13:46:21 UTC
LRM or RLM do not normally override number presentation as per the Unicode bidi
algorithm description at http://www.unicode.org/reports/tr9/. However, explicit
directional embeddings or overrides could affect number presentation.
Remember, numbers are weak characters in the bidi algorithm.
If you have a look at the section on resolving weak characters, the so-called
"context" setting comes into play in how Arabic Numbers and European Numbers are
changed to fit the text run.
The spec, however, allows implementations to reset numbers to be all Arabic or
all European (Section 4.3). OpenOffice for Windows has so far ignored the
number formatting behaviour in the standard. With all due respect, I think there
is no need to invent a new spec; the standard itself can serve as the reference.
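For readers following the discussion of weak characters and LRM/RLM, the
bidi classes involved can be inspected with Python's standard unicodedata
module. This only queries the Unicode character database; it says nothing
about how OOo or any other application resolves them.

```python
import unicodedata

# One representative character per category mentioned in the comments.
samples = {
    "DIGIT ONE": "1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "EXTENDED ARABIC-INDIC DIGIT ONE": "\u06F1",
    "LEFT-TO-RIGHT MARK": "\u200E",
    "RIGHT-TO-LEFT MARK": "\u200F",
}

# Map each sample to its bidi class as recorded in the Unicode database.
bidi_class = {
    name: unicodedata.bidirectional(ch) for name, ch in samples.items()
}

if __name__ == "__main__":
    for name, cls in bidi_class.items():
        print(f"{name}: {cls}")
```

The ASCII digit and the Farsi digit are both class EN (European Number),
the Arabic-Indic digit is AN (Arabic Number), and the two marks are strong
characters of class L and R respectively.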
Comment 39 hdu@apache.org 2009-05-08 14:09:53 UTC
I think these are all valid concerns. But the original item was fixed and verified, so the change got 
integrated and this issue is closed.

Is it possible to separate the remaining individual concerns as cleanly as possible and file new individual 
issues for each of them?