Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing

| Summary: | Can't Use both Hindi and Arabic Numerals | | |
|---|---|---|---|
| Product: | Internationalization | Reporter: | aminm <persiantools> |
| Component: | BiDi | Assignee: | stefan.baltzer |
| Status: | CLOSED FIXED | QA Contact: | issues@l10n <issues> |
| Severity: | Trivial | | |
| Priority: | P3 | CC: | frank.meies, hdu, hennerd, hossein.ir, issues, khirano, munzirtaha, pavel |
| Version: | OOo 1.1 | Keywords: | oooqa |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Windows, all | | |
| URL: | http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt | | |
| Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- |
| Developer Difficulty: | --- | | |
| Issue Depends on: | | | |
| Issue Blocks: | 79434 | | |
| Attachments: | | | |
Description
aminm
2003-11-12 21:54:26 UTC
Is this by design? What do other apps (MS Word, KOffice) do in this regard?

Oops, I must clarify! This issue is a restatement of issue #19222. Arabic/Farsi users expect to type Hindi numerals when they switch keyboards, EVEN IF their Windows default locale is a Latin language. The "System Numerals" setting doesn't solve this. You have to implement and add a "Context" setting as well, so that the appropriate numeral context is selected automatically. In MS Word XP you set numerals to "Context" and off you go (i.e. "Hindi" inside an Arabic block and "Arabic" inside a Latin block).

I also need to point out that Farsi and Arabic differ slightly in their numeral representation. Arabic uses the Unicode range U+0660 to U+0669, while Farsi uses the range U+06F0 to U+06F9.

DL->US: Could you please take over?

JA->US: Wasn't there a possibility to switch the input locale on the fly?

US->JA: You probably mean setxkbmap, which switches the xkbd module (keyboard driver) on the fly. But this has nothing to do with the OOo document locale. In fact the setting "System" doesn't help here, as you won't switch the desktop locale just to get a different numbering scheme. And BTW, OOo wouldn't recognize such a switch on the fly anyway. For a workaround see Format/NumberingBullets/Options/Numbering. Transferring to SBA for further evaluation.

I recently had a chance to use OpenOffice under Linux. The issue described here does not exist in the Linux version because of the way KDE switches the system locale when you switch keyboards. On Windows, however, the system locale stays the same no matter how many keyboard layouts you have installed. Go for the context mechanism and let Office handle numeral input and presentation. I also protest the decision to lower the priority of this issue, as it cripples widespread use of the application on the Windows platform.

SBA->Aminm: What exactly is the ONE problem we're focusing on in this issue?
The summary seems to be about setting EITHER Arabic OR Hindi (or System) numerals for display and printing, which can be done in Tools - Options - Language Settings - Complex Text Layout, "General Options" list box, "Numerals". Then you bring up keyboard input behaving differently on Linux (KDE) and on Windows. These don't fit into one issue... Please clarify. Thank you. Set to invalid until further clarification.

Created attachment 14317
Mixed numerals test case
Aminm -> SBA: I created an attachment to show you the bug. The first line contains the Hindi numerals and the second line their corresponding Arabic numerals. I made it in OO for Linux. Now please try this in OO for Windows. You can't. Now play with the numeral options you told me about and see what happens. The context-option stuff I described earlier is the MS approach to the mixed-numerals problem; you can devise whatever other strategy you wish. There is also one other problem that needs to be filed separately: changing fonts is buggy too.

SBA: Now I see. Unlike the Linux IME, the Windows input method does not deliver the Unicode character that is needed, only the number and the language. Currently the workaround is to use Insert - Special Character to get a Hindi digit into Arabic text when the setting "Arabic" was chosen (or vice versa).

SBA->BH: As discussed with OS, it looks like the "context" option is something we currently lack. To make it similar to MS Word, add one more list box entry ("Context") in Tools - Options - Language Settings - Complex Text Layout, list box "Numerals". Reopening issue.

SBA: Reassigned to BH.

Aminm -> SBA: Thanks for the technical clarification. Please take note of my second post when you actually implement the required feature for Windows. For Arabic (ar), the Unicode numerals block runs from U+0660 through U+0669. For Persian (fa), the corresponding range is U+06F0 through U+06F9. I'm not sure which of the two blocks Urdu uses.

Fixed the "OS:" field to say "Windows, all".

Retarget issue to 3.1.

Reassigned to requirements.

Taking over for now; requirements need to be discussed.

Please note the following prerequisite if you want to implement the discussed "Context" option for digit display. I posted a patch in issue 89825 which fixes the language of numbers in CTL context. Until now, numerals in Writer have always been classified as LATIN script, which makes it very hard to implement a context option.
With this patch, numbers following RTL runs get the correct CTL script, and they get the correct language attribute (e.g. Arabic). So if a future SvtCTLOptions::NUMERALS_CONTEXT is active, we can simply pass the current language to OutputDevice::SetDigitLanguage() in SwTxtSizeInfo::CtorInitTxtSizeInfo().

Thanks for the hint, I will discuss it with fme.

I suggest following hennerdrewes' solution. A context mode based on the current IME is not possible (at least not with a reasonable amount of effort) for the following reasons: 1. It would be a Windows-only solution, since on *nix we cannot access the IME information, and 2. we would either have to store the IME information in the file format or do a right-after-typing conversion of the Unicode characters according to the current IME.

Let me sum up what we intend to do for this issue: ASCII digits are no longer automatically associated with the currently active Western language. Instead we evaluate the context of the digits to determine whether they are in a Western or CTL context. This is covered by issue 89825. Then, for formatting/painting of digits, the respective Western or CTL language is passed to vcl, which chooses the glyphs according to that language if the 'Context' mode is set.

[...] Instead we evaluate the context of the digits to determine whether they are in a Western or CTL context. This is covered by issue 89825. [...]

One more remark: issue 89825 basically classifies numbers that are embedded in an RTL run as script type = CTL. So only in these cases can the CTL language of the numbers be evaluated to find the right digits. A 'full' implementation would require either expensive post-processing of the text in the application code or changing the script type of a couple of ASCII characters in i18n from Western to Weak.

I'll take over.
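The classification step described above (a digit takes a CTL or Western context from the text around it) can be approximated in a few lines. This is a rough Python sketch under stated assumptions, not the Writer/i18n code: a digit is labelled CTL when the nearest preceding strong character is right-to-left (bidi class R or AL), Western when it is left-to-right (L), and Western at the very start of the text. The function name `digit_contexts` is hypothetical.

```python
import unicodedata


def digit_contexts(text: str) -> list[tuple[int, str]]:
    """Hypothetical sketch: label every ASCII digit in `text` as 'CTL' or
    'Western' from the nearest preceding strong character. Bidi classes R
    and AL (Hebrew and Arabic letters) give a CTL context, L gives a Western
    one, and digits at the very start of the text default to Western."""
    context = "Western"
    labels = []
    for i, ch in enumerate(text):
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            context = "Western"
        elif bidi in ("R", "AL"):
            context = "CTL"
        elif "0" <= ch <= "9":
            # ASCII digits have bidi class EN, so they never change the context.
            labels.append((i, context))
    return labels


# "abc 12 <ARABIC LETTER AIN> 34": the first pair follows Latin text,
# the second pair follows an Arabic letter.
print(digit_contexts("abc 12 \u0639 34"))
```

Under this rule, digits typed right after Latin text stay Western even if the keyboard has been switched to Arabic, because nothing in the text itself records the IME switch. In the plan above, a digit labelled CTL would then be drawn with the digit glyphs of the CTL language rather than being rewritten as different characters.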
Spec available here: http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt

Ready for QA.

Verified in CWS kashidafix.

I've tested with DEV300m39 (build:9378) on Windows XP. The Context option is there, but it only works when the Arabic language is applied to characters. When Farsi is used as the character language, Hindi numerals are not used. Please note that Farsi uses a slightly different set of numerals from Arabic, as stated by aminm.

Created attachment 59590
Problem with numerals for Farsi language
Even when Arabic is used as the characters' CTL language, footnotes do not follow the context numerals rule; they are shown with the numerals used in English text. I guess this problem may also exist for other automatic numbers besides footnotes.

Created attachment 59602
Problem with footnote numerals in Arabic language
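The Farsi problem above comes down to the two separate Unicode digit blocks aminm pointed out: Arabic-Indic digits (what this thread calls "Hindi" numerals) at U+0660 through U+0669, and Extended Arabic-Indic digits at U+06F0 through U+06F9. Both blocks are contiguous, so choosing the digit set by language is a fixed offset from the block's zero. A minimal Python sketch, assuming a hypothetical `shape_digits` helper and a language table built only from those two ranges; it is not OOo code.

```python
# Code point of digit zero for each language; digits 0-9 follow contiguously.
# Assumption: only the ranges quoted in this issue are covered.
DIGIT_ZERO = {
    "en": 0x0030,  # ASCII / European digits 0-9
    "ar": 0x0660,  # ARABIC-INDIC DIGIT ZERO .. NINE (U+0660..U+0669)
    "fa": 0x06F0,  # EXTENDED ARABIC-INDIC DIGIT ZERO .. NINE (U+06F0..U+06F9)
}


def shape_digits(text: str, lang: str) -> str:
    """Replace ASCII digits 0-9 with the digit set for `lang`.
    Unknown languages fall back to ASCII digits (hypothetical policy)."""
    zero = DIGIT_ZERO.get(lang, 0x0030)
    return "".join(
        chr(zero + ord(ch) - ord("0")) if "0" <= ch <= "9" else ch
        for ch in text
    )


print(ascii(shape_digits("2003", "ar")))  # '\u0662\u0660\u0660\u0663'
print(ascii(shape_digits("2003", "fa")))  # '\u06f2\u06f0\u06f0\u06f3'
```

Treating Farsi as if it were Arabic picks the wrong block, which is why a separate entry (or patch) for Farsi is needed.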
Farsi problem: posted a patch in issue 98399.

@h15n: Thanks for testing the fix in DEV300_m39. The other observations are very important too, but they deserve separate issues so that this resolved issue doesn't get diluted. HennerDrews already opened issue 98399 for the Farsi numerals. Would you open one for the remaining problem with wrong numerals in footnotes and attach your sample document there?

@hdu: Please see issue 98418, which I've opened.

*** Issue 99852 has been marked as a duplicate of this issue. ***

Using OOo v3.1 RC: if you write words in English, then switch to Arabic and write only numerals, then switch back to English and write words, you will find that the numerals you entered are Arabic and not Indian. For context to work correctly, the numbers have to be preceded by words from the same context.

Created attachment 61490
show case
SBA->belowski: See FME's note from Sep 18, 2008 about IME detection. That is what you seem to expect in the upper example of your PDF attachment. But there is "no context" around the digits, so the Western (Arabic) digit within EN text and the Hindi digit within the Arabic text string are OK. In the lower example you see the new "context option" working: digits follow the text around them. Feel free to re-read the spec: http://specs.openoffice.org/appwide/ctl/TextNumeralsContextMode-Spec.odt. Chapter 1.2 describes the behavior.

OK in OOO310_m10. Closed.

@sba: I am not sure I completely understood belowsky's comment. But it is actually possible to override the default "context" behaviour by inserting LRMs or RLMs before digits. Maybe this little trick should be described in some documentation?

LRM or RLM do not normally override number presentation according to the Unicode bidi algorithm description at http://www.unicode.org/reports/tr9/. However, explicit directional embeddings or overrides could affect number presentation; remember, numbers are weak characters in the bidi algorithm. If you look at the section on resolving weak characters, the so-called "context" setting comes into play in how Arabic Numbers and European Numbers are changed to fit the text run. The spec, however, allows implementations to reset numbers to be all Arabic or all European (Section 4.3). OpenOffice for Windows has so far ignored the number formatting behaviour in the standard. With all due respect, I think there is no need to invent a new spec; the standard itself should be used as the reference.

I think these are all valid concerns. But the original item was fixed and verified, so the change got integrated and this issue is closed. Is it possible to separate the remaining individual concerns as cleanly as possible and file new individual issues for each of them?
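For reference, rule W2 from the UAX #9 section on resolving weak characters, mentioned in the bidi discussion above, is small enough to sketch: search backward from each European Number (EN) to the first strong type (R, L, AL, or sos), and if that type is AL, change the EN to an Arabic Number (AN). The Python below applies only W2 to the classes returned by `unicodedata.bidirectional`, assuming the whole string is one run with a left-to-right start of sequence and skipping all other rules and explicit embeddings. The name `resolve_w2` is hypothetical.

```python
import unicodedata

STRONG = {"L", "R", "AL"}


def resolve_w2(text: str) -> list[str]:
    """Apply only UAX #9 rule W2 to the bidi classes of `text`.
    Assumption: one run whose sos is L; all other rules are ignored."""
    last_strong = "L"  # assumed start-of-sequence type
    resolved = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            last_strong = cls
        # W2: an EN whose nearest preceding strong type is AL becomes AN.
        resolved.append("AN" if cls == "EN" and last_strong == "AL" else cls)
    return resolved


print(resolve_w2("\u0639 12"))  # Arabic letter ain, space, "12"
print(resolve_w2("\u200f12"))   # RLM followed by "12"
```

W2 changes only the bidi class used for reordering; it does not replace the digit characters themselves. An RLM (U+200F) has class R, not AL, so under W2 alone it does not turn a following EN into AN, which is consistent with the remark that LRM/RLM do not normally override number handling.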