Apache OpenOffice (AOO) Bugzilla – Issue 86811
WW8: arabic/hindi numbers instead of decimal numbers - approved
Last modified: 2013-08-07 14:43:00 UTC
I opened a .doc file with decimal numbers, and all the numbers are shown as Arabic numbers. It used to be just fine with version 2.3 (I use Debian sid, OOo 2.4rc2-1). I didn't change the "view numbers as..." option in the OOo preferences! This is a good view (version 2.3): http://picasaweb.google.com/nadavkav/Screen_capture/photo#5175292504552183922 This is a buggy view (version 2.4rc2): http://picasaweb.google.com/nadavkav/Screen_capture/photo#5175293123027474562 I'm attaching the original document too.
Created attachment 51975 [details] decimal numbers are shown in arabic
MRU->HBRINKM: see attached documents, the numbers are imported as Hebrew symbols.
Most of the Hebrew documents are showing Hindi numbers instead of Arabic for me. Both on Ubuntu 8.10 beta - OO 2.4 RC2. And on Windows using OO 2.4
Bug is still valid in 2.4 final
On OO.o 2.4 final, both on Windows XP and on Ubuntu 8.04 beta (updated), I still get this bug. However, it doesn't always happen. There are even documents that show some numbers normally and some numbers in the wrong type of digits (Arabic/Hindi, whatever... :) ). I can dig up such a document if needed.
See another testcase and screenshots at #87625. Happens to me with 2.4.0 on Debian and Windows XP. Sorry for the duplicate bug. I removed my profile (.openoffice.org/ dir) to verify that problem isn't caused by an upgrade path.
This bug also exists in Writer 3.0.0 beta on Mac OS X. Link to a picture: in Writer, with this bug, the numbers are Indic (used in Arabic), and in Neo the numbers are correct Arabic numbers (used in Hebrew): http://picasaweb.google.com/kobi.zamir/Kzamir/photo#5183773634352557906 1. This is an imported .doc document. 2. Some of the numbers are displayed correctly in the same document.
sad but true
*** Issue 87625 has been marked as a duplicate of this issue. ***
This is very upsetting. I almost got my school using OpenOffice, and I am glad they didn't; I would have been ashamed. I had all the 2.4 releases, and they all had the bug.
This problem means that many people who just wish to try OpenOffice/Linux will go away from these platforms, and will probably never wish to return to OpenOffice or Linux because of this bug. This problem IMHO must be solved ASAP, and should be marked as a *critical* (P1) bug, because it makes OpenOffice unusable for an entire group of people who use Hebrew in OpenOffice.
This is a blocker for all who use Hebrew in their documents. The priority should be elevated to P1/P2. No one can wait for OOo 3.0 for the fix. Wait a few days more and the press in Israel will be on it and burn everything that was built up in OOo market penetration.
I also think the priority should be elevated to P2. This bug makes OOo totally unusable for documents with Hebrew content in them. There is no workaround, no "living with it", nothing but a downgrade to 2.3.1, which is the last version without this bug. The effect will be even worse if upcoming versions of Linux distros use this OOo version as the default for fresh installs. It will make the freshly installed OS totally unusable for Hebrew speakers, forcing them to either run back screaming to their previous OS or downgrade (while probably breaking their package management) to OOo 2.3.1.
zoharsnir raised a very valid point. OOo is a major FOSS project. This bug is very bad for Israeli open-source advocates.
While I agree that this should be highly prioritized, I don't think we should argue about Linux distro issues on an OOo bug. If you want to stop your distro from moving to an OOo version with this bug, file a bug in the distro, like this one: https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/210204. If enough of the distros agree to hold the OOo version for it, this will add pressure to fix it quickly.
I agree with the above. Bad Hebrew in OpenOffice is bad for all FOSS advocacy in Israel. Israel is a small country and the only office suite available here is MS Office. Hence, this is a real blocker for migration to FOSS. The OOo suite has started to gain ground as a viable alternative, and this will surely damage it. Most new users who use this version in Israel are very likely to be 'burnt' for a long time, and it will be hard to convince them that the OOo suite is an alternative to MS Office even long after this bug is fixed. Please, fix this bug ASAP.
As many have already noted, this is a serious problem for Israeli FOSS. A further serious aspect is that, since many academic institutions are migrating to OOo, this will make it totally unusable for students.
Most of the Hebrew documents are showing Hindi numbers instead of Arabic. Using OOo 2.40rc2 on Ubuntu 8.04beta. I also rechecked the language settings and tried to change the numerals between System, Hindi and Arabic, and nothing changed in the document: still Hindi numerals. This is a usability disaster. Just can't use OOo 2.40 until it's fixed.
We have implemented a more "Word compatible" way to display Hindi numbers in Word documents, see http://wiki.services.openoffice.org/wiki/Import_of_Hindi_numbers_from_Microsoft_Word_documents Maybe we have overlooked something or misinterpreted the way Word treats these documents. You could speed up the bug fixing by studying the spec and telling us whether the spec already contains a design flaw or whether we probably have a bug in the implementation.
mba: I read the wiki. What we (in Israel) call the "123..." numbering system is the "decimal system" (base 10), and the other one, "١٢٣...", is the Arabic numbering system. I have been to India, and as far as I remember they use the "123..." system and don't call "١٢٣..." the Hindi numbering (maybe Indian Muslims use it? I do not know), so I guess the wiki page is not correct (maybe the Unicode standard defines it in a different way? I do not know :-) ). If it helps in any way, I made 4 test documents: OOo 2.3.1 localized Hebrew --> doc format; OOo 2.3.1 localized Hebrew --> odt format; OOo 2.4 rc2 --> doc format; OOo 2.4 rc2 --> odt format. In all of them I wrote two lines, 1234567890 and ١٢٣٤٥٦٧٨٩٠, and all of them opened correctly in both versions of OOo. Everybody else: please, let us NOT get so excited. I am sure this issue will be solved soon, and if you are doing serious Hebrew word processing or using other tools in the OOo suite, you should probably be using the localized Hebrew version from tk systems (OOo v2.3.1), which works very (very) well and has MANY things not implemented in the upstream OOo versions 2.3.x / 2.4. :-)
Well, according to Wikipedia, if I got it correctly (it's quite confusing): 0123... are the Arabic numerals; ٠١٢٣... are the Eastern Arabic numerals. References: http://en.wikipedia.org/wiki/Eastern_Arabic_numerals http://en.wikipedia.org/wiki/Arabic_numerals
1. The spec is wrong: """When a digit is marked to have CTL script in the imported Word document it shall be imported as Hindi digit.""" should be: When a digit is marked to have ARABIC script in the imported Word document it shall be imported as Hindi digit. Hebrew and Japanese are CTL languages and they use Arabic digits. Only Arabic uses the Hindi digits. 2. The implementation is wrong: """If the configuration item RegardHindiDigits is set""" With RegardHindiDigits=true all numbers are Hindi digits; see the 55 in the first line in the image: http://picasaweb.google.com/kobi.zamir/Kzamir/photo#5184709675230080882 But with RegardHindiDigits=false some numbers are Hindi and some Arabic; see second line, 9-13, in the image: http://picasaweb.google.com/kobi.zamir/Kzamir/photo#5184709658050211682
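The difference between the quoted spec wording and the correction proposed in point 1 can be sketched roughly as follows. This is only an illustration, assuming each imported text run carries a language tag; the function names, the tag values ("ar", "he", "fa") and the language sets are hypothetical and are not taken from ww8par.cxx.

```python
# Hypothetical sketch: gate the digit conversion on language, not on CTL script.

# Map U+0030..U+0039 to the Arabic-Indic ("Hindi") digits U+0660..U+0669.
WESTERN_TO_ARABIC_INDIC = {ord("0") + i: 0x0660 + i for i in range(10)}

# Assumption: languages the importer treats as CTL.
CTL_LANGUAGES = {"ar", "he", "fa"}

# Assumption: languages whose runs should receive Hindi digits.
HINDI_DIGIT_LANGUAGES = {"ar"}


def import_run_by_script(text: str, lang: str) -> str:
    """Rule as worded in the quoted spec: any CTL run gets converted."""
    if lang in CTL_LANGUAGES:
        return text.translate(WESTERN_TO_ARABIC_INDIC)
    return text


def import_run_by_language(text: str, lang: str) -> str:
    """Proposed rule: convert only when the run's language is Arabic."""
    if lang in HINDI_DIGIT_LANGUAGES:
        return text.translate(WESTERN_TO_ARABIC_INDIC)
    return text
```

Under the first rule a Hebrew run containing "55" comes out with Hindi digits, which matches the symptom reported in this issue; under the second rule it is left unchanged and only Arabic runs are converted.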
mba and kzamir: First, kzamir is mostly right about the spec: although there are other languages which also use Hindi digits (like Farsi), having the number characters marked CTL should not be enough to make them Hindi. Second, I didn't realize that RegardHindiDigits referred to the option setting that appears in kzamir's screenshots. I just tried it, and I can't make that string appear in any of the OpenOffice configuration files (even when the setting is changed as in the screenshot). Further, I don't have any Writer.xcs file (or any other .xcs file). I have Writer.xcu, but as said, it doesn't have a RegardHindiDigits string in it. Or, for that matter, FilterFlags or WW8. I'm using 2.4.0~rc6 on Debian unstable. I would expect, indeed, that when someone says "I want to use Hindi digits" (as kzamir seems to have done), all digits would be Hindi. mba, how is a user supposed to control RegardHindiDigits?
adebr: thank you for the wikipedia links :-) after reading as much as i could... i feel you should ignore most of what i wrote in comment #21 (i was probably sleeping when the math teacher mentioned that ;-)
IMHO it would be valid to assume that anyone using an English distro can understand "decimal" (1,2,3...) numerals, while only some can understand Hindi numerals (١,٢,٣...). Therefore it would be prudent to display decimal numerals by default - and only display Hindu numerals in cases where it was 100% certain that their use was intended. A document originally intended to display Hindu numerals would still be legible if the numerals were converted to decimal, and the problem easily corrected by using an appropriate font. A document originally intended to display decimal numerals but rendered with Hindu numerals would become illegible, and is NOT easily recoverable. I STRONGLY suggest - at least as a temporary fix - adding an option under "language settings" to disable the rendering of Hindu numerals. This bug is absolutely crippling - you should see what my physics articles look like now...
barad said: """ A document originally intended to display Hindu numerals would still be legible if the numerals were converted to decimal, and the problem easily corrected by using an appropriate font. """ This is wrong factually (a big part of the point is that OOo keeps the text in Unicode, and I really hope there is no Unicode font which gives U+0033 -- Arabic numeral 3 -- the form ٣). Unless you happen to be a native-level Arabic or Farsi user (I'm assuming you aren't), this is also wrong-headed; don't tell other people what's important to them.
shai2platonix: What I'm basically saying is that the ENGLISH distro should take special care not to replace decimal numerals with Hindi numerals. My assumption is that the vast majority of English distro users understand decimal numerals, and do not understand Hindi numerals... I also still maintain that, since this is largely a problem for imported .doc files, changing decimal numerals to a font with only Hindu numerals can serve as a temporary fix, while there seems to be no reverse solution. Hence the urgency of the issue: the English distro should not be displaying Hindu numerals in an English-language file just because it was written on a localized CTL Word distro.
barad: 1. I assume when you speak of "distro" you mean the interface language -- well, while it is safe to assume users who choose this interface will understand Arabic (Western) numerals, it is NOT safe to assume that having Hindu numerals converted to Western is acceptable to them. This is the assumption you make when you propose that under English interface, it is better to err on the side of Western numerals; this is what I called wrong-headed. 2. You seem to miss the Unicode point entirely. Part of the problem is that while MSOffice represents characters in 8-bit encodings, where numbers are only encoded once, and the difference between 3 and ٣ is mainly one of font, OOo keeps texts in Unicode, where 3 and ٣ are different characters. If there is a font which is A) Unicode, and so fit for use in OOo and B) the character '3' gets the graphic form ٣, then that font is buggy and needs to be killed. In OOo, you CANNOT turn a '3' into a '٣' by changing the font. 3. The problem, as far as I have seen, is actually not with English files -- it's with Hebrew files. If you follow mba's links, you'll find that the problem -- apparently -- is that the current implementation decides that any CTL-enabled .doc file should use Hindu numerals; most English docs are not CTL-enabled. Do you think you'd see mostly Israelis on this bug if the bug showed on English doc files?
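The Unicode point in item 2 can be checked with a few lines of Python using only the standard `unicodedata` module. This is a small illustration of the character model, not of anything inside OOo; the variable names are made up for the example.

```python
import unicodedata

western_three = "3"            # U+0033
arabic_indic_three = "\u0663"  # the Arabic-Indic ("Hindi") digit three

# Print code point, Unicode name and numeric value of each character.
for ch in (western_three, arabic_indic_three):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  value={unicodedata.digit(ch)}")

# Same numeric value, but two different characters in the text itself.
same_character = western_three == arabic_indic_three

# Turning one into the other is a change to the text, not to the font.
to_arabic_indic = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)
converted = "2008".translate(to_arabic_indic)
```

Both characters carry the digit value 3, yet they compare unequal, so no font switch can map one onto the other; only rewriting the text (as `translate` does here) or a display-time substitution can.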
shai2platonix: Points 1 and 2 are well taken - I agree. This problem can occur in English-only files. The localized Hebrew version of Microsoft Word (Hebrew interface) seems to save files as CTL-enabled by default, regardless of the actual language used in the file. I'm assuming this is the case with other localized CTL versions of Word. I will attempt to post an example later on. The infrastructure for a quick patch to resolve this issue seems to exist under: Tools -> options -> language settings -> complex text layout -> general options (Numerals: Arabic/Hindi/System) But changing this either does not work, or only applies to new files. I recommend using a workaround like this as a temporary solution, then fully addressing the issue in v3.0. This issue has rendered a much needed upgrade unusable by thousands of high-school and university students, teachers and staff.
Target should be 2.4.1.
mba -> could you update the version the bug applies to "OOo 2.4.0" ? Thanks.
Done. I also added the "regression" keyword. We will try to provide a fix ASAP.
The spec at http://wiki.services.openoffice.org/wiki/Import_of_Hindi_numbers_from_Microsoft_Word_documents states that the conversion to Hindi numbers should be dependent on a setting called "RegardHindiDigits", which defaults to false. The implementation in ww8par.cxx, however, does not relate to any such setting. The conversion is performed on any CTL text. Also, as said before, a translation to Hindi numbers should be language dependent, not CTL dependent. But most seriously: The entire approach of handling the digit conversion does not match the inner handling of digits in OOo, and as such must lead to trouble. If you open an Arabic-language Word document and your system settings are set to "Show Hindi numbers", Writer would have shown Hindi numbers in OOo 2.3.1 and before. If you want to display Hindi numbers in OOo, you have to set the display settings to "Show Hindi numbers". Within the document, the numbers are treated as 0x0030 .. 0x0039 values and not as 0x0660 .. 0x0669. Digits are stored logically as digits, and that's it. And there is an option of displaying them in several shapes. As such, I suggest reverting the changes to ww8par.cxx to the pre-2.4 state. The remaining issue is not an issue of MS Word import. It is an issue that the OOo handling of digits could be enhanced. In addition to the three existing settings (system, Arabic, Hindi), there should be a language dependent one. If a numeral appears e.g. in an Arabic context, the text display would be done with Hindi numbers. This would give much more flexibility in number display, as it would be more document dependent and not only system setting dependent. vcl offers all the possibilities to "SetDigitLanguage" in many ways. There should be a (not too complicated) way to connect text language to the digit language.
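The model described here, with digits stored as 0x0030 .. 0x0039 and shaped only for display, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the function name, the mode strings and the language tags are hypothetical, the "context" mode stands for the proposed language dependent setting, and nothing here reflects the actual vcl interface.

```python
# Hypothetical sketch: stored text keeps U+0030..U+0039; only the display form changes.
ARABIC_INDIC_ZERO = 0x0660


def shape_digits(stored: str, mode: str,
                 text_lang: str = "en", system_lang: str = "en") -> str:
    """Return the display form of `stored`; the stored string is never modified.

    mode "arabic"  -> always Western digits
    mode "hindi"   -> always Hindi (Arabic-Indic) digits
    mode "system"  -> follow the system language
    mode "context" -> follow the language of the text (the proposed setting)
    """
    if mode == "arabic":
        use_hindi = False
    elif mode == "hindi":
        use_hindi = True
    elif mode == "system":
        use_hindi = system_lang == "ar"   # assumption: only Arabic selects Hindi digits
    elif mode == "context":
        use_hindi = text_lang == "ar"     # assumption: only Arabic selects Hindi digits
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not use_hindi:
        return stored
    return "".join(
        chr(ARABIC_INDIC_ZERO + ord(ch) - ord("0")) if "0" <= ch <= "9" else ch
        for ch in stored
    )
```

With this split the importer never has to rewrite digits: a Hebrew run and an Arabic run can both store "12", and only the Arabic one is drawn with Hindi digits when the "context" mode is selected.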
Although it seems a little ridiculous to post a patch for this one, I'll attach a fix. It removes the changes introduced for OO 2.4. Maybe like this we can speed things up...
Created attachment 52563 [details] revert file sw/source/filter/ww8/ww8par.cxx to its previous version
mba- Why is this issue still P3?
I don't have enough time to wonder whether this is a P2 or a P3. :-) It doesn't matter. The target "2.4.1" says it all: it will be fixed ASAP.
np :) Adjusting to P2, just for bookkeeping sake.
Does anybody have a sample .doc file with *English* text where digits show up as Eastern Arabic digits in OOo? barad?
I created two files: english-with-numbers.doc hebrew-with-numbers.doc Both created with MS Word 2003 SP3 (hebrew version + english interface pack) on WinXP SP2. Opened both of them with both OOo portable (from portableapps.com, they don't change the source app, just repack it) versions 2.3.1 and 2.4 (it's a prerelease 2 of portableapps based on final OOo 2.4). english-with-numbers.doc is displayed ok (arabic) in 2.3.1 and 2.4 hebrew-with-numbers.doc is displayed ok (arabic) in 2.3.1 but on 2.4 it's wrong (eastern-arabic). It's not what you asked for but I'll attach it anyway in case it will help anyone.
Created attachment 52727 [details] simple English with numbers document created with Hebrew MS Word 2003 SP3 on WinXP SP2
Created attachment 52728 [details] simple Hebrew with numbers document created with Hebrew MS Word 2003 SP3 on WinXP SP2
tml said: "Does anybody have a sample .doc file with *English* text where digits show up as Eastern Arabic digits in OOo? barad?" Not exactly, but I was having that problem (english OO.o w/hindi/eastern arabic numbers) when exporting to .pdf. See my (apparently related) bug at http://www.openoffice.org/issues/show_bug.cgi?id=87669. I did hear back from my math professor that the .doc created from the same .odt in 2.4 had some 'boxes where numbers should be' which may have been the same issue and she doesn't have hindi fonts or something. Lemme see if I can find the original .doc version I submitted to her. It looked fine to me (alignment issues) but I'll look at it again in 2.3.
Never mind, my English .doc showed all the numbers and symbols and everything just fine in OOo 2.3.1. I can still upload (preferably email, since the full version contains personal information and silly math homework) the original .doc if someone wants to look at it in MS Office, but I'm sure that if I edit out everything other than the math and re-save, any problems might disappear.
tml: I believe the behavior I saw was either due to the fact that the Hebrew-supporting versions of Office 97 (possibly XP as well?) use hnormal.dot as the default template, or due to the document containing an "invisible" CTL character in the vicinity of the numerals. I'm having trouble reproducing the bug, due to lack of access to Office 97, and since a recent OOo update seems to have fixed the problem: all attached files in the report now display correctly as long as Tools -> options -> language settings -> complex text layout -> general options -> Numerals is set to Arabic; if it is set to Hindi, all numerals display in Hindi, regardless of the language of the file. My current version is 1:2.4.0-3ubuntu2 (Ubuntu Hardy)
kalpan -> barad: This bug should already have been fixed in Ubuntu (1:2.4.0-3ubuntu2). So I'm not surprised you can't reproduce it. See (in Hebrew) http://linmagazine.co.il/node/view/47599 for more details.
->hbrinkm: officecfg/..../Writer.xcs and sw/source/filter/ww8/ww8par.?xx contain the configuration changes.
fixed. Now mapping Arabic to Hindi numbers iff language is appropriate. Note: specification needs update. I will give note here if the update is done.
Specification changed. It can be found here: http://wiki.services.openoffice.org/wiki/Import_of_Hindi_numbers_from_Microsoft_Word_documents#Detailed_Specification
I don't know if the current implementation follows the current specification, but if it does, it is incomplete -- we're likely to see a bug soon, calling "arabic/hindi numbers instead of persian/east-hindi numbers". Granted, this bug is a lot less severe (as 7 of the figure glyphs are actually the same), but still. I added this as an open issue in the specification.
@shai2platonix: In the fixed version we will replace arabic numbers by hindi numbers only if the text is "arabic". In my understanding that means that farsi or urdu documents will keep their arabic numbers (pre-2.4 behavior). Or do you expect/see something different?
@mba: Yes and no. A. I was talking about Farsi numerals, not western numerals in Farsi documents. I trust that the latter are not marked as Arabic or Farsi characters in .doc files. B. My comment was wrong: The current spec will not cause Farsi numerals to turn into Hindi, but into Arabic (western) numerals. I suspect that would be sub-optimal too, though I'm not a Farsi user. Judging by the preferences UI, though, Farsi numerals are probably not supported by OpenOffice.org at all, so there's bigger problems than .doc imports there.
What makes you think that? We are changing Arabic numbers into Hindi numbers. I don't know Farsi numbers, but I assume that they have different Unicode values than Arabic numbers and so won't be converted. You should also consider that we make the change only for text with an "Arabic" language attribute. If you have an example document that shows the problem you suspect, please attach it to this issue so that someone can check it. We still have some days to fix potential problems.
@mba: @shai2platonix: As I stated in my previous comment: The actual problem is not a problem of Word import. It is a limitation in OOo regarding a language dependent display of digits. It lacks a "context" option, for details see issue 22396. In my opinion efforts should be made to fix the real cause of the problem (issue 22396). Most things said in this discussion will turn out to be irrelevant then...
@mba: The way I understand it, Word does not actually provide the Unicode code of the character, but, for numerals, just the ASCII code for the equivalent Arabic (western) numeral and a context specifying the language. If that is the case, and the spec ignores languages other than Arabic, numbers in all such languages will be translated to western numerals on import -- because Word would display (5,farsi) as an upside-down heart (U+06f5), while OOo would only see the 5. If Word did store Unicode characters, this whole issue would be moot, wouldn't it? To my regret, I cannot produce a test doc -- as I said, I am not a Farsi user. @hennerdrewes: Of course you are right, Word import and Windows IME are two instances of the same problem.
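The concern can be made concrete with a small sketch. It assumes, as this comment does, that Word supplies an ASCII digit together with a language for the run; that premise is the commenter's understanding and is not verified here. The function name, the language tags and the tables are hypothetical.

```python
# Hypothetical sketch: an (ASCII digit, language) pair selects the displayed character.

# Zero code point of the digit block used for each language (assumption for illustration).
ARABIC_ONLY = {"ar": 0x0660}                       # what a spec limited to Arabic covers
ARABIC_AND_FARSI = {"ar": 0x0660, "fa": 0x06F0}    # extended with the Farsi digit block


def displayed_digit(ascii_digit: str, lang: str, zero_by_lang: dict) -> str:
    """Return the character shown for one ASCII digit in a run of the given language."""
    if len(ascii_digit) != 1 or not "0" <= ascii_digit <= "9":
        raise ValueError("expected a single ASCII digit")
    zero = zero_by_lang.get(lang)
    if zero is None:
        # Language not covered by the table: the Western digit is kept.
        return ascii_digit
    return chr(zero + int(ascii_digit))
```

With the Arabic-only table, `displayed_digit("5", "fa", ARABIC_ONLY)` stays a Western "5", while the extended table yields U+06F5, the shape Word is said to display for a Farsi run.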
added "approved" to the title, because it will be easier to work with the 2.4.1 meta issue during release status meetings.
Automation possible.
fixed on os114 (Target 2.4.1)
Ready for QA
Verified fix in CWS os114.
I encounter the problem in OOo 3.0 beta loading a hebrew Microsoft Word 2007 document : numbers are shown in arabic.
*** Issue 89264 has been marked as a duplicate of this issue. ***
I don't know if I am doing it right, as this is my first QA report for an open source project, and I must tell you guys you are doing a great job; here is my small contribution ;) I have upgraded to OOo 3.0 and the problem still occurs: open a Hebrew file in OOo 3.0 and the decimal numbers are shown in Arabic. For me this problem is a killer for switching to OOo. I guess for some other people as well. Cheers.
The bug still exists in 3.0 beta. This bug says it is fixed; does that mean we should open a new ticket for the same issue?
The fix has not been integrated into the Master builds yet. It is fixed in an internal test build (CWS) and will be integrated into the product soon, so you needn't file a new issue for any version.
Checked fix in OOH680m17 and DEV300m18.
How do we know if it's integrated into some version or not?