Issue 100737

Summary: Signed numbers displayed incorrectly in RTL Calc
Product: Calc
Reporter: alan
Component: formatting
Assignee: ooo
Status: CONFIRMED ---
QA Contact:
Severity: Trivial
Priority: P3
CC: chrcm, elisko, farzaneh, hdu, hennerd, issues, niklas.nebel, ooo, samphan, thomas.lange, yba
Version: OOo 3.0.1
Keywords: BIDI, needhelp
Target Milestone: ---
Hardware: All
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:
Issue Blocks: 114236
Attachments (Description, Flags):
Patch for Hebrew number formats (Flags: none)
In RTL OOo, places an LRM before an opening minus sign in number formats (Flags: none)

Description alan 2009-04-01 08:20:35 UTC
To reproduce the bug:

a) set SAL_RTL_ENABLED to "TRUE"
b) start OOo and open a spreadsheet
c) type “-3” in a cell
d) the number appears as “3-”
Comment 1 alan 2009-04-03 14:43:48 UTC
This is a regression bug. The behavior is fine in 2.3.1. Beginning in 2.4, the
bug appears.
Comment 2 alan 2009-04-06 14:29:02 UTC
Has anyone looked at this bug? This is a very serious problem for RTL users. 

In Impress, "-3" appears ok if the text direction is LTR. Change the text dir to
RTL, and you'll see "3-".

In Calc, it shows up as "3-" even if the text dir is LTR, as long as the UI
language is RTL.

Is this a problem in the edit engine? Is it a problem in the Calc code? It seems
that the regression occurred between 2.3.1 and 2.4. Where in the code should I
start looking in order to fix it?

TIA for feedback.
Comment 3 thomas.lange 2009-04-06 15:03:01 UTC
On which platforms does this occur?
Maybe Unix/Linux only or is it Windows as well?


Comment 4 hdu@apache.org 2009-04-06 15:06:22 UTC
confirmed on OOO310_m9 on UNX
Comment 5 oc 2009-04-06 15:52:19 UTC
Hi Niklas, please have a look
Comment 6 hdu@apache.org 2009-04-06 16:02:05 UTC
ICU's BiDi algorithm claims that the correct visual order for signed numbers in RTL contexts is
digits_left sign_right (e.g. 123+). Writer also does it that way, but the EditEngine is different
(don't know why yet. TL?).
Comment 7 alan 2009-04-06 17:51:54 UTC
This may be related to the change in the bidi type of minus hyphen from Unicode
4.0 to Unicode 4.0.1, as described in issue 57833. Eike wrote that as of m197,
we're using Unicode 5.0.0.
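
The bidi classes involved can be checked directly. The following is a minimal
sketch in Python, assuming the Unicode data bundled with a current interpreter
(where hyphen-minus has class ES, as it has since Unicode 4.0.1). The helper
name bidi_classes is hypothetical and is not part of OOo.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, a strong left-to-right character


def bidi_classes(text):
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# "-3": the hyphen-minus is a European Separator (ES), the digit a European Number (EN).
print(bidi_classes("-3"))        # ['ES', 'EN']

# With a leading LRM, the sign and digit are preceded by a strong L character.
print(bidi_classes(LRM + "-3"))  # ['L', 'ES', 'EN']
```

As far as I understand the bidi algorithm, a leading ES that is not between two
European numbers ends up following the paragraph direction, which in an RTL
paragraph puts it to the right of the digits. A preceding LRM gives the sign and
the digits a left-to-right context instead.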
Comment 8 alan 2009-04-06 18:00:29 UTC
Though Writer places the sign to the right, and the digits to the left, popular
usage (at least for Hebrew) is otherwise. A user who types the number "-3"
generally expects to see the sign on the left. At present, in Calc, the sign is
always to the right, and it can't be changed without using formatting characters.
Comment 9 alan 2009-04-06 18:03:15 UTC
-->tl: This occurs on both Linux and Windows.
Comment 10 hdu@apache.org 2009-04-07 07:48:58 UTC
So, according to the latest Unicode standard and to issue 57833, the numbers are laid out correctly.

If the popular usage of signed numbers in RTL-enabled spreadsheet applications is different from the
Unicode standard, then Calc's number formatter should be adjusted to it. Maybe by inserting
BiDi markers, maybe by using related plus-minus codepoints that have more suitable BiDi properties.
Comment 11 ooo 2009-04-07 11:13:23 UTC
I guess this is _only_ for Hebrew, isn't it? Not other RTL locales?
Comment 12 hennerdrewes 2009-04-07 11:32:07 UTC
we should ask some native speakers...
Comment 13 alan 2009-05-04 15:58:51 UTC
Here is a patch which almost solves the problem for Hebrew, and leaves handling
for other RTL languages as it is. It sets all of the number formats for Hebrew
to display the minus sign to the left, by adding an LRM and a minus sign before
a negative number. Unfortunately, I can't modify the number format "Default",
which will still display the minus sign to the right. For the Hebrew version, we
can deal with this in an ugly way by including a default template which sets the
default number format to a format other than "Default". But I would prefer if
there was a way that I could get the format "Default" to also display the minus
sign to the left. Suggestions are welcome.
Comment 14 alan 2009-05-04 16:00:11 UTC
Created attachment 61982 [details]
Patch for Hebrew number formats
Comment 15 cemu 2009-05-16 23:49:48 UTC
In Arabic, the algebraic sign is, as in French or German, always on the left
side of the number, e.g. "-3", "-12321", NEVER "3-". Arabic as a script is RTL,
but numbers in the text, including algebraic signs, are "LTR", so the sign is
always on the left side of the number.
http://ar.wikipedia.org/wiki/%D8%A3%D8%B9%D8%AF%D8%A7%D8%AF_%D8%B3%D8%A7%D9%84%D8%A8%D8%A9_%D9%88%D9%85%D9%88%D8%AC%D8%A8%D8%A9

Comment 16 alan 2009-05-17 17:46:15 UTC
ayaniger->farzanehs:
Does Persian place the minus sign to the right of a number or to the left?
Comment 17 alan 2009-05-19 12:53:00 UTC
-->cmu
Here's a generic RTL patch, which will work for Arabic as well as Hebrew. It
supersedes the previous Hebrew patch I posted.

-->er
So far, we know that Hebrew and Arabic place the minus sign to the left, and I'm
still waiting for answers from the Urdu, Thai, and Persian project leads. I'll
post the patch, so it can be integrated when you think we have enough of a
consensus.
Comment 18 alan 2009-05-19 12:54:50 UTC
Created attachment 62363 [details]
In RTL OOo, places an LRM before an opening minus sign in number formats
Comment 19 samphan 2009-05-20 07:49:00 UTC
Thai is LTR CTL so this issue does not apply.
Comment 20 alan 2009-05-20 09:29:26 UTC
Behdad Esfahbod has written me that Persian, like Hebrew and Arabic, places the
minus sign on the left. I think we have enough of a consensus now to put the
minus sign on the left by default for RTL, and use the patch I've posted.
Comment 21 hdu@apache.org 2009-06-15 12:08:16 UTC
added the PATCH flag
Comment 22 ooo 2009-06-15 14:09:57 UTC
Though the patch in the number formatter (or the modified format codes) may cure
the primary symptom, I doubt it is what we want; it might create new problems.
Prefixing the raw formatted string with an LRM would not only insert the
LRM for display purposes, but would also include it in every other string
operation, such as copy&paste via clipboard and writing to document files.
Parsing such a string may or may not work, depending on whether the target
application ignores an LRM.

Resetting issue from PATCH to DEFECT for this reason.

Instead, the LRM could be inserted only if the string is to be displayed or
printed. For Calc, that could be in ScDrawStringsVars::SetText() if the original
cell data is numeric, I guess.
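
The separation proposed here could look roughly like the following. This is only
an illustrative sketch in Python, not OOo code: format_number,
format_for_display and the rtl_ui flag are hypothetical names, and the real
change would live in the C++ display path (e.g. near
ScDrawStringsVars::SetText()).

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def format_number(value):
    """Raw formatted string, as used for clipboard, CSV and document files."""
    return str(value)


def format_for_display(value, rtl_ui):
    """String handed to the renderer only; never stored or exported.

    In an RTL UI an LRM is placed before a leading minus sign so that
    the sign stays on the left of the digits.
    """
    raw = format_number(value)
    if rtl_ui and raw.startswith("-"):
        return LRM + raw
    return raw


print(repr(format_number(-3)))             # '-3'
print(repr(format_for_display(-3, True)))  # '\u200e-3'
```

The point of the split is that only the display string ever contains the LRM,
so parsing and export keep working on the unmodified data.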
Comment 23 alan 2009-06-16 09:24:44 UTC
If the change is to be made for each application where the symptom occurs, this
would have to be fixed not only in calc, but also in Writer tables with number
recognition.

I'm wondering though how serious a concern it should be that a target
application would ignore an LRM. Wouldn't it be fair to assume that any
application which would be Unicode-aware enough to reverse the order of "-3"
because of the new Unicode Bidi properties of minus/hyphen would be
Unicode-aware enough to support an LRM? 
Comment 24 ooo 2009-06-16 12:11:27 UTC
> If the change is to be made for each application where the symptom occurs, this
> would have to be fixed not only in calc, but also in Writer tables with number
> recognition.

Yes. And Impress tables, Chart data tables, and maybe more. The actual
logic and insertion could be still made in the number formatter, with
the method getting passed an argument whether it should insert a LRM.

> I'm wondering though how serious a concern it should be that a target
> application would ignore an LRM. Wouldn't it be fair to assume that any
> application which would be Unicode-aware enough to reverse the order of "-3"
> because of the new Unicode Bidi properties of minus/hyphen would be
> Unicode-aware enough to support an LRM? 

Not necessarily. I wasn't referring to reversal of "-3". For a simple
example take the export of numbers to CSV, you'd have a field content
with <LRM>-3. An application parsing that file may not recognize the
data being numeric if it does not ignore the LRM character.

When copying data via clipboard, the pasting application inserts data as
is, in an RTL environment you'd end up with the logical sequence
ABC <LRM>Data DEF.
Now, how is that supposed to be displayed?

I don't think the LRM belongs in the data because it is a mere display
option.
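
The CSV concern can be illustrated with a small sketch. It assumes a consumer
that parses fields with Python's built-in float(); parse_number and
parse_number_ignoring_lrm are hypothetical helpers standing in for whatever the
target application does.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def parse_number(field):
    """Parse a CSV field as a number, or return None if it is not numeric."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_number_ignoring_lrm(field):
    """Same as parse_number, but strips LRM characters first."""
    return parse_number(field.replace(LRM, ""))


print(parse_number("-3"))                     # -3.0
print(parse_number(LRM + "-3"))               # None: the LRM makes the field non-numeric
print(parse_number_ignoring_lrm(LRM + "-3"))  # -3.0
```

A parser of the first kind would treat <LRM>-3 as text, which is the failure
mode described above; only applications that explicitly ignore the LRM would
still see a number.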
Comment 25 kaplanlior 2010-08-14 16:43:53 UTC
This problem still appears in OpenOffice.org 3.2.1 (OOO320m19).