Issue 112144

Summary: l10n: Insert LRM automatically for Latin words in RTL mode
Product: Internationalization Reporter: mbnoimi <mbnoimi>
Component: code Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P2 CC: hdu, issues
Version: OOo 3.2.1 RC2   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: FEATURE Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
Correct rendering (see "C++" word) none
Wrong rendering (see "C++" word) none

Description mbnoimi 2010-06-05 15:16:23 UTC
As the following images show, Microsoft WordPad renders Latin characters mixed
with Arabic ones correctly, while OOo does not.

wordpad rendering:
http://i.imgur.com/rlsBG.png

OOo rendering:
http://i.imgur.com/u9GkV.png
Comment 1 mbnoimi 2010-06-05 15:17:42 UTC
Created attachment 69815 [details]
Correct rendering (see "C++" word)
Comment 2 mbnoimi 2010-06-05 15:18:48 UTC
Created attachment 69816 [details]
Wrong rendering (see "C++" word)
Comment 3 hennerdrewes 2010-06-06 06:30:29 UTC
@mbnoimi: In order to type C++ in an RTL paragraph, insert an LRM (left-to-right
mark) after the last "+". (Insert, Formatting Mark, left-to-right mark).
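
For illustration only, the workaround above can be reproduced outside the office suite. This is a minimal Python sketch, assuming the sample text is an Arabic letter (U+0626), "C++", and another Arabic letter; the helper name with_lrm_after is hypothetical and not part of OOo.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def with_lrm_after(text, token):
    """Append an LRM directly after every occurrence of `token` (hypothetical helper)."""
    return text.replace(token, token + LRM)

sample = "\u0626C++\u0626"            # Arabic letter, "C++", Arabic letter
marked = with_lrm_after(sample, "C++")

# List the code points so the invisible mark can be seen in the output.
print([f"U+{ord(ch):04X}" for ch in marked])

# The mark is an invisible format character with strong left-to-right direction.
print(unicodedata.name(LRM), unicodedata.category(LRM), unicodedata.bidirectional(LRM))
```

The mark has no glyph of its own; it only gives the trailing "++" a strong left-to-right neighbour on both sides.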
Comment 4 mbnoimi 2010-06-06 14:54:23 UTC
This isn't a practical solution because the user has to insert this mark manually
for every Latin word in an RTL document. I think it should be inserted
automatically, just as Microsoft editors (Word or WordPad) do, which is why I
reported this issue.
Comment 5 mbnoimi 2010-06-06 15:07:47 UTC
Anyway, I changed this issue to a feature request because I believe this isn't a
real bug, although it's very necessary for RTL documents (I'm suffering a lot
because it is missing).
Comment 6 pavel 2010-07-01 06:50:59 UTC
Eike, can you find a better owner for this issue?
Comment 7 ooo 2010-07-01 14:26:48 UTC
Inserting an LRM for Latin words certainly is not a solution. Writing direction
should be determined by script, and usually is. It might be necessary though for
the string C++ because C isn't exactly a word and + characters may be weak.
However, C is not weak but LTR, and even if + was weak the correct writing
direction should result for C++, if I'm not mistaken.

@hdu: any insights?
Comment 8 hdu@apache.org 2010-07-01 14:54:11 UTC
The direction of the plus signs in the sample text is AFAIK determined by UAX#9's section 3.3.4
"Resolving Neutral Types", rule N2, which says: "any remaining neutrals take the embedding direction".
So unless they are in an LRM context or in a default-LTR paragraph, their default N2 direction needs to
be overridden by an LR* mark for the desired behaviour.
Comment 9 ooo 2010-07-01 14:59:23 UTC
Isn't the embedding direction in this case LTR because the ++ follow C?
Comment 10 hdu@apache.org 2010-07-01 15:29:33 UTC
Indeed, Unicode defines the plus sign as ES (European separator): weak, not neutral. But since there is
no European number next to it, rule W6 turns it into ON: neutral, so N2 applies.

Several BiDi libraries agree that the way OOo is doing BiDi is correct. Try an experiment: open this
file in your favorite web browser:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD></HEAD><BODY DIR="RTL"> &#1574;C++&#1574;</BODY></HTML>

Change the RTL to LTR and look again: voilà!
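
The same experiment can be approximated without a browser. The following Python sketch is a deliberately simplified stand-in for the UAX#9 steps discussed above, not a full implementation: it assumes the text contains no digits and no explicit embeddings, overrides or isolates, treats every non-strong character as neutral, and then applies N1 and N2. The function name resolve_directions is hypothetical.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def resolve_directions(text, paragraph_dir):
    """Return one 'L' or 'R' per character (simplified sketch, see assumptions above)."""
    # Strong types keep their direction; AL (Arabic letter) is treated as R.
    # Everything else is treated as neutral and resolved below.
    dirs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            dirs.append("L")
        elif cls in ("R", "AL"):
            dirs.append("R")
        else:
            dirs.append(None)

    resolved = list(dirs)
    i, n = 0, len(dirs)
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        # The paragraph direction stands in for start/end of text.
        before = dirs[i - 1] if i > 0 else paragraph_dir
        after = dirs[j] if j < n else paragraph_dir
        # N1: neutrals between two equal strong directions take that direction.
        # N2: any remaining neutrals take the embedding direction.
        direction = before if before == after else paragraph_dir
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return "".join(resolved)

plain = "\u0626C++\u0626"
marked = "\u0626C++" + LRM + "\u0626"

print(resolve_directions(plain, "R"))   # RLRRR: "++" falls to the RTL paragraph
print(resolve_directions(marked, "R"))  # RLLLLR: the LRM keeps "++" with "C"
print(resolve_directions(plain, "L"))   # RLLLR: in an LTR paragraph no mark is needed
```

In the RTL paragraph the "++" sits between an L and an R character, so N1 cannot decide and N2 hands it to the paragraph direction; the LRM supplies the missing L on the right-hand side.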
Comment 11 hdu@apache.org 2010-07-02 09:42:48 UTC
The topic of UAX#9's rules, applied to unmarked BiDi text, not producing what experienced BiDi users
expect comes up regularly: see issue 100737, issue 93325, issue 105623 and maybe 85360.

There is obviously some need for a more general heuristic that can automatically add BiDi-Markers to 
some unmarked BiDi-text so that the resulting UAX#9 BiDi ordering becomes DWIMmy.
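
As a rough idea of what such a heuristic might look like, here is a Python sketch that adds an LRM after Latin tokens ending in "+" or "#" (such as "C++" or "C#"). It is purely an assumption for discussion: the pattern, the choice of trailing symbols, and the name add_lrm_marks are hypothetical, and a real heuristic would need to cover many more cases.

```python
import re

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# A Latin letter/digit run followed by trailing "+" or "#" symbols.
# The lookahead rejects matches that stop inside the symbol run or that
# are already followed by an LRM, which keeps the function idempotent.
_TRAILING_SYMBOLS = re.compile(
    "([A-Za-z][A-Za-z0-9]*[+#]+)(?![+#" + LRM + "])"
)

def add_lrm_marks(text):
    """Append an LRM to Latin tokens that end in weak symbols (hypothetical heuristic)."""
    return _TRAILING_SYMBOLS.sub(lambda m: m.group(1) + LRM, text)

once = add_lrm_marks("\u0626C++\u0626")
twice = add_lrm_marks(once)

print([f"U+{ord(ch):04X}" for ch in once])
print(once == twice)  # a second pass adds nothing
```

The sketch marks tokens regardless of paragraph direction; in an LTR paragraph the extra mark is redundant but should not change the ordering.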
Comment 12 hdu@apache.org 2010-07-02 09:46:12 UTC
s/93325/92325/g
Comment 13 Marcus 2017-05-20 11:33:36 UTC
Reset assignee to the default "issues@openoffice.apache.org".