Issue 112144 - l10n: Insert LRM automatically for Latin words in RTL mode
Summary: l10n: Insert LRM automatically for Latin words in RTL mode
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: OOo 3.2.1 RC2
Hardware: All All
Importance: P2 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Depends on:
Reported: 2010-06-05 15:16 UTC by mbnoimi
Modified: 2017-05-20 11:33 UTC
CC List: 2 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---

Correct rendering (see "C++" word) (559.10 KB, text/plain)
2010-06-05 15:17 UTC, mbnoimi
Wrong rendering (see "C++" word) (622.21 KB, text/plain)
2010-06-05 15:18 UTC, mbnoimi

Description mbnoimi 2010-06-05 15:16:23 UTC
As the following images show, Microsoft WordPad renders Latin characters mixed
with Arabic ones correctly, while OOo does not.

wordpad rendering:

OOo rendering:
Comment 1 mbnoimi 2010-06-05 15:17:42 UTC
Created attachment 69815
Correct rendering (see "C++" word)
Comment 2 mbnoimi 2010-06-05 15:18:48 UTC
Created attachment 69816
Wrong rendering (see "C++" word)
Comment 3 hennerdrewes 2010-06-06 06:30:29 UTC
@mbnoimi: To type C++ in an RTL paragraph, insert an LRM (left-to-right
mark) after the last "+" (Insert, Formatting Mark, Left-to-right mark).
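At the level of the stored text, this workaround amounts to appending U+200E LEFT-TO-RIGHT MARK after the final plus sign. A minimal Python sketch of that idea follows; the Arabic word is an arbitrary placeholder chosen only for illustration, not text from the attachments.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Placeholder Arabic word, used only for illustration.
arabic = "\u0644\u063a\u0629"

plain = arabic + " C++"          # what the user types
marked = arabic + " C++" + LRM   # same text with the mark after the last "+"

# LRM is invisible but has strong left-to-right directionality.
print(unicodedata.name(LRM))           # LEFT-TO-RIGHT MARK
print(unicodedata.bidirectional(LRM))  # L
print(len(marked) - len(plain))        # 1
```

The mark adds one code point to the text and has no visible glyph. It only gives the trailing plus signs a strong left-to-right neighbour.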
Comment 4 mbnoimi 2010-06-06 14:54:23 UTC
This isn't a practical solution, because the user has to insert this mark manually
for every Latin word in an RTL document. I think it should be inserted
automatically, as Microsoft's editors (Word or WordPad) do. That is why I reported
this issue.
Comment 5 mbnoimi 2010-06-06 15:07:47 UTC
Anyway, I changed this issue to a feature request because I believe it isn't a
real bug, although the feature is badly needed for RTL documents (I'm suffering
a lot from its absence).
Comment 6 pavel 2010-07-01 06:50:59 UTC
Eike, can you find a better owner for this issue?
Comment 7 ooo 2010-07-01 14:26:48 UTC
Inserting an LRM for Latin words certainly is not a solution. Writing direction
should be determined by script, and usually is. It might be necessary though for
the string C++, because C isn't exactly a word and + characters may be weak.
However, C is not weak but LTR, and even if + were weak, the correct writing
direction should result for C++, if I'm not mistaken.

@hdu: any insights?
Comment 8 2010-07-01 14:54:11 UTC
The direction of the plus signs in the sample text is AFAIK determined by UAX#9's section 3.3.4 "Resolving
Neutral Types", rule N2, which says: "any remaining neutrals take the embedding direction". So unless they
are in an LRM context or in a default-LTR paragraph, their default N2 direction needs to be overridden by
an LR* mark for the desired behaviour.
Comment 9 ooo 2010-07-01 14:59:23 UTC
Isn't the embedding direction in this case LTR because the ++ follow C?
Comment 10 2010-07-01 15:29:33 UTC
Indeed, Unicode defines the plus sign as ES (European separator): weak, not neutral. But since there is no
European number next to it, rule W6 makes it ON (other neutral), so N2 applies.

Several BiDi libraries agree that the way OOo is doing BiDi is correct. Do an experiment with your favorite
web browser, looking at this file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD></HEAD><BODY DIR="RTL"> &#1574;C++&#1574;</BODY></HTML>

Change the RTL to LTR and look again: voilà!
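The rule chain discussed here can be traced with Python's `unicodedata` module, which reports the plus sign's class as ES in current Unicode data. The function below is a deliberately reduced sketch, not a full UAX#9 implementation: `resolve_simplified` is a hypothetical helper that assumes a single run with no digits, combining marks, or explicit embeddings, and handles only rules W3, W6, N1 and N2. It returns resolved directions per character, not the visual order.

```python
import unicodedata

def resolve_simplified(text, paragraph_dir):
    """Reduced sketch of UAX#9 rules W3, W6, N1 and N2.

    Assumes one run with no digits, combining marks or explicit
    embeddings. paragraph_dir is "L" or "R".
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    # W3: Arabic letters (AL) behave as R.
    classes = ["R" if c == "AL" else c for c in classes]
    # W6: with no European number nearby, separators and terminators become ON.
    classes = ["ON" if c in ("ES", "ET", "CS") else c for c in classes]
    unsupported = set(classes) - {"L", "R", "ON", "WS"}
    if unsupported:
        raise ValueError(f"outside this sketch: {sorted(unsupported)}")

    resolved = list(classes)
    i = 0
    while i < len(classes):
        if classes[i] in ("L", "R"):
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < len(classes) and classes[j] not in ("L", "R"):
            j += 1
        # At the run boundaries the paragraph direction stands in (sor/eor).
        before = classes[i - 1] if i > 0 else paragraph_dir
        after = classes[j] if j < len(classes) else paragraph_dir
        # N1: same strong type on both sides wins; N2: otherwise embedding direction.
        direction = before if before == after else paragraph_dir
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return resolved

sample = "\u0626C++\u0626"  # same characters as the HTML test file
print(unicodedata.bidirectional("+"))        # ES
print(resolve_simplified(sample, "R"))       # ['R', 'L', 'R', 'R', 'R']
print(resolve_simplified(sample, "L"))       # ['R', 'L', 'L', 'L', 'R']
print(resolve_simplified("C++\u200e", "R"))  # ['L', 'L', 'L', 'L']
```

In the RTL paragraph the two plus signs sit between an L and an R character, so N1 does not apply and N2 gives them the paragraph direction. With a trailing LRM they sit between two L characters and N1 resolves them to L.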
Comment 11 2010-07-02 09:42:48 UTC
The topic of UAX#9's rules, applied to unmarked BiDi text, not producing what experienced BiDi users
expect comes up regularly: see issue 100737, issue 93325, issue 105623 and maybe issue 85360.

There is obviously some need for a more general heuristic that can automatically add BiDi-Markers to 
some unmarked BiDi-text so that the resulting UAX#9 BiDi ordering becomes DWIMmy.
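One conceivable shape for such a heuristic, purely as an illustration and not something OOo is known to implement: in text destined for an RTL paragraph, append an LRM to every space-separated token that starts with a strong left-to-right character and ends in a weak or neutral one. The function name `add_lrm_after_latin_tokens` and the token rule are assumptions made up for this sketch.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK
WEAK_OR_NEUTRAL = {"ES", "ET", "CS", "ON"}

def add_lrm_after_latin_tokens(text):
    """Hypothetical heuristic: mark tokens like "C++" so their trailing
    punctuation stays attached to the left-to-right word."""
    out = []
    for token in text.split(" "):
        if (token
                and unicodedata.bidirectional(token[0]) == "L"
                and unicodedata.bidirectional(token[-1]) in WEAK_OR_NEUTRAL):
            token += LRM
        out.append(token)
    return " ".join(out)

# Placeholder Arabic word followed by the problematic token.
text = "\u0644\u063a\u0629 C++"
marked = add_lrm_after_latin_tokens(text)
print(marked.endswith("+" + LRM))                    # True
print(add_lrm_after_latin_tokens(marked) == marked)  # True
```

The sketch also shows why a general rule is hard to get right. A token such as "word." would be marked as well, which pulls a sentence-final period to the Latin word even when it was meant to end the surrounding Arabic sentence.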
Comment 12 2010-07-02 09:46:12 UTC
Comment 13 Marcus 2017-05-20 11:33:36 UTC
Reset assignee to the default "".