Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Move implementation of ApplyLreOrRleEmbedding() to i18npool ApplyNaturalBidiEmbedding() (was: Wrong BiDi-behavior for composed strings)|
|Component:||i18npool||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||alan, hdu, issues, okhayat, stephan.bergmann.secondary, thomas.lange, yba|
|Version:||OOo 2.3.1 RC1|
|Issue Type:||TASK||Latest Confirmation in:||---|
Description djihed 2008-01-18 12:45:41 UTC
For RTL languages such as Arabic and Hebrew, OpenOffice in general fails to correctly direct the interface because it doesn't correctly implement the BiDi algorithm. http://unicode.org/reports/tr9/ In particular, neutral characters such as the dot "." and parentheses are not correctly displayed. In issue #84553. A quick solution was to set the BiDi preferences of the widget in question. This is more of a hack and needs to be done in numerous places anywhere text appears. Forcing translators to use RLE or RLM should not be needed either. Please have a look at attachments for illustrations: attachment #50301 [details] attachment #50302 [details] attachment #50964 [details] (I filed this here because I have been referred to the EditEngine which is responsible for this and which belongs here).
Comment 1 djihed 2008-01-18 12:46:51 UTC
Hmm. Direct link to attachments, I don't know how to hotlink them: http://www.openoffice.org/nonav/issues/showattachment.cgi/50301/ui-rtl.gif http://www.openoffice.org/nonav/issues/showattachment.cgi/50302/dialog-rtl.gif http://www.openoffice.org/nonav/issues/showattachment.cgi/50964/2.PNG
Comment 2 michael.ruess 2008-01-18 14:41:37 UTC
Reassigned to SBA.
Comment 3 firstname.lastname@example.org 2009-01-23 16:20:31 UTC
Adjusted the summary to the root cause of these classes of problems. Background details (from my entry in issue 78466#desc6): There is a general problem with code which just composes strings by adding other strings together. There are probably many parts of OOo where such compositions have just been designed and tested for LTR-strings. Before the individual strings get merged into the bigger composition they each look good alone, because they seem to assume "natural layout" (the first strong character defines the default direction). @sb: I suggest to add a String helper method which adds LRE/RLE...PDF (U+202A/U+202B...U+202C) pairs to strings that assume natural layout. This would help owners of string compositing code to keep the fixes for the problems simple, unless the composition itself requires reordering of parts.
Comment 4 Stephan Bergmann 2009-01-26 12:49:50 UTC
@er: If I understand hdu correctly, he intends to address the following problem in the following way. The problem is as follows: Consider the composition of the two strings U+05D0 HEBREW LETTER ALEF and U+0061 LATIN SMALL LETTER A U+0028 LEFT PARENTHESIS U+0062 LATIN SMALL LETTER B U+0029 RIGHT PARENTHESIS The bidirectional algorithm would erroneously render this like "(a(bA" (where "A" represents alef) rather than like "a(b)A" (i.e., it would switch back to right-to-left for the closing parenthesis after the embedded left-to-right run). This could be fixed by embedding the complete second string in U+202A LEFT-TO-RIGHT EMBEDDING ... U+202C POP DIRECTIONAL FORMATTING Whether a given string intended for such composition would need to be wrapped in LRE...PDF or RLE...PDF could be derived from the string content, e.g., by looking at the directionality of the first strong character in the string. So, hdu proposes to add a function to the sal strings that, given a string, returns the string embedded in either LRE...PDF or RLE...PDF, depending on the string's contents. Now, I would argue that such a function should not be added to the sal strings directly (as it offers functionality that is above the rudimentary semantic level at which the sal strings reside, and would need semantic character data---to determine the directionality of the first strong character in a string, see above---that is currently not available in sal), but could instead be grouped with other I18N functionality.
Comment 5 alan 2009-01-26 13:02:22 UTC
@hdu: an alternative to LRE/PDF (which is two characters) could be to a single LRM at the end of the combined string, as described in issue 18024
Comment 6 alan 2009-01-26 13:04:42 UTC
@hdu (with corrected typo): an alternative to LRE/PDF (which is two characters) could be to add a single LRM at the end of the combined string, as described in issue 18024
Comment 7 email@example.com 2009-01-26 13:41:11 UTC
Whether to do it with LRE/RLE..PDF or by adding a LRM/RLM marker at the end, is fine all with me. What is important is that the programmers doing string composition have a good chance to get it right. It must be as easy as possible by using some automatic string helpers. Because if these coders get it wrong all the resulting bugs usually hit one poor guy (==me) first, as seen by the countless related bugs.
Comment 8 firstname.lastname@example.org 2009-01-26 13:41:14 UTC
Whether to do it with LRE/RLE..PDF or by adding a LRM/RLM marker at the end, is all fine with me. What is important is that the programmers doing string composition have a good chance to get it right. It must be as easy as possible by using some automatic string helpers. Because if these coders get it wrong all the resulting bugs usually hit one poor guy (==me) first, as seen by the countless related bugs.
Comment 9 ooo 2009-01-27 15:35:19 UTC
Using a LRE/RLE..PDF pair probably would be better in case someone wanted to combine the result with yet another string again. See also: issue 78466, and better samples in issue 32179: http://www.openoffice.org/nonav/issues/showattachment.cgi/16816/OOo_parenthesis_bug.png http://www.openoffice.org/nonav/issues/showattachment.cgi/16819/parens-ar.png http://www.openoffice.org/nonav/issues/showattachment.cgi/16868/wrong_next_previous_button.png I can provide some method in i18npool respectively unotools/i18n, but of course that would not fix the various instances the issue incarnates. Once the interface will be provided, applications will actually have do to use it.
Comment 10 thomas.lange 2009-01-28 08:07:53 UTC
TL->ER: Note: svtools/langtab.hxx (CWS vcl99) has now a function ApplyLtrOrRtlEmphasis that adds the emphasis characters to the string if there are none yet and the string does not consist of neutral chars only. (HDU said to look just for the first not neutral char for this.)
Comment 11 email@example.com 2009-01-28 09:05:29 UTC
> ApplyLtrOrRtlEmphasis It works fine, great! Once that method moves to a more general place in the i18n framework I suggest to rename it. E.g. to ApplyNaturalBidiEmbedding(). Also adding the embedding markers when the string doesn't contain neutral characters can be avoided as they aren't needed then.
Comment 12 thomas.lange 2009-01-28 10:49:18 UTC
TL->ER: Slight change for the mentioned function: it is now called ApplyLreOrRleEmbedding.
Comment 13 thomas.lange 2009-01-28 10:51:30 UTC
Comment 14 erack 2009-03-23 22:13:20 UTC
Reassigning to spare time account.
Comment 15 erack 2009-08-19 23:09:42 UTC
Looking into moving the implementation of ApplyLreOrRleEmbedding() down to i18npool I encountered some obstacle. The implementation takes advantage of SvtSysLocale to obtain a cached character classification at almost no cost. That is not available in i18npool, and we certainly don't want to create new instances of XCharacterClassification for each string inspection, so would need some cached instance there as well and refactor some coding details. Nothing hard to implement, just doesn't fit into my schedule for OOo3.2, therefor retargeting to OOo3.3. Anyway, according to 'hdu' the original bug is fixed with ApplyLreOrRleEmbedding(), so I'm setting the issue type to TASK. However, best would even be if SvtSysLocale could pass the cached instances down to i18npool instead of having to create another instance there.
Comment 16 firstname.lastname@example.org 2009-08-20 07:08:50 UTC
@erack: the idea behind this method is just to find the first codepoint in the string with "strong" bidi- direction properties. Since icu is already available in i18npool and its u_charDirection() method is cheap there is probably not much need for fancy caching directly or indirectly (by tunneling the cache of all character properties cache from other modules).
Comment 17 email@example.com 2009-08-20 07:48:43 UTC
Splitting the method getDefaultDirection() off is probably a good idea too.
Comment 18 erack 2009-08-22 20:41:13 UTC
@hdu: A rewrite to directly use ICU sounds indeed most promising. Thanks for the thought.
Comment 19 erack 2010-02-24 19:53:06 UTC
Fruit flies like banana.. retargeting to OOo3.x, should actually be OOo3.4 but that target doesn't exist yet.
Comment 20 firstname.lastname@example.org 2010-06-30 09:53:05 UTC
Status update: When ICU's http://bugs.icu-project.org/trac/ticket/7772 ("Fast string direction detection") is resolved the method getDefaultDirection() could be implemented more directly using that new ICU call, if OOo's ICU baseline had it available.