Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | Move implementation of ApplyLreOrRleEmbedding() to i18npool ApplyNaturalBidiEmbedding() (was: Wrong BiDi-behavior for composed strings) | ||
---|---|---|---|
Product: | Internationalization | Reporter: | djihed <djihed> |
Component: | i18npool | Assignee: | AOO issues mailing list <issues> |
Status: | CONFIRMED --- | QA Contact: | |
Severity: | Trivial | ||
Priority: | P3 | CC: | alan, hdu, issues, okhayat, stephan.bergmann.secondary, thomas.lange, yba |
Version: | OOo 2.3.1 RC1 | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Issue Type: | TASK | Latest Confirmation in: | --- |
Developer Difficulty: | --- |
Description
djihed
2008-01-18 12:45:41 UTC
Hmm. Direct link to attachments, I don't know how to hotlink them: http://www.openoffice.org/nonav/issues/showattachment.cgi/50301/ui-rtl.gif http://www.openoffice.org/nonav/issues/showattachment.cgi/50302/dialog-rtl.gif http://www.openoffice.org/nonav/issues/showattachment.cgi/50964/2.PNG Reassigned to SBA. Adjusted the summary to the root cause of these classes of problems. Background details (from my entry in issue 78466#desc6): There is a general problem with code which just composes strings by adding other strings together. There are probably many parts of OOo where such compositions have just been designed and tested for LTR-strings. Before the individual strings get merged into the bigger composition they each look good alone, because they seem to assume "natural layout" (the first strong character defines the default direction). @sb: I suggest to add a String helper method which adds LRE/RLE...PDF (U+202A/U+202B...U+202C) pairs to strings that assume natural layout. This would help owners of string compositing code to keep the fixes for the problems simple, unless the composition itself requires reordering of parts. @er: If I understand hdu correctly, he intends to address the following problem in the following way. The problem is as follows: Consider the composition of the two strings U+05D0 HEBREW LETTER ALEF and U+0061 LATIN SMALL LETTER A U+0028 LEFT PARENTHESIS U+0062 LATIN SMALL LETTER B U+0029 RIGHT PARENTHESIS The bidirectional algorithm would erroneously render this like "(a(bA" (where "A" represents alef) rather than like "a(b)A" (i.e., it would switch back to right-to-left for the closing parenthesis after the embedded left-to-right run). This could be fixed by embedding the complete second string in U+202A LEFT-TO-RIGHT EMBEDDING ... U+202C POP DIRECTIONAL FORMATTING Whether a given string intended for such composition would need to be wrapped in LRE...PDF or RLE...PDF could be derived from the string content, e.g., by looking at the directionality of the first strong character in the string. So, hdu proposes to add a function to the sal strings that, given a string, returns the string embedded in either LRE...PDF or RLE...PDF, depending on the string's contents. Now, I would argue that such a function should not be added to the sal strings directly (as it offers functionality that is above the rudimentary semantic level at which the sal strings reside, and would need semantic character data---to determine the directionality of the first strong character in a string, see above---that is currently not available in sal), but could instead be grouped with other I18N functionality. @hdu: an alternative to LRE/PDF (which is two characters) could be to a single LRM at the end of the combined string, as described in issue 18024 @hdu (with corrected typo): an alternative to LRE/PDF (which is two characters) could be to add a single LRM at the end of the combined string, as described in issue 18024 Whether to do it with LRE/RLE..PDF or by adding a LRM/RLM marker at the end, is fine all with me. What is important is that the programmers doing string composition have a good chance to get it right. It must be as easy as possible by using some automatic string helpers. Because if these coders get it wrong all the resulting bugs usually hit one poor guy (==me) first, as seen by the countless related bugs. Whether to do it with LRE/RLE..PDF or by adding a LRM/RLM marker at the end, is all fine with me. What is important is that the programmers doing string composition have a good chance to get it right. It must be as easy as possible by using some automatic string helpers. Because if these coders get it wrong all the resulting bugs usually hit one poor guy (==me) first, as seen by the countless related bugs. Using a LRE/RLE..PDF pair probably would be better in case someone wanted to combine the result with yet another string again. See also: issue 78466, and better samples in issue 32179: http://www.openoffice.org/nonav/issues/showattachment.cgi/16816/OOo_parenthesis_bug.png http://www.openoffice.org/nonav/issues/showattachment.cgi/16819/parens-ar.png http://www.openoffice.org/nonav/issues/showattachment.cgi/16868/wrong_next_previous_button.png I can provide some method in i18npool respectively unotools/i18n, but of course that would not fix the various instances the issue incarnates. Once the interface will be provided, applications will actually have do to use it. TL->ER: Note: svtools/langtab.hxx (CWS vcl99) has now a function ApplyLtrOrRtlEmphasis that adds the emphasis characters to the string if there are none yet and the string does not consist of neutral chars only. (HDU said to look just for the first not neutral char for this.) > ApplyLtrOrRtlEmphasis
It works fine, great!
Once that method moves to a more general place in the i18n framework I suggest to rename it. E.g. to
ApplyNaturalBidiEmbedding(). Also adding the embedding markers when the string doesn't contain
neutral characters can be avoided as they aren't needed then.
TL->ER: Slight change for the mentioned function: it is now called ApplyLreOrRleEmbedding. . Reassigning to spare time account. Looking into moving the implementation of ApplyLreOrRleEmbedding() down to i18npool I encountered some obstacle. The implementation takes advantage of SvtSysLocale to obtain a cached character classification at almost no cost. That is not available in i18npool, and we certainly don't want to create new instances of XCharacterClassification for each string inspection, so would need some cached instance there as well and refactor some coding details. Nothing hard to implement, just doesn't fit into my schedule for OOo3.2, therefor retargeting to OOo3.3. Anyway, according to 'hdu' the original bug is fixed with ApplyLreOrRleEmbedding(), so I'm setting the issue type to TASK. However, best would even be if SvtSysLocale could pass the cached instances down to i18npool instead of having to create another instance there. @erack: the idea behind this method is just to find the first codepoint in the string with "strong" bidi- direction properties. Since icu is already available in i18npool and its u_charDirection() method is cheap there is probably not much need for fancy caching directly or indirectly (by tunneling the cache of all character properties cache from other modules). Splitting the method getDefaultDirection() off is probably a good idea too. @hdu: A rewrite to directly use ICU sounds indeed most promising. Thanks for the thought. Fruit flies like banana.. retargeting to OOo3.x, should actually be OOo3.4 but that target doesn't exist yet. Status update: When ICU's http://bugs.icu-project.org/trac/ticket/7772 ("Fast string direction detection") is resolved the method getDefaultDirection() could be implemented more directly using that new ICU call, if OOo's ICU baseline had it available. Reset assigne to the default "issues@openoffice.apache.org". |