Issue 105623 - Brackets are not handled right when a Hebrew word is bracketed in western (Dutch,English) text
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 3.1.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on: 112240
Blocks:
Reported: 2009-10-05 18:13 UTC by pmladek
Modified: 2017-05-20 11:15 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Test document. (7.58 KB, application/vnd.oasis.opendocument.text)
2009-10-05 18:14 UTC, pmladek
The same problems with arabic (10.29 KB, application/vnd.oasis.opendocument.text)
2009-10-18 07:48 UTC, cemu
unicode bracket handling in gedit (46.08 KB, image/jpeg)
2009-10-22 22:37 UTC, cemu
unicode bracket handling in gedit (404 bytes, text/plain)
2009-10-22 22:38 UTC, cemu
unicode bracket handling in gedit (404 bytes, text/plain)
2009-10-22 22:40 UTC, cemu
bracket handling in openoffice 3.1.1 (11.20 KB, application/vnd.oasis.opendocument.text)
2009-10-22 22:41 UTC, cemu

Description pmladek 2009-10-05 18:13:19 UTC
When I write a text in Dutch (or English or probably any LTR language) and I
put in the text a Hebrew word between brackets, then the brackets are not
handled right at the end of the line. When a Hebrew word (or words) between
brackets is standing at the end of the line and goes to the next line, then the
first bracket is not going with the Hebrew word to the next line, but stays
behind (all alone). It should stay with the Hebrew word.

The problem does not occur with an English word in a Hebrew text.

See also https://bugzilla.novell.com/show_bug.cgi?id=397090
Comment 1 pmladek 2009-10-05 18:14:50 UTC
Created attachment 65144
Test document.
Comment 2 eric.savary 2009-10-05 22:04:13 UTC
@HDU: please have a look.
Comment 3 hdu@apache.org 2009-10-14 14:34:48 UTC
@od: I suggest trying to keep BiDi runs together when determining the line break position. AFAIK only 
WriterEngine has this problem; EditEngine already seems to do it properly.
Comment 4 hdu@apache.org 2009-10-14 14:40:27 UTC
Update to the above: the BiDi runs for the parentheses and the RTL word are different. They still shouldn't be 
separated. There is probably already logic for this, since it is handled well in the non-BiDi case: the parentheses are 
kept with their content text.
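
A rough sketch of why the parentheses and the RTL word end up in different runs. It only groups characters by their Unicode bidi class using Python's standard `unicodedata` module; it is a simplification for illustration, not the full Unicode BiDi algorithm and not Writer's code. The helper names `strong_direction` and `direction_runs` are made up for this example.

```python
import unicodedata
from itertools import groupby

def strong_direction(ch):
    """Map a character to 'L', 'R' or 'N' (neutral) from its Unicode bidi class."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return "N"  # brackets, spaces, punctuation, digits in this coarse model

def direction_runs(text):
    """Group consecutive characters that share the same coarse direction."""
    return [(d, "".join(chars)) for d, chars in groupby(text, key=strong_direction)]

# A Hebrew word in parentheses inside Latin text.
sample = "see (\u05e9\u05dc\u05d5\u05dd) here"
runs = direction_runs(sample)
for direction, chunk in runs:
    print(direction, repr(chunk))
```

The opening parenthesis lands in a neutral chunk next to the Latin text, while the Hebrew letters form their own chunk, so a line breaker that treats run boundaries as break points could separate the two.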
Comment 5 hennerdrewes 2009-10-14 15:32:33 UTC
Actually it would be nice if the parentheses belonged to the RTL run. Issue
89825 introduced a related approach for numerals, issue 16354 for punctuation
characters. 

Especially with parentheses at the border of embedded bidi runs, we sometimes
encounter problems. Directionality problems with parentheses can usually be
fixed with LRM and RLM characters, but most users don't seem to be aware of this
option. 

Unfortunately, I don't see a straightforward way to assign correct bidi
properties to parentheses by context. But maybe it is worth discussing the
possibilities and options?

@pmladek: The problem *does* also occur with LTR word in e.g. Hebrew text. But
you need to set the paragraph direction to RTL. 

But what is interesting: The first (RTL word in LTR paragraph) case can be fixed
by inserting RLM before the first and after the second parenthesis. If you
apply the same approach to the second case (inserting LRMs), the English word is
broken apart when placed at the end of the line.
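
A minimal sketch of the RLM workaround described above, assuming the position of the bracketed span is already known. It only shows where the marks go (U+200F RIGHT-TO-LEFT MARK before the opening and after the closing parenthesis); whether this cures the line break in a given Writer version is what the comment reports, not something the snippet verifies. The function name `wrap_with_rlm` is hypothetical.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible character with bidi class R

def wrap_with_rlm(text, start, end):
    """Insert RLM before text[start] and after text[end - 1]."""
    return text[:start] + RLM + text[start:end] + RLM + text[end:]

text = "word (\u05e9\u05dc\u05d5\u05dd) word"
start = text.index("(")
end = text.index(")") + 1
fixed = wrap_with_rlm(text, start, end)
print([hex(ord(c)) for c in fixed[start:end + 2]])
```

For the mirrored case (LTR word in an RTL paragraph) the analogous mark would be LRM (U+200E), which the comment reports as not working well.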

Comment 6 hdu@apache.org 2009-10-14 16:02:27 UTC
We should stay as close as possible with the BiDi-algorithm (except for issues such as 100737) so 
changing the bracket's BiDi properties (which influences bracket mirroring) doesn't sound like such a good 
idea to me. I'll leave it to the expert users to decide on this though.

I agree that it might be a good idea to use the same font for the parentheses/brackets/braces etc. as the 
font for contained text (in this case the CTL-font for the CTL-text). The question what to do with mixed 
content or with unbalanced brackets becomes non-trivial.
Comment 7 hennerdrewes 2009-10-15 08:10:52 UTC
@hdu: Generally I agree with you on the subject of changing bidi properties. But
in the case of brackets and parentheses, similar problems seem to pop up
repeatedly. Therefore I feel the need to rethink the current situation once more.
Some of these thoughts don't relate directly to this issue, but I think the
broader view will also contribute to the current problem. 

Paired parentheses contain the notion of opening and closing. This is expressed
visually, but only if both parentheses are directionally interpreted in the same
way. So we mainly encounter problems in cases, where one bracket has an
unambiguous bidi context and the second one is on the boundary of bidi runs. I
think these cases could be improved by applying the unambiguous context to the
pair bracket.

The example in this issue is different, because here we have a symmetric
situation (only one direction inside the brackets). Directionally speaking, it
does not make a difference whether the brackets are assigned to the outer or
inner run. They will swap their places, but the visual result is the same.
Typographically, there is a difference: The brackets could belong to the inner
or outer script and would be displayed in the corresponding font. In any case,
as you stated before, bracket and enclosed word shouldn't be separated even if
they belong to different runs.

In the current situation the paragraph direction determines the script type of
the parentheses in the latter cases. In these paired situations, the script type
could also be determined by the enclosed script or the outer script. Each mode
of interpretation could lead to subtle differences in the visual appearance
(depending on the fonts chosen). But I think it is also a more general
(philosophical) question: Where do the brackets belong? To the outer or to the
enclosed?

Comment 8 hdu@apache.org 2009-10-15 09:39:03 UTC
Added some experts to CC to join a constructive discussion.

> I think these cases could be improved by applying the unambiguous context to the pair bracket.

I agree that the pair should have matched properties.

> Where do the brackets belong? To the outer or to the enclosed?

IMHO brackets/braces/parentheses belong to the outer text: for me they mean something like CALL 
and RET, so they should belong to the calling context. The same applies to quotation marks.

While we are at it we should also consider the default direction of the inner text: should it be defined 
by the outer text, by its "natural" direction, or by its own flag (e.g. "bracket default direction", which 
defaults to "paragraph default direction")?
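
A small sketch of the pairing step that "matched properties" would need: find which opening bracket belongs to which closing bracket, and set aside the unbalanced ones mentioned earlier in the thread. This is a plain stack-based matcher written for illustration; the name `match_brackets` is hypothetical and nothing here reflects existing Writer code.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def match_brackets(text):
    """Return (pairs, unmatched).

    pairs is a list of (open_index, close_index) tuples in closing order;
    unmatched is a sorted list of indices of brackets without a partner.
    """
    stack, pairs, unmatched = [], [], []
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(i)
        elif ch in PAIRS:
            if stack and text[stack[-1]] == PAIRS[ch]:
                pairs.append((stack.pop(), i))
            else:
                unmatched.append(i)  # closer with no matching opener
    unmatched.extend(stack)  # openers that were never closed
    return pairs, sorted(unmatched)

print(match_brackets("a (b [c] d) e"))
print(match_brackets("a) (b"))
```

Once a pair is known, both members could be given the same direction or font; what to do with the unmatched ones remains the open question raised above.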
Comment 9 hennerdrewes 2009-10-15 16:32:32 UTC
> I agree that the pair should have matched properties.
So here is one detail that could be improved. 

The more I think of it, the concepts of opening and closing, inner and outer are
most valid and need to be considered (and currently they are not!!!)

> IMHO brackets/braces/parentheses belong to the outer text:
Semantically speaking I agree with you. But visually they are placed closer to
the enclosed text. Therefore I think there should be at least an option to
display them in the same style (font) as the enclosed text. Other opinions on this?

In regular writing I seldom feel the need to force a change of the writing
direction. There may be more special cases, but currently I cannot think of any
sensible way to improve anything here in an automated manner.
Comment 10 hdu@apache.org 2009-10-16 09:55:33 UTC
In some small tests it looks as if WriterEngine already does something like bracket pairing for roman text. If 
this is so, I suggest making that code also applicable and active for BiDi cases. This might be a reasonable 
first step.
Comment 11 hennerdrewes 2009-10-16 18:29:04 UTC
@hdu: Can you be more specific? What kind of bracket pairing is Writer doing?
Comment 12 hdu@apache.org 2009-10-17 08:18:43 UTC
Add a bracketed word like "(hello)" to a line and experiment with it. Writer will not break the word 
and the brackets apart, even if it is spelled e.g. "( hello)". I haven't looked at the relevant Writer code 
though.
Comment 13 hennerdrewes 2009-10-17 08:43:12 UTC
But this doesn't seem to be "pairing". 

If you delete the 2nd bracket, the result is the same. 
Stranger still: type "(hello )"

Result: The closing bracket doesn't stay with the word. 
But also peculiar: You can add as many spaces as you want between the opening
bracket and the word: The spaces are treated as if they were hard spaces.

Would be interesting to have a look at the code...
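
Pending a look at the real code, here is a toy model of the behaviour reported in comments 12 and 13. It is only a restatement of those observations in code, assuming breaks are otherwise allowed after any run of spaces; it is not taken from Writer, and the name `allowed_breaks` is made up.

```python
OPENING = "([{"

def allowed_breaks(text):
    """Indices i where this toy model allows a line break before text[i].

    Models only what the comments report:
    - a break is normally allowed after a run of spaces;
    - no break is allowed between an opening bracket and the next
      non-space character, however many spaces are in between.
    """
    breaks = []
    for i in range(1, len(text)):
        if text[i] == " " or text[i - 1] != " ":
            continue  # only positions right after a run of spaces qualify
        # Walk back over the spaces to find the character before them.
        j = i - 1
        while j >= 0 and text[j] == " ":
            j -= 1
        if j >= 0 and text[j] in OPENING:
            continue  # spaces after an opening bracket act like hard spaces
        breaks.append(i)
    return breaks

print(allowed_breaks("a (  hello) b"))  # no break between "(" and "hello"
print(allowed_breaks("a (hello ) b"))   # a break before ")" is allowed
```

In this model the rule is attached to the opening bracket alone, which would explain why deleting the closing bracket changes nothing and why "(hello )" can lose its closing bracket to the next line.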
Comment 14 cemu 2009-10-18 07:47:41 UTC
The same problems occur with Arabic. See attachment.
Comment 15 cemu 2009-10-18 07:48:44 UTC
Created attachment 65426
The same problems with arabic
Comment 16 cemu 2009-10-22 22:36:19 UTC
Maybe you can look at gedit (source code available); this editor has wonderful 
handling of Unicode (without the need for configuration), e.g. Western, Arabic 
and Hebrew. Please have a look at the screenshots. Brackets are also handled 
correctly.
Comment 17 cemu 2009-10-22 22:37:42 UTC
Created attachment 65544
unicode bracket handling in gedit
Comment 18 cemu 2009-10-22 22:38:12 UTC
Created attachment 65545
unicode bracket handling in gedit
Comment 19 cemu 2009-10-22 22:40:30 UTC
Created attachment 65546
unicode bracket handling in gedit
Comment 20 cemu 2009-10-22 22:41:43 UTC
Created attachment 65547
bracket handling in openoffice 3.1.1
Comment 21 yba 2009-10-30 07:28:21 UTC
From Mati Allouche:

a) I think that parentheses belong to the encompassing text and not to the text
included within. As proof, consider the following logical string (where upper
case represents Hebrew letters), displayed in a LTR paragraph:

   eng1 eng2 (HEB3 HEB4) eng5 eng6

If the whole string is displayed on one line, as shown below

   eng1 eng2 (4BEH 3BEH) eng5 eng6

it does not matter if the left parenthesis is an open parenthesis associated
with the English text or a closing parenthesis associated with the Hebrew text
and subject to symmetric swapping (and reversely for the right parenthesis).
But if the string is broken into 2 lines, associating the parentheses with the
encompassing text will display as

  eng1 eng2 (3BEH
  4BEH) eng5 eng6

while associating the parentheses with the inner text will display as

   eng1 eng2  3BEH)
   (4BEH eng5 eng6

I think that the first display is the preferred one.

b) The problem does not seem to be related to directionality, but to the
algorithm for determining line breaks (it might be that the algorithm considers
directional runs boundaries as allowed break points).  Why this algorithm
behaves differently for LTR and RTL text, at least when parentheses are
concerned, is part of the issue.

c) If the problem is not related to directionality, changing the Bidi properties
of parentheses is not going to fix it.

d) Changing the Bidi properties of any character to values different from
specified by Unicode is a bad idea anyway.  I hope that there is no need to
justify this statement.
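
The two displays in item a) can be reproduced with a toy model that follows the same convention (upper case stands for Hebrew). It is a sketch for illustration only: it reverses runs of upper-case words and, in the "inner" mode, pulls adjacent parentheses into the run and mirrors them. It is not an implementation of the Unicode BiDi algorithm, and the names `display`, `OUTER_RUN` and `INNER_RUN` are invented here.

```python
import re

MIRROR = str.maketrans("()", ")(")
WORD = r"[A-Z][A-Z0-9]*"  # an upper-case word stands for a Hebrew word
OUTER_RUN = re.compile(rf"{WORD}(?: {WORD})*")
INNER_RUN = re.compile(rf"\(?{WORD}\)?(?: \(?{WORD}\)?)*")

def display(line, brackets="outer"):
    """Return the visual order of one logical line in an LTR paragraph.

    brackets="outer": parentheses stay with the surrounding LTR text.
    brackets="inner": parentheses join the RTL run and are mirrored.
    """
    pattern = INNER_RUN if brackets == "inner" else OUTER_RUN
    return pattern.sub(lambda m: m.group(0)[::-1].translate(MIRROR), line)

one_line = "eng1 eng2 (HEB3 HEB4) eng5 eng6"
two_lines = ["eng1 eng2 (HEB3", "HEB4) eng5 eng6"]

print(display(one_line))  # same result in both modes
for mode in ("outer", "inner"):
    print(mode, [display(line, mode) for line in two_lines])
```

On a single line both modes give the same picture; only the two-line case tells them apart, which is the point of the example.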
Comment 22 hdu@apache.org 2009-11-05 11:32:35 UTC
Mati and Alan: Thanks for your expert comments. I agree with a, b and c. These items also confirm that 
having this issue assigned to the WriterEngine and EditEngine team is correct. For item d I agree in 
principle but I'd also point to issue 100737.
Comment 23 Oliver-Rainer Wittmann 2009-12-04 14:12:53 UTC
Due to limited resources, it is not clear whether this issue can be solved for
OOo 3.3. To be honest, I am adjusting the target.
Comment 24 kaplanlior 2011-01-30 14:32:57 UTC
Issue 112240, which blocked this issue, was fixed. Can someone look at this issue
again for 3.4? Thanks.
Comment 25 Marcus 2017-05-20 11:15:44 UTC
Reset assignee to the default "issues@openoffice.apache.org".