Issue 112240 - When ICU >=4.4 is used, an automatic line break is handled incorrectly when caused by a closing parenthesis/square bracket.
Summary: When ICU >=4.4 is used, an automatic line break is handled incorrectly when caused...
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: i18npool
Version: OOO320m19
Hardware: PC Linux, all
Importance: P3 Trivial with 5 votes
Target Milestone: 3.4.1
Assignee: oc
QA Contact: issues@l10n
URL:
Keywords: oooqa
Duplicates: 113637
Depends on: 104310
Blocks: 105623
Reported: 2010-06-09 10:57 UTC by ndv
Modified: 2017-05-20 10:30 UTC
CC List: 6 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample document. (8.32 KB, application/vnd.oasis.opendocument.text)
2010-06-09 10:58 UTC, ndv
Sample screenshot. (51.94 KB, image/jpeg)
2010-06-09 10:59 UTC, ndv
here's a quick and easy fix anyway (3.22 KB, patch)
2011-01-25 15:20 UTC, caolanm

Description ndv 2010-06-09 10:57:03 UTC
When an automatic line break is caused by a closing parenthesis (that is, it
makes the last word too long for that line), only the parenthesis itself breaks
to the next line, while the word attached to it remains on the previous line.

Steps to reproduce:
1. Open a new Writer document.
2. Write a line that fits exactly into the current page width.
3. Enter a closing parenthesis ), causing the last word on that line to break.
4. Only the parenthesis breaks to the next line.

This only happens with a closing (right) parenthesis; an opening parenthesis
behaves correctly (both the parenthesis and the word break). Using a closing
square bracket results in the same problem. However, this does not happen when
using a closing curly bracket.
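
The symptom described above can be modelled with a minimal sketch of greedy line filling. Everything here is hypothetical and not taken from Writer or ICU: `wrap`, `break_after_space_only`, and `break_also_before_closer` are illustrative names, and the break predicates are deliberately simplistic stand-ins for real line-break rules.

```python
def wrap(text, width, can_break_before):
    """Greedy wrap: cut each line at the last allowed break that fits in width.

    can_break_before(text, i) says whether a break is allowed between
    text[i - 1] and text[i]. Hypothetical helper, not Writer's algorithm.
    """
    lines = []
    start = 0
    while len(text) - start > width:
        cut = None
        # Search backwards from the overflow point for an allowed break.
        for i in range(start + width, start, -1):
            if can_break_before(text, i):
                cut = i
                break
        if cut is None:
            cut = start + width  # no opportunity found: emergency break
        lines.append(text[start:cut].rstrip())
        start = cut
        while start < len(text) and text[start] == " ":
            start += 1  # do not start the next line with spaces
    lines.append(text[start:])
    return lines


def break_after_space_only(text, i):
    # Expected behaviour: a closing bracket stays glued to its word.
    return text[i - 1] == " " and text[i] != " "


def break_also_before_closer(text, i):
    # Faulty behaviour: a break opportunity also exists before ")" and "]",
    # but not before "}", matching the report.
    return break_after_space_only(text, i) or text[i] in ")]"


print(wrap("aaa (bbb ccc)", 12, break_after_space_only))    # ['aaa (bbb', 'ccc)']
print(wrap("aaa (bbb ccc)", 12, break_also_before_closer))  # ['aaa (bbb ccc', ')']
```

With the faulty predicate, the parenthesis that overflows the line is moved on its own, which is the behaviour shown in the attached screenshot.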
Comment 1 ndv 2010-06-09 10:58:19 UTC
Created attachment 69882
Sample document.
Comment 2 ndv 2010-06-09 10:59:56 UTC
Created attachment 69883
Sample screenshot.
Comment 3 ndv 2010-06-09 11:03:05 UTC
Forgot to mention: my OS is Arch Linux.
Comment 4 michael.ruess 2010-06-09 12:03:05 UTC
Cannot reproduce this with Windows and SUSE Linux. Do you work with OOo
downloaded from openoffice.org or the one provided by Arch Linux?
Comment 5 eric.savary 2010-06-09 12:06:50 UTC
Adding to MRU's question: does this happen with fonts other than "Nimbus Mono L"?
Comment 6 ndv 2010-06-09 12:30:09 UTC
mru: Packages provided by Arch Linux - 'openoffice-base-beta 3.2.1_ooo320_m19'.
Package 'openoffice-base 3.2.0-3' exhibits the same behavior.

es: Happens with every font I've tried - mono, sans and serif fonts.
Comment 7 eric.savary 2010-06-09 12:43:05 UTC
Please try the OOo version from our site.
It sounds like a problem with the version shipped by your distro.
Comment 8 ndv 2010-06-09 15:50:04 UTC
Are there any other alternatives to confirm/solve this issue?
I have downloaded the OOo version from the site, and it contains a series of RPM
files, a format with which I have very little experience (Arch Linux uses a
different packaging system). I'm not eager to toy around with it for fear of
messing up my installation. However, if you think this is the best way to
proceed I'll go ahead and try to install it anyway.

That being said, before opening this issue I posted for help on a forum (neither
OOo's nor Arch Linux's), and another Arch user using the exact same package as
myself didn't have this issue, while an Ubuntu 10.04 user reported that the
problem exists for him too (using OOo 3.2.0).
Comment 9 eric.savary 2010-06-09 16:04:25 UTC
So we "might" exclude a distro bug (?)...

@HDU: what do you think about the first description and the screenshot?
Comment 10 hdu@apache.org 2010-06-10 07:37:08 UTC
Confirming. Yes, this behaviour is independent of the platform. Handling parentheses, brackets, braces, etc.
in a smarter way would be a worthwhile improvement. Also see the discussion in issue 105623 on this topic.

Another thing that is worth pointing out on the topic of start-of-line or end-of-line chars is that OOo
already has the feature "forbidden chars", but only for CJK-Layout. Use
Tools->Options->Language->EnableAsian to enable the CJK-UI elements, then look at the tabpage
Tools->Options->Language->AsianLayout->LastChars or FirstChars. Maybe this concept should be
available for all scripts.

IMHO a solution based on smarter parenthesis/brace/bracket handling would be better compared to the 
approach used in the forbidden-chars feature.
Comment 11 eric.savary 2010-08-04 09:54:32 UTC
*** Issue 113637 has been marked as a duplicate of this issue. ***
Comment 12 beurt 2010-08-04 10:09:28 UTC
As the author of the duplicate bug
(http://www.openoffice.org/issues/show_bug.cgi?id=113637), I confirm the issue with
OOO320m12 on Mandriva 2010.1 (font Liberation Serif), language: French.

It's a major bug: we cannot publish serious documents with such mistakes
(closing brackets alone after a line break)
Comment 13 cufalo 2010-08-04 14:26:09 UTC
I confirm this bug in Debian Sid.
OO version: 3.2.1-5
Comment 14 nowahn 2010-08-08 14:05:08 UTC
I confirm this issue in ooo 3.2.0.
This is a regression since ooo 3.1.1 (it works fine in that version).

This is not a file issue (a file created and saved with 3.1.1, then opened with
3.2.0, shows the bug).

Note that inserting a "No-width no break" character does NOT make the
parenthesis stay with the word to its left.

The "No-width no break" character is the one that (should) make the two adjacent
characters stay together when wrapping lines. To access it:
Tools --> Options... --> Languages
   Check "Enabled for complex text layout (CTL)"
It is then accessible here:
Insert --> Formatting Mark --> No-width no break

As beurt says, I think this is a MAJOR regression, since nothing can be
published with this issue, and the most obvious workaround ("No-width no break"
character) does NOT work.

Systems tested:
ooo 3.2.0 on Mandriva 2010.1 KDE (bug here)
ooo 3.2.0 on Mandriva 2010.1 Gnome (bug here)
ooo 3.1.1 on Mandriva 2010.0 KDE (no bug here)
Tested with many different fonts.
Comment 15 simon_g 2010-10-29 17:44:58 UTC
I just had this issue with a closing parenthesis under Debian Lenny (ooo-build
3.2.1.4, Debian package 1:3.2.1-6).

However, inserting a No-width no-break special character per nowahn's
instructions solved my problem (whereas it didn't work for nowahn).
Comment 16 ralfg 2010-11-03 21:31:52 UTC
Same observation here (Debian testing). This is my version:

 1:3.2.1-7 0
        500 http://ftp.debian.de sid/main Packages

Wrong wrapping of parentheses was an issue as early as 2004, but it was fixed
then. Now it seems to have recurred.
Comment 17 artificeprime 2010-12-05 07:25:35 UTC
Another Mandriva 2010.1 user chiming in.

Version:  3.2.0.9 - OOO320m12 (Build:9483)
Packages: openoffice.org-3.2-4.1mdv2010.1
Language: English

I noticed there's been some discussion of this (same/similar/related?) issue on
the Debian bug tracker today:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=601078

This comment by Charles Plessy looks like a particularly promising nugget:

--QUOTE--

Of course, as René noted, the key difference is probably that 3.2.1-7ubuntu1
depends on libicu42 and 3.2.1-9 depends on libicu44.

I investigated the differences between the versions 4.2 and 4.4 of icu, in
particular in the directory “source/data/unidata/”. Support for parentheses
seems to have changed while switching from Unicode 5.1 to 5.2. In particular,
the right parenthesis and right square bracket, but not the right curly bracket,
have been moved from a ‘Close_Punctuation’ (CL) to a ‘Close_Parenthesis’ (CP)
category. This completely fits with the symptoms that were also reported
to the openoffice bug tracker
(http://qa.openoffice.org/issues/show_bug.cgi?id=112240).

Do you think this is enough to reassign this bug to the libicu44 package?

-- END QUOTE --

Note that Charles links back to here.

My own ICU package is: libicu44-4.4-2mdv2010.1

Anyway, hope this might help, and this bug gets my vote.
Comment 18 hdu@apache.org 2010-12-06 07:55:17 UTC
Thanks for the info; this difference between ICU42 and ICU44 is very interesting. I'm not sure whether and
how Writer uses the character classification for its line breaking though. If it does, it should treat CP like CL
for line breaking.
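
To illustrate why a reclassification alone could produce the reported symptom, here is a minimal sketch, assuming a rule of the form "no break before a character of class X". The two tables contain only the three characters discussed in this issue (not the full Unicode data), and `rule_forbids_break_before` is a hypothetical name, not an ICU or OOo function.

```python
# Line-break classes of the three closing brackets, as described in the
# quoted analysis: ")" and "]" moved from CL to CP with Unicode 5.2.
LINE_BREAK_UNICODE_5_1 = {")": "CL", "]": "CL", "}": "CL"}
LINE_BREAK_UNICODE_5_2 = {")": "CP", "]": "CP", "}": "CL"}


def rule_forbids_break_before(ch, table, protected_classes):
    """Return True if a 'no break before <class>' rule covers ch.

    Characters missing from the table are treated as unclassified here,
    which is a simplification made for this sketch.
    """
    return table.get(ch) in protected_classes


old_rule = {"CL"}        # rule written before CP existed
new_rule = {"CL", "CP"}  # rule that treats CP like CL

for ch in ")]}":
    print(
        ch,
        rule_forbids_break_before(ch, LINE_BREAK_UNICODE_5_1, old_rule),
        rule_forbids_break_before(ch, LINE_BREAK_UNICODE_5_2, old_rule),
        rule_forbids_break_before(ch, LINE_BREAK_UNICODE_5_2, new_rule),
    )
```

Under these assumptions, a rule that only names CL stops protecting ")" and "]" once the newer data is in use, while "}" is unaffected. Adding CP to the rule restores the protection for all three.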
Comment 19 artificeprime 2010-12-07 07:54:00 UTC
Over at the Debian bug tracker René Engelhard recommended that Charles post his
discovery to bugs.freedesktop.org (who I then presume must be the
developers/maintainers of libicu). Charles has done so, so the place to watch
and hope might now be there:

https://bugs.freedesktop.org/show_bug.cgi?id=31271

Interestingly it appears that René has had his own suspicions about libicu44
since late October.

I've got my fingers crossed (though I might just revert to libicu42 in the mean
time -- barring a major dependency mess....).
Comment 20 ooo 2010-12-07 12:35:52 UTC
@hdu:
Line breaking uses character classification, see
i18npool/source/breakiterator/data/line.txt
ICU 4.2 does not know the CP (Close_Parenthesis) class, so we can't define it
before we switch to a more recent ICU.

Grabbing issue for spare time account.
Comment 21 caolanm 2011-01-25 15:20:21 UTC
Created attachment 75640
here's a quick and easy fix anyway
Comment 22 erack 2011-01-27 23:10:23 UTC
oookayyyy..

In cws locales34:

changeset f7c1450c8c30
http://hg.services.openoffice.org/cws/locales34/changeset/f7c1450c8c30
M configure
M configure.in
M i18npool/source/breakiterator/makefile.mk
M icu/icuversion.mk
M set_soenv.in

You can observe the progress and possible integration date of CWS locales34 at
http://tools.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Flocales34
Comment 23 erack 2011-02-08 22:50:21 UTC
Reassigning to QA for verification.
Comment 24 erack 2011-02-08 23:07:00 UTC
Note to QA: just verify that there is no regression with the standard build. The
additional bracket handling only comes into play if ICU >= 4.4 is used, e.g. in
a build against system ICU.
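
The version condition in this note can be sketched as follows. This is only an analogy: the changeset above lists build files, so the real selection presumably happens at build time, and `protected_classes` is a hypothetical helper that does not exist in OOo or ICU.

```python
def protected_classes(icu_version):
    """Pick the closing-bracket classes to protect for a given ICU version.

    Hypothetical helper: CP is only added for ICU >= 4.4, because older
    ICU releases do not know that class.
    """
    major_minor = tuple(int(part) for part in icu_version.split(".")[:2])
    classes = {"CL"}
    if major_minor >= (4, 4):
        classes.add("CP")
    return classes


print(sorted(protected_classes("4.2.1")))  # ['CL']
print(sorted(protected_classes("4.4")))    # ['CL', 'CP']
```

In this sketch a build against ICU 4.2 behaves exactly as before, which matches the request to check only for regressions in the standard build.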
Comment 25 stefan.baltzer 2011-02-11 15:57:50 UTC
Verified in CWS locales34.