Issue 42660

Summary: [Calc] Need feature for easy manual override of incorrect word breaking
Product: Internationalization
Reporter: samphan
Component: code
Assignee: oc
Status: CLOSED FIXED
QA Contact: issues@l10n <issues>
Severity: Trivial
Priority: P3
CC: arthit, falko.tesch, hin.stone, issues, jjc, markpeak, nusorn
Version: 680m74
Keywords: oooqa
Target Milestone: ---   
Hardware: All   
OS: All   
URL: http://specs.openoffice.org/g11n/word_breaking/42660_Easy_override_of_incorrect_word_breaking.odt
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 41707    
Attachments:
Description    Flags
Spec draft     none
Spec update!   none

Description samphan 2005-02-13 05:19:33 UTC
Algorithms for finding word-breaks in Thai text are not 100% accurate. The
dictionary-based algorithm used by OOo's ICU based line-breaker gives poor
results when text contains words not in the dictionary, which can easily happen,
for example, with new words or with words that are
transliterations of English words.  Although this can to some extent be
alleviated with better algorithms or better dictionaries, no algorithm is likely
to be 100% accurate in the foreseeable future. It is therefore important for
there to be an easy way for users to manually override the
word-breaks that are found automatically.

Two Unicode characters are designed for this: "zero-width space" (ZWSP,
U+200B) and "word joiner" (WJ, U+2060).

8<-- From Unicode 4.0 - Chapter 15 -->8

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a word boundary, except that
it has no width. Zero-width space characters are intended to be used in languages
that have no visible word spacing to represent word breaks, such as Thai, Khmer, or
Japanese. When text is justified, ZWSP has no effect on letter spacing - for example,
in English or Japanese usage.

Word Joiner. U+2060 WORD JOINER behaves like U+00A0 NO-BREAK SPACE in that it
indicates the absence of word boundaries; however, the word joiner has no width.
The function of the character is to indicate that line breaks are not allowed
between the adjoining characters, except next to hard line breaks.
8<----------------------------->8
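
For illustration only, here is a minimal Python sketch of the two characters quoted above. It is not OOo or ICU code; the helper names describe, insert_mark and strip_marks are made up for this example. It only shows that both characters are ordinary code points that can be placed in a string and removed again.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: marks a permitted break position
WJ = "\u2060"    # WORD JOINER: marks a position where no break is allowed


def describe(ch):
    """Return the code point and the official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def insert_mark(text, index, mark):
    """Insert an invisible mark before text[index] (hypothetical helper)."""
    return text[:index] + mark + text[index:]


def strip_marks(text):
    """Remove both invisible marks, giving back the visible text."""
    return text.replace(ZWSP, "").replace(WJ, "")
```

For example, insert_mark("wordbreak", 4, ZWSP) gives a string that is one character longer but looks the same as "wordbreak" when rendered, and strip_marks() returns the original text.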

So users should be able to insert a ZWSP to add a breakable position and a WJ
to prevent a break at a position. I think ICU should already handle these two
characters. However, users need some way to input the two Unicode characters
into the document. For example:

Ctrl-space = Non-breaking space (normal OOo shortcut key)
Shift-space = Zero-width space
Ctrl-shift-space = Word joiner

This will allow users to easily adjust where the word-breaker breaks
lines, whatever language the text is in.
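
The intended effect of the two characters on line breaking can be sketched with a toy greedy wrapper in Python. This is only a sketch under simplifying assumptions, not the ICU algorithm: ordinary spaces stand in for the automatically found word-breaks, a ZWSP adds a break position, and a WJ removes any break position next to it. The function names break_positions, visible and wrap are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: user-added break position
WJ = "\u2060"    # WORD JOINER: user-added "do not break here"


def break_positions(text):
    """Indices i where a line may end just before text[i] (toy rules)."""
    positions = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if WJ in (prev, cur):
            continue  # a word joiner forbids a break on either side of it
        if prev in (" ", ZWSP):
            positions.append(i)  # a break is allowed after a space or ZWSP
    return positions


def visible(s):
    """Drop the zero-width marks so only visible characters are counted."""
    return s.replace(ZWSP, "").replace(WJ, "")


def wrap(text, width):
    """Greedy wrap: put as much visible text on each line as fits in width."""
    lines, start = [], 0
    breaks = break_positions(text)
    while len(visible(text[start:]).rstrip()) > width:
        later = [p for p in breaks if p > start]
        if not later:
            break  # no legal break left: the rest overflows on one line
        fitting = [p for p in later
                   if len(visible(text[start:p]).rstrip()) <= width]
        # Take the last break that fits, else overflow to the next legal one.
        cut = fitting[-1] if fitting else later[0]
        lines.append(visible(text[start:cut]).rstrip())
        start = cut
    lines.append(visible(text[start:]).rstrip())
    return lines
```

With a width of 4, "abcdefgh" stays on one overflowing line because it contains no break position, while the same text with a ZWSP after "abcd" wraps into "abcd" and "efgh". Putting a WJ after the space in "ab cd" keeps the whole string on one line.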
Comment 1 samphan 2005-03-01 11:33:09 UTC
Microsoft Office Word 2003 has exactly this feature. The zero-width space is
called "No-Width Optional Break". The word joiner is called "No-Width Non
Break". Both can be reached from Insert->Symbol->Special Characters. There are no
default shortcut keys associated with them, but you can define one.

In Word 2003 I can insert the 'no-width optional break' inside an English
word at the beginning of a line, and the previous line will then break there,
much like a soft hyphen. I can also insert the 'no-width non break' at the end of a
line to stop the line from breaking there, much like a non-breaking space.

The feature does not seem to work reliably with Thai, however. This looks like an
MS Office bug resulting from special handling of Thai text.
Comment 2 arthit 2005-04-01 23:36:08 UTC
confirmed.
Comment 3 arthit 2005-04-01 23:38:07 UTC
confirmed.
Comment 4 arthit 2005-04-01 23:43:13 UTC
a bug in Issue Tracker?

time stamps are in reverse order!

--- Additional comments from arthit Fri Apr 1 15:36:08 -0800 2005 ---
--- Additional comments from arthit Fri Apr 1 15:38:07 -0800 2005 ---

set to FIXED, and will set back to NEW
(instead of STARTED as now).
Comment 5 arthit 2005-04-01 23:43:36 UTC
reopen
Comment 6 Martin Hollmichel 2005-05-11 16:05:36 UTC
set target to 2.0.1
Comment 7 falko.tesch 2005-09-20 05:46:04 UTC
Created attachment 29699 [details]
Spec draft
Comment 8 Oliver Specht 2005-09-21 08:14:51 UTC
->FT: How about an entry with the non breaking hyphen?
Comment 9 falko.tesch 2005-09-21 09:39:24 UTC
Created attachment 29747 [details]
Spec update!
Comment 10 Oliver Specht 2005-09-22 07:45:19 UTC
Implemented in:
sw/inc/cmdid.h
sw/inc/swtypes.hxx
sw/sdi/_textsh.sdi
sw/sdi/swriter.sdi
sw/sdi/swslots.src
sw/source/ui/shells/textsh.cxx
sw/source/ui/shells/textsh1.cxx
sw/uiconfig/sglobal/menubar/menubar.xml
sw/uiconfig/sweb/menubar/menubar.xml
sw/uiconfig/swriter/menubar/menubar.xml
officecfg/registry/data/org/openoffice/Office/UI/WriterCommands.xcu
officecfg/registry/data/org/openoffice/Office/UI/GenericCommands.xcu
Comment 11 jjc 2005-09-22 12:31:52 UTC
The spec doesn't make clear which apps this feature is supposed to be
implemented for.  It is supposed to work not just for Writer but also for the other
OOo applications, in particular Impress.  The feature was designed so that it can
work uniformly across Writer/Impress/Calc/Draw.

From the files you mention, it looks like it's implemented just in Writer, so
I'm reopening.
Comment 12 Oliver Specht 2005-09-23 07:21:45 UTC
->FT: It looks as if you have to change the spec.
Comment 13 falko.tesch 2005-09-26 14:51:19 UTC
FT: The specification is now checked into CVS and available at the address in the
URL field of this issue. Please disregard the attached early draft from now on.
Thanks.
Comment 14 falko.tesch 2005-09-27 13:31:35 UTC
FT->James: You are right, it wasn't obvious enough. I have updated the spec accordingly.
FT->DR: Please have a look at the spec and implement the feature in question
within Calc, thanks.
Comment 15 falko.tesch 2005-09-29 09:50:03 UTC
FT: Spec updated, please refer only to the updated spec (dated 29.09.05).
Comment 16 daniel.rentz 2005-10-18 11:11:19 UTC
fixed in SRC680/thaiissues
Comment 17 daniel.rentz 2005-10-31 10:21:46 UTC
back to QA

re-open issue and reassign to oc@openoffice.org
Comment 18 daniel.rentz 2005-10-31 10:21:58 UTC
reassign to oc@openoffice.org
Comment 19 daniel.rentz 2005-10-31 10:22:34 UTC
reset resolution to FIXED
Comment 20 oc 2005-10-31 11:03:43 UTC
Hi Stefan, please take over

re-open issue and reassign to sba@openoffice.org
Comment 21 oc 2005-10-31 11:03:53 UTC
reassign to sba@openoffice.org
Comment 22 oc 2005-10-31 11:04:00 UTC
reset resolution to FIXED
Comment 23 stefan.baltzer 2005-11-07 15:30:37 UTC
SBA: Reopened to reassign.
Comment 24 stefan.baltzer 2005-11-07 15:31:07 UTC
SBA: Reassigned to OC.
Comment 25 stefan.baltzer 2005-11-07 15:31:50 UTC
SBA: Resolution set back to "Fixed".
Comment 26 oc 2005-11-10 16:13:37 UTC
verified in internal build cws_thaiissues
Comment 27 oc 2005-11-21 14:45:25 UTC
closed because fix available in OOo2.0m142
Comment 28 Uwe Fischer 2005-11-29 15:44:45 UTC
created new help file text/shared/01/formatting_mark.xhp
added links from text/swriter/main0104.xhp, text/scalc/main0104.xhp,
text/sdraw/main0104.xhp, text/simpress/main0104.xhp