Issue 13494

Summary: Character processing has changed dramatically since 1.0.x
Product: Writer                    Reporter: nirendram <nmaharaj>
Component: code                    Assignee: stefan.baltzer
Status: CLOSED FIXED               QA Contact: issues@sw <issues>
Severity: Trivial
Priority: P4                       CC: issues, oooqa
Version: OOo 1.1 Beta              Keywords: oooqa
Target Milestone: ---
Hardware: PC
OS: All
Issue Type: DEFECT                 Latest Confirmation in: ---
Developer Difficulty: ---

Description nirendram 2003-04-16 07:54:51 UTC
My Original Problem
--------------------
In 1.1 beta, Ctrl+Left Arrow appears to treat each underscore as a separate word,
so the cursor skips only one underscore per keypress.


The Revised Problem
-------------------
Character processing, specifically how Ctrl+Left / Ctrl+Right treat
non-alphanumeric characters as word boundaries, appears to have changed
considerably in 1.1 beta compared with OOo 1.0.x.


The Systems Tested
------------------
OOo 1.0.2 and 1.1beta on Win98SE.


The Test
--------
Test data: the line xxAAxxBBxxCCxx, with each 'x' replaced by the character
being tested. For every non-alphanumeric character on my keyboard, I moved
across the whole line, end to end, with Ctrl+Left and with Ctrl+Right.

The characters I tested were:
~!@#$%^&*()_-+=[]{};':",./\<>?|
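To make the test reproducible, the test lines can be generated mechanically. This is a minimal Python sketch, assuming only what the report states: the template xxAAxxBBxxCCxx and the list of tested characters. The names TEMPLATE, TESTED_CHARS and make_test_line are hypothetical helpers, not part of OOo.

```python
# Hypothetical helper: build one test line per tested character by replacing
# every 'x' in the report's template with that character.
TEMPLATE = "xxAAxxBBxxCCxx"

# The 31 characters listed in the report (quote and backslash are escaped).
TESTED_CHARS = "~!@#$%^&*()_-+=[]{};':\",./\\<>?|"


def make_test_line(ch: str) -> str:
    """Return the template with each 'x' replaced by ch."""
    return TEMPLATE.replace("x", ch)


def make_all_test_lines() -> dict[str, str]:
    """Map each tested character to its test line."""
    return {ch: make_test_line(ch) for ch in TESTED_CHARS}


if __name__ == "__main__":
    for ch, line in make_all_test_lines().items():
        print(repr(ch), line)
```

For example, the underscore from the original problem yields the line __AA__BB__CC__.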


The Results
-----------
I divided the characters into 4 categories:
1. Characters which, when entered successively,
   are treated as one word when traversing left
   and right with Ctrl-Left / Right.

2. Characters which, when entered successively,
   are treated as one word when traversing right
   but as separate words (each character = one word)
   when traversing left.

3. Characters which are each treated as separate
   words regardless of the direction of traversal.

4. Characters which are treated as part of a word.
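The four categories can be stated in terms of where the cursor stops on a 14-character test line such as __AA__BB__CC__. The Python sketch below is one possible reading, assuming the cursor also stops at both ends of the line. The stop sets are derived by hand from the category definitions (they are not measured data), and the function names are hypothetical.

```python
# Hypothetical classifier: compare the cursor stops observed with Ctrl+Right
# and Ctrl+Left against the stops each category definition implies.
LINE_LEN = 14  # length of a test line such as "__AA__BB__CC__"


def expected_stops(kind: str) -> set[int]:
    if kind == "grouped":  # each pair of test characters is one word
        return set(range(0, LINE_LEN + 1, 2))
    if kind == "split":    # every test character is its own word
        # Offsets 3, 7 and 11 fall inside AA, BB and CC, so no stop there.
        return set(range(LINE_LEN + 1)) - {3, 7, 11}
    if kind == "whole":    # test characters join the letters into one word
        return {0, LINE_LEN}
    raise ValueError(f"unknown kind: {kind}")


def categorize(right_stops, left_stops):
    """Return category 1-4, or None if the stops match no category."""
    r, l = set(right_stops), set(left_stops)
    grouped, split, whole = (expected_stops(k) for k in ("grouped", "split", "whole"))
    if r == l == grouped:
        return 1
    if r == grouped and l == split:
        return 2
    if r == l == split:
        return 3
    if r == l == whole:
        return 4
    return None
```

Because the stops are compared as sets, the order in which they were visited does not matter. Only which offsets the cursor reached in each direction does.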

In OOo 1.0.2, all characters except the '@' sign fell into category 1. The '@'
sign fell into category 4.

In OOo 1.1beta, the situation was as follows:
. Category 1 - ; ' , .
. Category 2 - ! @ # % & * ( ) _ -
               { } [ ] " / \ ?
. Category 3 - ~ $ + ^ = < > |
. Category 4 - None

The two versions of OOo therefore treat these characters very differently.

My results have been confirmed by at least one person (on 644m7 / WinXP). Please
see the thread 'Character processing changes in 1.1beta' in the users list.

Surely these changes cannot be by design, as they make word-by-word navigation
quite inconvenient.
Comment 1 prgmgr 2003-07-10 20:35:43 UTC
Thank you for using and supporting OOo.

Verified in 1.1 Beta 2.

PM->HI:  Defect or enhancement?  Is this related to the changes
         in the break iterator?
Comment 2 h.ilter 2003-07-14 13:46:04 UTC
HI->FME: I've prepared a doc for this. Please load ../fme/13494.sxw
Comment 3 frank.meies 2003-07-14 13:53:44 UTC
FME->KHONG: Looks like a breakiterator issue.
Comment 4 karl.hong 2003-07-14 23:09:07 UTC
We made a big change between OOo 1.0 and OOo 1.1. In OOo 1.0.x we used a
simple word breakiterator that we wrote ourselves; in OOo 1.1 we switched to
ICU breakiterators for word, line, and sentence breaks. ICU is not perfect,
but it does take a lot of things into account.

We also patched the ICU breakiterator to meet our needs. I will investigate
which parts of the test results in this bug come from ICU and which come
from our patch. Some obvious problems, such as those in category 2, need to
be fixed.


Comment 5 karl.hong 2003-08-08 23:56:46 UTC
The problems in categories 2 and 3 are fixed.

A run of successive punctuation marks and symbols is now treated as a
single word, as described in category 1.

Only a single apostrophe or period is treated as part of a word, as
described in category 4.
Comment 6 karl.hong 2003-08-09 00:14:20 UTC
As another bug points out, the period (full stop) should not be part of a
word, so I removed it. Now only the apostrophe is treated as part of a word.
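The fixed behavior described in Comments 5 and 6 can be modeled with a small segmentation function. This is a simplified Python sketch, not the ICU or OOo implementation. It assumes that alphanumeric characters (str.isalnum) are the word characters, that every other non-space character is punctuation, and that a single apostrophe with a word character on each side stays inside the word. The function names are hypothetical.

```python
# Simplified model of the fixed rules (not ICU's actual rule set):
#   - a run of successive punctuation/symbols forms one word (category 1)
#   - a single apostrophe between word characters stays inside the word
#   - the period is ordinary punctuation


def char_class(text: str, i: int) -> str:
    """Classify text[i] as 'word', 'space', or 'punct'."""
    ch = text[i]
    if ch.isalnum():
        return "word"
    if ch.isspace():
        return "space"
    # e.g. the apostrophe in "don't"; any other apostrophe is punctuation.
    if (ch == "'" and 0 < i < len(text) - 1
            and text[i - 1].isalnum() and text[i + 1].isalnum()):
        return "word"
    return "punct"


def segments(text: str) -> list[tuple[int, str]]:
    """Split text into maximal runs of one class as (start, run) pairs."""
    runs = []
    start = 0
    for i in range(1, len(text) + 1):
        if i == len(text) or char_class(text, i) != char_class(text, i - 1):
            runs.append((start, text[start:i]))
            start = i
    return runs


def word_starts(text: str) -> list[int]:
    """Offsets where a word (letters or punctuation run) begins; spaces skipped."""
    return [start for start, run in segments(text) if not run[0].isspace()]


def next_word_start(text: str, pos: int) -> int:
    """Model of Ctrl+Right: the next word start after pos, else end of text."""
    for start in word_starts(text):
        if start > pos:
            return start
    return len(text)


def prev_word_start(text: str, pos: int) -> int:
    """Model of Ctrl+Left: the last word start before pos, else 0."""
    prev = 0
    for start in word_starts(text):
        if start >= pos:
            break
        prev = start
    return prev
```

Under this model the test line __AA__BB__CC__ splits into seven words of two characters each, and Ctrl+Left and Ctrl+Right stop at the same offsets, which is the symmetric category 1 behavior the fix aims for.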
Comment 7 karl.hong 2003-09-09 23:51:13 UTC
Verified in CWS i18n08.
Comment 8 oc 2003-09-22 15:45:34 UTC
Adjusting owner
Comment 9 oc 2003-09-22 15:45:56 UTC
adjusting resolution
Comment 10 stefan.baltzer 2003-11-06 11:49:24 UTC
SBA: Changed OS to "All". Verified in CWS i18n08.
Comment 11 jack.warchold 2004-08-06 15:03:09 UTC
Seen working in 680_m49-4.
Closing.