Issue 11993 - XBreakIterator::getCurrentWord with DICTIONARY_WORD flag
Alias: None
Product: App Dev
Classification: Unclassified
Component: api
Version: 3.3.0 or older (OOo)
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: steffen.grund
QA Contact: issues@api
Depends on:
Blocks: 3117
Reported: 2003-03-03 14:50 UTC by thomas.lange
Modified: 2013-02-24 21:07 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description thomas.lange 2003-03-03 14:50:12 UTC
When using the breakiterator in DICTIONARY_WORD mode on a text 
like "abcd ef  ghi??? KLM", the results should be:

a) if the cursor (index) is placed right after the "d" 
  isBeginWord: false
  isEndWord  : true
  getCurrentWord : 0, 4    but returned is: 5, 7

b) if it is placed amidst the two spaces between "f" and "g":
  isBeginWord: false
  isEndWord  : false
  getCurrentWord : 8, 8    but returned is: 9, 12

c) if it is placed right after the "i":
  isBeginWord: false
  isEndWord  : true
  getCurrentWord : 9, 12   but returned is: 16, 9

d) if it is placed right before the "K":
  isBeginWord: true
  isEndWord  : false
  getCurrentWord : 16, 19
This case is OK.

Note: getWordBoundary is always being called with last argument set to TRUE.

To put it another way, in terms of isBeginWord and isEndWord 
the results should be:

isBeginWord   isEndWord   result
  false         false       x, x   where x is the current cursor/index position
  false         true        the boundary of the word before the cursor
  true          false       the boundary of the word following the cursor
  true          true        the word following the cursor if the last function
                            argument is TRUE,
                            the word before the cursor if it is false

The fix of #i3117# depends on this being fixed.
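
The expected-results table above can be sketched in a few lines of Python. This is only an illustration, not the break iterator's implementation and not the UNO API: the letter-run tokenizer and the names word_spans, expected_current_word and prefer_forward are assumptions made for this example.

```python
import re

TEXT = "abcd ef  ghi??? KLM"  # example text from the description (two spaces after "ef")

def word_spans(text):
    # Assumption: a dictionary word is a run of ASCII letters;
    # spaces and punctuation only separate words.
    return [m.span() for m in re.finditer(r"[A-Za-z]+", text)]

def is_begin_word(text, pos):
    return any(start == pos for start, _ in word_spans(text))

def is_end_word(text, pos):
    return any(end == pos for _, end in word_spans(text))

def expected_current_word(text, pos, prefer_forward=True):
    """Return the (start, end) pair the table in the description asks for."""
    spans = word_spans(text)
    begins = next((s for s in spans if s[0] == pos), None)
    ends = next((s for s in spans if s[1] == pos), None)
    if begins and ends:
        # true/true row: the last argument decides. With this tokenizer the
        # branch is never reached, because adjacent letter runs merge.
        return begins if prefer_forward else ends
    if begins:
        return begins      # true/false row: word following the cursor
    if ends:
        return ends        # false/true row: word before the cursor
    return (pos, pos)      # false/false row: x, x

if __name__ == "__main__":
    for case, pos in (("a", 4), ("b", 8), ("c", 12), ("d", 16)):
        print(case, is_begin_word(TEXT, pos), is_end_word(TEXT, pos),
              expected_current_word(TEXT, pos))
```

Positions 4, 8, 12 and 16 correspond to cases a) to d); for this text the sketch yields (0, 4), (8, 8), (9, 12) and (16, 19).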
Comment 1 thomas.lange 2003-03-03 15:02:44 UTC
TL->Karl: Seems to be your issue.
Comment 2 thomas.lange 2003-03-20 13:02:16 UTC
The case
isBeginWord   isEndWord   result
  false         false       x, x   where x is the current  
                                   cursor/index position

Should have been:
  false         false       x, x   where x is the current position 
                                   if the cursor is not within
                                   a dictionary word, otherwise
                                   the boundaries of the word the 
                                   cursor is located in.

Comment 3 karl.hong 2003-04-05 01:17:09 UTC
I have fixed cases a), c) and d). For b), according to the internal bug 
106385, DICTIONARY_WORD should skip spaces: getWordBoundary will find 
the previous or next word boundary according to the last function argument.

It is impossible for isBeginWord and isEndWord to both be TRUE.

If the whole text does not contain a word, e.g. it only contains spaces 
and punctuation, getWordBoundary will return nStartPos as both startPos 
and endPos, and isBeginWord and isEndWord as FALSE.
Comment 4 frank 2003-04-23 14:57:56 UTC
Hi Michael,

please set this one to verified if 3117 is found fixed.


Comment 5 michael.ruess 2003-04-23 15:53:31 UTC
Set to "Verified" in agreement with FME
Comment 6 michael.ruess 2003-04-23 15:53:45 UTC
Comment 7 michael.ruess 2003-04-24 14:07:29 UTC
The fix caused problems while importing HTML tables (see internal task
#109082), so it has been taken back from OO 1.1 beta2.
Comment 8 michael.ruess 2003-04-24 14:08:42 UTC
Reassigned to Karl, to newly fix it in OO 1.1 final.
Comment 9 stefan.baltzer 2003-04-24 15:30:24 UTC
SBA: I talked to FME and the fix for the core issue 3117 (Ctrl+F7 does
not call Thesaurus if the cursor is right behind a word) will be done
within Writer and not by the breakiterator.
Reassigned to Thomas.
Comment 10 thomas.lange 2003-05-19 15:41:56 UTC
TL: As discussed with MI, I will change the target to OO 2.0 in order to
have time to discuss this and its impact in more detail.
Comment 11 karl.hong 2003-05-27 19:55:02 UTC
*** Issue 14904 has been marked as a duplicate of this issue. ***
Comment 12 ingenstans 2003-05-27 21:06:03 UTC
Well, if it is really going to remain unfixed until 2.0 (and I don't 
understand the earlier reasoning, but did file the issue just marked 
as a duplicate), could someone please explain the circumstances under 
which this bug is triggered? I'll need to write a workaround into my 
case-changing macro if this is not going to be fixed until 2.0.

Comment 13 thomas.lange 2003-05-28 07:51:28 UTC
TL-Karl: Could you answer the question?

Comment 14 karl.hong 2003-05-28 08:43:44 UTC
Karl: In the current implementation, the breakiterator only ignores spaces 
when it tries to find a word boundary. Punctuation is counted as a word. 
When you have something like "word", the first word is '"', with boundary 
0,1, and the second is 'word', with boundary 1,5. You can see that the first 
word's end is the second word's start: they overlap. When you put the 
cursor at any position in 'word', the first call, goToStartOfWord, moves 
the cursor to the word's start. The second call, goToEndOfWord, calls 
getWordBoundary with the direction set to backwards, meaning you want the 
previous word, so it returns the first word's boundary. Now your 
selection's boundary is the second word's start and the first word's end. 
In this case both of them are 1, so you select nothing and insert your new 
word at position 1. 

Comment 15 karl.hong 2003-05-28 18:53:58 UTC
Correction: in the second call, goToEndOfWord, it is not because it calls 
getWordBoundary with backwards direction; it is because it calls 
isEndWord first. Since the two words overlap (the second word's start 
is the first word's end), isEndWord returns true, and both the start and 
end of the selection are 1.
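
The overlap described in the two comments above can be reproduced with a small Python model. It is a sketch under stated assumptions, not the real break iterator or cursor code: the tokenizer (skip spaces only, treat each run of letters or of punctuation as a word) and the helpers go_to_start_of_word and go_to_end_of_word are hypothetical stand-ins for the calls named in the comments.

```python
import re

def spans(text):
    # Assumption: mimics the described behaviour. Only spaces are skipped;
    # a run of letters or a run of punctuation each counts as a "word".
    return [m.span() for m in re.finditer(r"[A-Za-z]+|[^A-Za-z\s]+", text)]

def is_end_word(text, pos):
    return any(end == pos for _, end in spans(text))

def go_to_start_of_word(text, pos):
    # Hypothetical helper: move to the start of the word containing pos.
    for start, end in spans(text):
        if start <= pos < end:
            return start
    return pos

def go_to_end_of_word(text, pos):
    # Hypothetical helper: checks is_end_word first, as in the correction.
    if is_end_word(text, pos):
        return pos  # position already counts as "a word end"
    for start, end in spans(text):
        if start <= pos < end:
            return end
    return pos

if __name__ == "__main__":
    text = '"word"'
    print(spans(text))                     # the quote and 'word' share position 1
    start = go_to_start_of_word(text, 3)   # cursor somewhere inside 'word'
    end = go_to_end_of_word(text, start)
    print(start, end)                      # both 1: the selection is empty
```

Because position 1 is both the end of '"' and the start of 'word', the end-of-word check succeeds immediately and the selection collapses to nothing, which matches the symptom reported for the macro.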
Comment 16 karl.hong 2003-08-08 19:58:04 UTC
Fixed in CWS i18n08
Comment 17 karl.hong 2003-09-10 00:17:54 UTC
Verified in CWS i18n08.
Comment 18 oc 2003-09-22 15:44:00 UTC
Adjusting owner
Comment 19 oc 2003-09-22 15:44:33 UTC
Adjusting resolution
Comment 20 stefan.baltzer 2003-10-22 16:46:02 UTC
SBA->SW: Please have a look.
Comment 21 stefan.baltzer 2003-10-22 16:47:02 UTC
Comment 22 stephan.wunderlich 2003-10-22 16:56:16 UTC
SW->SG: please verify this in i18n08, which can be found on cwsserv03
Comment 23 steffen.grund 2003-10-30 12:46:47 UTC
The fix works with getCurrentWord, but it changed the behaviour of the
isBeginWord and isEndWord functions.

This bug is verified nevertheless; isBeginWord and isEndWord are
handled in #i21907.
Comment 24 steffen.grund 2003-10-30 12:49:42 UTC
set to verified.
Comment 25 ingenstans 2004-03-28 11:49:19 UTC
What is happening with this? It is still broken in 680_m32.
Comment 26 steffen.grund 2004-04-01 13:01:41 UTC
andrewb is indeed right: this does not work in src680_m32; the behaviour is
again as described in the bug. So the bug goes back to Karl.
Comment 27 steffen.grund 2004-04-01 13:03:43 UTC
cleared resolution.
Comment 28 karl.hong 2004-06-02 06:00:29 UTC
This regression is caused by the bug fix for 112021. 

When the cursor is at the end of a word, but not at the beginning of another 
word, getWordBoundary should return the boundary of that word, no matter 
whether it is searching forwards or backwards.

Refix in cws i18n13.
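
The rule stated in this comment can be illustrated as follows. This is a rough Python sketch, not the code from the CWS: word_boundary and its forward flag are hypothetical names, the letter-run tokenizer is an assumption, and the direction-dependent fallback is only a guess at the behaviour outside the case the comment describes.

```python
import re

def word_boundary(text, pos, forward):
    # Assumption: words are runs of ASCII letters.
    words = [m.span() for m in re.finditer(r"[A-Za-z]+", text)]
    ending = [w for w in words if w[1] == pos]
    beginning = [w for w in words if w[0] == pos]
    if ending and not beginning:
        # Rule from the comment: at the end of a word that is not also the
        # start of another word, the direction must not matter.
        return ending[0]
    if forward:
        # Assumed fallback: word containing or following the position.
        following = [w for w in words if w[1] > pos]
        return following[0] if following else (pos, pos)
    # Assumed fallback: word containing or preceding the position.
    preceding = [w for w in words if w[0] < pos]
    return preceding[-1] if preceding else (pos, pos)

if __name__ == "__main__":
    text = "abcd ef"
    # Cursor right after "d": same boundary in both directions.
    print(word_boundary(text, 4, True), word_boundary(text, 4, False))
```

For the cursor right after the "d" in "abcd ef", both directions return (0, 4) in this model.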
Comment 29 karl.hong 2004-06-02 06:00:58 UTC
Comment 30 karl.hong 2004-06-21 18:48:37 UTC
reopen the issue for reassigning to QA.
Comment 31 karl.hong 2004-06-21 18:49:30 UTC
Comment 32 steffen.grund 2004-07-19 11:33:08 UTC
Comment 33 steffen.grund 2004-07-19 11:34:02 UTC
Comment 34 steffen.grund 2004-07-20 11:47:42 UTC
Comment 35 steffen.grund 2004-07-20 11:48:09 UTC
Checked on Solaris and Windows, works -> verified.
Comment 36 steffen.grund 2004-07-23 13:18:35 UTC
Checked on Solaris and Windows again, worked -> closed.