Issue 24098

Summary: i18n: API: beginOfSentence/endOfSentence
Product: App Dev
Reporter: thomas.lange
Component: api
Assignee: stephan.wunderlich
Status: CLOSED FIXED
QA Contact: issues@api <issues>
Severity: Trivial
Priority: P3
CC: issues, thomas.lange
Version: 3.3.0 or older (OOo)
Target Milestone: ---
Hardware: All
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
  Writer document with sample test macro (flags: none)

Description thomas.lange 2004-01-05 11:08:34 UTC
XBreakIterator::beginOfSentence does not stay at the beginning of the sentence
when called with a position that already refers to the start of the current
sentence; instead it moves to the start of the previous sentence.
It should stay at the beginning.

endOfSentence should behave similarly.
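
The expectation above can be read as an invariant: passing a sentence start back into beginOfSentence should return that same position. The following is only a rough Python sketch of that invariant. It uses a toy splitter (sentences end at '.', '!' or '?') and a made-up sample text, not the real BreakIterator; the names `sentence_starts`, `expected_begin` and `reported_begin` are hypothetical and exist only for this illustration.

```python
TERMINATORS = ".!?"

def sentence_starts(text):
    """Toy rule: a sentence starts at the first character that is neither
    whitespace nor a terminator, at the text start or after a terminator."""
    starts = []
    at_boundary = True  # the start of the text counts as a boundary
    for i, ch in enumerate(text):
        if ch in TERMINATORS:
            at_boundary = True
        elif at_boundary and not ch.isspace():
            starts.append(i)
            at_boundary = False
    return starts

def expected_begin(text, pos):
    """Expected behaviour: the last sentence start at or before pos."""
    candidates = [s for s in sentence_starts(text) if s <= pos]
    return candidates[-1] if candidates else 0

def reported_begin(text, pos):
    """Imitates the reported symptom (not the actual code): a start
    position is treated as if it still belonged to the previous sentence."""
    candidates = [s for s in sentence_starts(text) if s < pos]
    return candidates[-1] if candidates else 0

text = "One. Two. Three."  # made-up sample; sentences start at 0, 5 and 10
for start in sentence_starts(text):
    # columns: start, expected result, result with the reported symptom
    print(start, expected_begin(text, start), reported_begin(text, start))
```

In this model `expected_begin` is stable at every sentence start, while `reported_begin` jumps back by one sentence for every start except the first.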

TL->KHONG: Please give this bug back to me when you have fixed your part, since I
have to change an implementation in the sw project after that.
(Note: SwCursor::GoSentence has to be adapted to the new behaviour.)
Comment 1 thomas.lange 2004-01-05 11:09:26 UTC
Add myself to cc list.
Comment 2 karl.hong 2004-01-06 23:10:59 UTC
fixed in cws i18n11.

Karl->TL: Do you want to put your fix in i18n11, or will you wait until it is
integrated?
Comment 3 karl.hong 2004-01-06 23:12:54 UTC
Reassign back to TL.
Comment 4 thomas.lange 2004-02-02 14:49:08 UTC
There is still a minor inconsistency with the beginOfSentence function.
See macro:

' Create the break iterator service and an en_US locale
oBreak = createUnoService( "com.sun.star.i18n.BreakIterator" )
'msgbox oBreak.dbg_methods
Dim en_US as new com.sun.star.lang.Locale
en_US.Language = "en"
en_US.Country  = "US"
aTxt = "He heard him!  That XX didn't bode well."
msgbox aTxt
' Index 13: right after the "!" that ends the first sentence
nBeg = oBreak.beginOfSentence( aTxt, 13, en_US )
nEnd = oBreak.endOfSentence( aTxt, 13, en_US )
' Index 40: right after the final "." (overwrites the results for 13)
nBeg = oBreak.beginOfSentence( aTxt, 40, en_US )
nEnd = oBreak.endOfSentence( aTxt, 40, en_US )
msgbox "begin: " & nBeg
msgbox "end  : " & nEnd


When the text has only 40 characters (as in the macro) and the position is
placed right after the final '.' character, beginOfSentence should still return
15, since the behaviour is similar for index 13 even though 13 is still part of
the 'regular text'.
I think it would only be consistent to have the function behave the same way
with index 40.

TL->Karl: Please return the issue to me when it is fixed. Thanks!
Comment 5 thomas.lange 2004-02-02 14:51:32 UTC
.
Comment 6 thomas.lange 2004-02-03 06:37:46 UTC
TL->Karl: I found another inconsistency.
If you add two spaces to the beginning of the sample text in the above macro and
call beginOfSentence with an index of 25, the result points right before the
"That", which is OK. But if you use 7 as the index, the result is 0 where it
should have been 2.

BTW: In all other cases I have encountered, the white space between two
sentences belonged to the first one, which I found quite consistent.
Comment 7 thomas.lange 2004-02-03 06:51:39 UTC
TL: Files changed:
sw:
- swcrsr.hxx     1.9.386.1
- swcrsr.cxx     1.31.96.1
- unoobj.cxx    1.71.48.1

TL->QA: The modified classes are SwCursor and SwXTextCursor (i.e. the
XSentenceCursor interface).
Comment 8 thomas.lange 2004-02-03 06:56:15 UTC
Created attachment 12851 [details]
Writer document with sample test macro
Comment 9 thomas.lange 2004-02-03 06:57:25 UTC
Added document with debugging macro.
(Needs editing for test purposes.)
Comment 10 karl.hong 2004-02-04 02:04:31 UTC
Karl->TL: For the first problem, where the cursor is located at the end of the
string, I would consider it out of bounds and will set both beginOfSentence and
endOfSentence to -1.
If you remove the spaces after the first sentence, you will see that 13 belongs
to the next sentence; to be consistent, 40 should count as out of bounds.
Issue i24850 describes the same problem.

For the second problem, spaces at the beginning of the string: ICU includes them
in the first sentence. Since we skip the spaces appended to a sentence, I will
also skip the spaces preceding the sentence.
Comment 11 thomas.lange 2004-02-06 13:23:42 UTC
Sorry for being late with this. I pondered the -1 thing for endOfSentence for
quite a while, and only now have I found what bugs me.

Let's look at an example:
  "  First sentence. Second sentence."
If you have a position within the second sentence, for example 20,
wouldn't you find it quite usual to extract the whole sentence with the
following construct:

nStart = beginOfSentence( aTxt, 20, ...)
nEnd = endOfSentence( aTxt, 20, ...)
aSentence = aTxt.copy( nStart, nEnd - nStart )

If you return -1, this will not always work, and one needs a workaround for the
last sentence.
Therefore I think it would be inconvenient to implement it this way.
And for the same reason beginOfSentence and endOfSentence should return the same
value for an empty text or a text with only white space.
That is, being behind the end of the text should be a legal position to be
returned by the function, and probably be safe (though not very useful) as an
input position as well.
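
To make the concern concrete, here is a small Python sketch of the copy construct. It is an illustration under stated assumptions, not the office code: `copy` is a hypothetical stand-in for a substring call that takes (start, count) and rejects negative values, and the index 18 is simply the position of the "S" of "Second" in the example text.

```python
def copy(text, start, count):
    """Hypothetical stand-in for a (start, count) substring call that
    rejects invalid ranges, as a strict string API would."""
    if start < 0 or count < 0 or start + count > len(text):
        raise ValueError(f"invalid range: start={start}, count={count}")
    return text[start:start + count]

def extract_sentence(text, begin, end):
    """The construct from the comment: copy( begin, end - begin )."""
    return copy(text, begin, end - begin)

text = "  First sentence. Second sentence."

# An end value equal to len(text) is a legal "behind the end" position,
# so the construct works for the last sentence without special cases.
print(extract_sentence(text, 18, len(text)))  # Second sentence.

# If -1 came back as the end value, the same call would need a workaround.
try:
    extract_sentence(text, 18, -1)
except ValueError as err:
    print("needs a workaround:", err)
```

Whether -1 can actually come back for a position inside the last sentence depends on the implementation; the sketch only shows why a past-the-end return value is the more convenient choice for this construct.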

TL->Karl: What do you think about this?
Comment 12 karl.hong 2004-02-06 19:11:58 UTC
Karl->TL: I think you misunderstood me. For the new example, beginOfSentence
returns 18 while endOfSentence returns 34, so it works for your copy.

What I meant is that both methods will return -1 when you put the cursor at
position 34, which is outside the text boundary.

Unlike the word break iterator, we have a simple sentence break iterator, which
has neither a search direction nor skip-space options. The current
implementation searches forward and skips white space.
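
As a rough illustration of the behaviour described in this comment, the Python sketch below reproduces the quoted numbers (18 and 34 for position 20, -1 for position 34) with a toy model. It is not the ICU-based implementation. The regular expression, the name `sentence_bounds`, and the rule that white space after a sentence is attributed to that sentence are assumptions made for the example.

```python
import re

# Toy rule (assumption): a sentence starts at a character that is neither
# white space nor a terminator and runs up to a run of '.', '!' or '?'.
# Trailing white space is matched but kept outside the captured group.
SENTENCE = re.compile(r"([^\s.!?][^.!?]*[.!?]+)\s*")

def sentence_bounds(text, pos):
    """Return (begin, end) of the sentence for pos, with surrounding white
    space skipped, or (-1, -1) when pos lies outside the text."""
    if pos < 0 or pos >= len(text):
        return (-1, -1)
    for m in SENTENCE.finditer(text):
        # Forward search: take the first sentence whose extent, including
        # its trailing white space, reaches beyond pos.
        if pos < m.end():
            return (m.start(1), m.end(1))
    return (-1, -1)  # nothing but unterminated text or white space left

text = "  First sentence. Second sentence."
print(sentence_bounds(text, 20))         # (18, 34)
print(sentence_bounds(text, len(text)))  # (-1, -1)
```

In this model a position inside the two leading spaces is also resolved to the first sentence with the spaces skipped, which matches the intent stated earlier in the thread for leading white space.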

Comment 13 thomas.lange 2004-02-10 10:21:21 UTC
TL->SW: My part is also fixed in CWS i18n11.
Comment 14 thomas.lange 2004-02-10 10:22:04 UTC
.
Comment 15 stephan.wunderlich 2004-02-20 09:39:25 UTC
SW: looks good in cws_i18n11
Comment 16 stephan.wunderlich 2004-02-20 09:39:53 UTC
=> verified
Comment 17 stephan.wunderlich 2004-02-20 13:03:55 UTC
SW->TL: The XBreakIterator stuff seems to work as expected, but something seems
to be damaged with traveling through the text. Until now, the following macro
gave me the first three sentences at the start of the first three paragraphs:

xText = ThisComponent.getText()
xSentenceCursor = xText.createTextCursor()
oParCursor = xText.createTextCursor()

oParCursor.gotoEndOfParagraph(True)
xSentenceCursor.gotoRange(oParCursor.getStart(), False)

For k = 1 To 3
    oParCursor.gotoNextParagraph(False)
    xSentenceCursor.gotoNextSentence(False)
    xSentenceCursor.gotoEndOfSentence(True)
    Print xSentenceCursor.getString()
Next

But in this CWS I get the first sentence of the first paragraph three times.
Comment 18 thomas.lange 2004-03-10 12:13:11 UTC
.
Comment 19 thomas.lange 2004-03-10 12:13:48 UTC
Files changed
- swcrsr.cxx  1.31.96.2
Comment 20 thomas.lange 2004-03-10 15:53:47 UTC
.
Comment 21 stephan.wunderlich 2004-03-10 16:26:53 UTC
SW: fixed in cws_i18n11
Comment 22 stephan.wunderlich 2004-03-10 16:27:28 UTC
SW: works as expected in cws_i18n11 => verified
Comment 23 stephan.wunderlich 2004-03-23 11:21:34 UTC
SW: ok in src680_m32 => closed