Apache OpenOffice (AOO) Bugzilla – Issue 24098
i18n: API: beginOfSentence/endOfSentence
Last modified: 2013-02-24 21:07:49 UTC
XBreakIterator::beginOfSentence does not stay at the beginning of the sentence when called with a position that already is the start of the current sentence; instead it moves to the start of the previous sentence. It should stay at the beginning. endOfSentence should behave analogously. TL->KHONG: Please give this bug back to me when you have fixed your part, since I have to change an implementation in the sw project after that. (Note: SwCursor::GoSentence has to be adapted to the new behaviour.)
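To make the difference between the reported and the requested behaviour concrete, here is a minimal Python sketch. It is not the i18n implementation: the sentence rule is a toy regular expression, the function names `begin_requested` and `begin_reported` are hypothetical, and the strict comparison in `begin_reported` only reproduces the symptom. The actual cause is not stated in this issue. Positions inside the whitespace between sentences are not considered here.

```python
import re

# Toy sentence rule (an assumption, much simpler than the real break iterator):
# a sentence runs from its first non-space character through a run of . ! or ?
SENTENCE = re.compile(r"[^\s.!?][^.!?]*[.!?]+")

def sentence_starts(text):
    """Offsets at which a sentence starts under the toy rule."""
    return [m.start() for m in SENTENCE.finditer(text)]

def begin_requested(text, pos):
    """Requested: the last sentence start at or before pos, so a start maps to itself."""
    starts = [s for s in sentence_starts(text) if s <= pos]
    return starts[-1] if starts else -1

def begin_reported(text, pos):
    """Reported: the last sentence start strictly before pos, so a start moves back."""
    starts = [s for s in sentence_starts(text) if s < pos]
    return starts[-1] if starts else -1

text = "One. Two. Three."
print(begin_requested(text, 5))  # 5: position 5 is the 'T' of "Two." and stays there
print(begin_reported(text, 5))   # 0: jumps back to the start of "One."
print(begin_requested(text, 7), begin_reported(text, 7))  # 5 5: mid-sentence, both agree
```

The two variants differ only for positions that are themselves sentence starts, which is exactly the case described in this report.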
Add myself to cc list.
Fixed in cws i18n11. Karl->TL: Do you want to put your fix into i18n11, or will you wait until it is integrated?
Reassign back to TL.
There is still a minor inconsistency with the beginOfSentence function. See this macro:

    oBreak = createUnoService( "com.sun.star.i18n.BreakIterator" )
    'msgbox oBreak.dbg_methods
    Dim en_US as new com.sun.star.lang.Locale
    en_US.Language = "en"
    en_US.Country = "US"
    aTxt = "He heard him!  That XX didn't bode well."
    msgbox aTxt
    nBeg = oBreak.beginOfSentence( aTxt, 13, en_US )
    nEnd = oBreak.endOfSentence( aTxt, 13, en_US )
    nBeg = oBreak.beginOfSentence( aTxt, 40, en_US )
    nEnd = oBreak.endOfSentence( aTxt, 40, en_US )
    msgbox "begin: " + nBeg
    msgbox "end : " + nEnd

When the text has only 40 characters (as in the macro) and the position is placed right after the final '.' character, beginOfSentence should still return 15, since the behaviour is similar for index 13 even though 13 is still part of the 'regular text'. I think it would only be consistent to have the function behave the same way with index 40. TL->Karl: Please return the issue to me when it is fixed. Thanks!
TL->Karl: I found another inconsistency. If you add two spaces to the beginning of the sample text in the above macro and call beginOfSentence with an index of 25, the result points right before the "That", which is OK. But if you use 7 as the index, the result is 0 where it should have been 2. BTW: In all other cases I have encountered, the white space between two sentences belonged to the first one, which I found quite consistent.
TL: Files changed:
sw:
- swcrsr.hxx 1.9.386.1
- swcrsr.cxx 1.31.96.1
- unoobj.cxx 1.71.48.1
TL->QA: The modified classes are SwCursor and SwXTextCursor (i.e. the XSentenceCursor interface).
Created attachment 12851: Writer document with sample test macro
Added document with debugging macro. (Needs editing for test purposes.)
Karl->TL: For the first problem, where the cursor is located at the end of the string, I would consider the position out of bounds and return -1 from both beginOfSentence and endOfSentence. If you remove the spaces after the first sentence, you will see that 13 belongs to the next sentence; to be consistent, 40 should be out of bounds. Issue i24850 describes the same problem. For the second problem, spaces at the beginning of the string, ICU includes them in the first sentence. Since we skip the spaces appended to a sentence, I will also skip the spaces preceding it.
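The proposal above can be sketched as a small Python model. It assumes a toy sentence rule instead of ICU, a hypothetical helper `sentence_bounds`, and two spaces after "He heard him!" so that the sample text is 40 characters long, as stated earlier in this issue. It illustrates the proposed results only and is not the actual break iterator code.

```python
import re

# Toy sentence rule (an assumption, far simpler than ICU): a sentence runs from
# its first non-space character through a run of '.', '!' or '?'.
SENTENCE = re.compile(r"[^\s.!?][^.!?]*[.!?]+")

def sentence_bounds(text, pos):
    """Forward search: bounds of the first sentence ending after pos, else (-1, -1)."""
    for m in SENTENCE.finditer(text):
        if pos < m.end():
            return m.start(), m.end()
    return -1, -1

# Two spaces between the sentences are assumed, giving 40 characters in total.
text = "He heard him!" + "  " + "That XX didn't bode well."

print(len(text))                         # 40
print(sentence_bounds(text, 13))         # (15, 40): the search moves forward past the spaces
print(sentence_bounds(text, len(text)))  # (-1, -1): position 40 is out of bounds
print(sentence_bounds("  " + text, 7))   # (2, 15): leading spaces are skipped
```

In this model the spaces surrounding a sentence never fall inside its returned bounds, which is what makes 2 rather than 0 the result for the text with two leading spaces.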
Sorry for being late with this. I pondered quite a while about the -1 thing for endOfSentence, and only now have I found what bugs me. Let's look at an example: " First sentence. Second sentence." If you have a position within the second sentence, for example 20, wouldn't you find it quite usual to extract a whole sentence with the following construct:

    nStart = beginOfSentence( aTxt, 20, ...)
    nEnd = endOfSentence( aTxt, 20, ...)
    aSentence = aTxt.copy( nStart, nEnd - nStart )

If you return -1, this will not always work, and one needs a workaround for it in the last sentence. Therefore I think it would be inconvenient to implement it this way. For the same reason, beginOfSentence and endOfSentence should return the same value for an empty text or a text with only whitespace. That is, the position behind the end of the text should be a legal return value of the function, and probably also be safe (though not very useful) as an input position. TL->Karl: What do you think about this?
Karl->TL: I think you misunderstood me. For the new example, beginOfSentence returns 18 and endOfSentence returns 34, so it works for your copy. What I meant is that both methods will return -1 when you put the cursor at position 34, which is outside the text boundary. Unlike the word break iterator, we have a simple sentence break iterator, which has neither a search direction nor options for skipping spaces. The current implementation searches forward and skips white space.
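The extraction idiom and the clarified -1 case can be modelled in a few lines of Python. This is a sketch under stated assumptions: the sentence rule is a toy regular expression rather than the real break iterator, `extract_sentence` is a hypothetical helper, and the sample text is assumed to have two leading spaces and one space between the sentences so that the offsets match the 18 and 34 quoted above.

```python
import re

# Toy sentence rule (an assumption, not ICU): first non-space character
# through a run of '.', '!' or '?'.
SENTENCE = re.compile(r"[^\s.!?][^.!?]*[.!?]+")

def begin_of_sentence(text, pos):
    """Start of the first sentence ending after pos, or -1 if there is none."""
    return next((m.start() for m in SENTENCE.finditer(text) if pos < m.end()), -1)

def end_of_sentence(text, pos):
    """End of the first sentence ending after pos, or -1 if there is none."""
    return next((m.end() for m in SENTENCE.finditer(text) if pos < m.end()), -1)

def extract_sentence(text, pos):
    """Return the sentence at pos, or None when pos is outside the text."""
    start = begin_of_sentence(text, pos)
    end = end_of_sentence(text, pos)
    if start < 0 or end < 0:
        return None  # guard for the out-of-bounds case discussed above
    return text[start:end]

# Spacing is assumed: two leading spaces, one space between the sentences.
text = "  First sentence. Second sentence."

print(begin_of_sentence(text, 20), end_of_sentence(text, 20))  # 18 34
print(extract_sentence(text, 20))                              # Second sentence.
print(extract_sentence(text, 34))                              # None
```

With these semantics the copy construct works for every position inside the text, including the last sentence; only a caller that passes the position behind the final character needs the guard.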
TL->SW: My part also fixed in CWS i18n11.
SW: looks good in cws_i18n11
=> verified
SW->TL: The XBreakIterator stuff seems to work as expected, but something seems to be broken with travelling through the text. Until now, the following macro gave me the first three sentences at the start of the first three paragraphs:

    xText = ThisComponent.getText()
    xSentenceCursor = xText.createTextCursor()
    oParCursor = xText.createTextCursor()
    oParCursor.gotoEndOfParagraph(true)
    xSentenceCursor.gotoRange(oParCursor.getStart(), False)
    for k=1 to 3
        oParCursor.gotoNextParagraph(false)
        xSentenceCursor.gotoNextSentence(false)
        xSentenceCursor.gotoEndOfSentence(true)
        print xSentenceCursor.getString()
    next

But in this cws I get the first sentence of the first paragraph three times.
Files changed - swcrsr.cxx 1.31.96.2
SW: fixed in cws_i18n11
SW: works as expected in cws_i18n11 => verified
SW: ok in src680_m32 => closed