Apache OpenOffice (AOO) Bugzilla – Issue 76869
UTF-16 iterator for 32-bit code points.
Last modified: 2008-05-30 16:51:08 UTC
Hi Stephan,

For the break iterator implementation in languages that use not only the Unicode BMP but also surrogate pairs, we need an iterator or methods on the OUString class that, for UTF-16:

- return a 32-bit Unicode value (code point) at a given string offset (code unit), see icu::UnicodeString::char32At(int32_t offset) http://icu-project.org/apiref/icu4c/classUnicodeString.html#9ca80740ef5199cf1809c66a4ef6ba3d
- move the code unit index by delta code points, as in icu::UnicodeString::moveIndex32(int32_t index, int32_t delta) http://icu-project.org/apiref/icu4c/classUnicodeString.html#a28839315561a834a4f6954ad583fa35
- according to Karl, for break iterator purposes it would be nice to have these combined in something like sal_Unicode32 rtl::OUString::moveIndex32(const OUString& text, sal_Int32 & index, sal_Int32 increment)

For a more generally usable OUString iterator, the methods getChar32Start and getChar32Limit of icu::UnicodeString, and setIndex32, next32 and the other ...32 methods of icu::UCharCharacterIterator, would come in handy, see
http://icu-project.org/apiref/icu4c/classUnicodeString.html
http://icu-project.org/apiref/icu4c/classUCharCharacterIterator.html

Eike
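To make the two requested operations concrete, here is a minimal sketch over std::u16string rather than OUString. The names codePointAt and moveIndex are hypothetical and exist only for illustration; they are neither the ICU API nor the API that was eventually added to sal. The sketch assumes the index always points at the start of a code point and treats unpaired surrogates as single code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only (not OUString or ICU API).
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// True if a well-formed surrogate pair starts at code-unit index i.
inline bool pairAt(const std::u16string& s, std::size_t i) {
    return i + 1 < s.size() && isHighSurrogate(s[i]) && isLowSurrogate(s[i + 1]);
}

// Returns the code point that starts at code-unit index i.
// An unpaired surrogate is returned unchanged.
char32_t codePointAt(const std::u16string& s, std::size_t i) {
    assert(i < s.size());
    if (pairAt(s, i)) {
        return 0x10000 + ((char32_t(s[i]) - 0xD800) << 10)
                       + (char32_t(s[i + 1]) - 0xDC00);
    }
    return s[i];
}

// Moves a code-unit index by delta code points, clamped to [0, s.size()].
std::size_t moveIndex(const std::u16string& s, std::size_t index, int delta) {
    for (; delta > 0 && index < s.size(); --delta) {
        index += pairAt(s, index) ? 2 : 1;
    }
    for (; delta < 0 && index > 0; ++delta) {
        index -= (index >= 2 && pairAt(s, index - 2)) ? 2 : 1;
    }
    return index;
}
```

With the string "a", U+1D11E, "b" (four code units), codePointAt at index 1 yields the full supplementary code point, and moving one code point forward from index 1 skips both surrogate code units.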
Created attachment 44972: proposed solution
See <http://www.openoffice.org/servlets/BrowseList?list=interface-discuss&by=thread&from=1753959> for a discussion. The attached SRC680m211.patch implements the solution proposed at <http://www.openoffice.org/servlets/ReadMsg?list=interface-discuss&msgNo=853>, keeping the relevant unit tests intact (sal/qa/rtl/ostring, extended sal/qa/rtl/oustring, and sal/qa/rtl/uri).
@khong: How shall we proceed? Either you already have a CWS on which you need this and I add it there, or I create a CWS for this issue, get it integrated, and then you can proceed.
@sb: I have created CWS i18n31; please add the feature there. Thanks.
@khong: done
As discussed at <http://www.openoffice.org/servlets/ReadMsg?list=interface-discuss&msgNo=863>, changed postIncrementCodePoints to incrementCodePoints, which acts as a pre- or post-increment depending on its sign.
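For readers who do not want to follow the link, here is a rough sketch of what a sign-dependent increment could look like. It rests on an assumption about the semantics: a non-negative increment reads the code point at the index and then advances (post-increment), while a negative increment first moves back and then reads (pre-decrement). The function name iterateSketch and the use of std::u16string are hypothetical and are not the code in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for illustration; not the rtl::OUString method.
// Assumed semantics: increment >= 0 reads first, then moves forward;
// increment < 0 moves backward first, then reads.
char32_t iterateSketch(const std::u16string& s, std::size_t& index, int increment) {
    auto isHigh = [](char16_t c) { return c >= 0xD800 && c <= 0xDBFF; };
    auto isLow = [](char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; };
    // True if a well-formed surrogate pair starts at code-unit index i.
    auto pairAt = [&](std::size_t i) {
        return i + 1 < s.size() && isHigh(s[i]) && isLow(s[i + 1]);
    };
    // Decodes the code point starting at code-unit index i.
    auto read = [&](std::size_t i) -> char32_t {
        assert(i < s.size());
        if (pairAt(i)) {
            return 0x10000 + ((char32_t(s[i]) - 0xD800) << 10)
                           + (char32_t(s[i + 1]) - 0xDC00);
        }
        return s[i];
    };
    if (increment >= 0) {
        char32_t cp = read(index);  // read, then advance
        for (; increment > 0 && index < s.size(); --increment) {
            index += pairAt(index) ? 2 : 1;
        }
        return cp;
    }
    for (; increment < 0 && index > 0; ++increment) {  // retreat, then read
        index -= (index >= 2 && pairAt(index - 2)) ? 2 : 1;
    }
    return read(index);
}
```

Under these assumed semantics a forward loop and a backward loop over the same string visit the same code points, which is what a break iterator needs when scanning in either direction.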
Also added rtl_uString_newFromCodePoints and corresponding rtl::OUString constructor. Relevant unit tests in sal/qa/rtl/oustring/rtl_OUString2.cxx:1.9.20.4.
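As an illustration of what constructing a UTF-16 string from code points involves, here is a small sketch using std::u16string. The helper name fromCodePoints is hypothetical and this is not the implementation of rtl_uString_newFromCodePoints; it only shows the standard UTF-16 encoding step, and it assumes every input value is at most 0x10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the sal implementation.
// Encodes an array of code points as UTF-16 code units.
std::u16string fromCodePoints(const char32_t* codePoints, std::size_t count) {
    std::u16string out;
    out.reserve(count);  // lower bound; supplementary code points need two units
    for (std::size_t i = 0; i < count; ++i) {
        char32_t cp = codePoints[i];
        assert(cp <= 0x10FFFF);  // assumption: caller passes valid code points
        if (cp < 0x10000) {
            // BMP code point: one code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary code point: high and low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Three code points ("a", U+1D11E, "b") therefore produce four UTF-16 code units, with U+1D11E encoded as the pair 0xD834, 0xDD1E.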
SBA->ER: As discussed, please verify in CWS i18n31. Reassigned to ER.
Verified presence in CWS.
Present in master, closing.