Issue 76869 - UTF-16 iterator for 32-bit code points.
Summary: UTF-16 iterator for 32-bit code points.
Status: CLOSED FIXED
Alias: None
Product: porting
Classification: Code
Component: code
Version: current
Hardware: All All
Importance: P3 Trivial
Target Milestone: OOo 2.3
Assignee: ooo
QA Contact: issues@porting
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-02 17:38 UTC by ooo
Modified: 2008-05-30 16:51 UTC

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
proposed solution (19.79 KB, patch)
2007-05-09 13:14 UTC, Stephan Bergmann

Description ooo 2007-05-02 17:38:30 UTC
Hi Stephan,

For the break iterator implementation in languages that use characters
beyond the Unicode BMP, which UTF-16 encodes as surrogate pairs, we need
an iterator or methods on the OUString class that, for UTF-16:

- return a 32-bit Unicode value (code point) at a given string offset
  (code unit), see icu::UnicodeString::char32At(int32_t offset)
 
http://icu-project.org/apiref/icu4c/classUnicodeString.html#9ca80740ef5199cf1809c66a4ef6ba3d

- move the code unit index by delta code points like in
  icu::UnicodeString::moveIndex32(int32_t index, int32_t delta)
 
http://icu-project.org/apiref/icu4c/classUnicodeString.html#a28839315561a834a4f6954ad583fa35

- according to Karl, for breakiterator purposes it would be nice to have
  these combined in something like
  sal_Unicode32 rtl::OUString::moveIndex32(const OUString& text,
      sal_Int32 & index, sal_Int32 increment)
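
The two basic operations requested above can be sketched on a plain
std::u16string. This is only an illustration under stated assumptions: the
names codePointAt and moveIndex are hypothetical, not part of OUString or
ICU, and unlike icu::UnicodeString::char32At the sketch does not adjust an
offset that points at a trail surrogate back to the start of the pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers on std::u16string; not the OUString or ICU API.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Return the code point that starts at code-unit offset `index`.
// An unpaired surrogate is returned unchanged.
char32_t codePointAt(const std::u16string& s, std::size_t index) {
    assert(index < s.size());
    char16_t hi = s[index];
    if (isHighSurrogate(hi) && index + 1 < s.size() && isLowSurrogate(s[index + 1])) {
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                       + (char32_t(s[index + 1]) - 0xDC00);
    }
    return hi;
}

// Move a code-unit index by `delta` code points (negative moves backwards),
// clamping at both ends of the string.
std::size_t moveIndex(const std::u16string& s, std::size_t index, int delta) {
    assert(index <= s.size());
    for (; delta > 0 && index < s.size(); --delta) {
        bool pair = isHighSurrogate(s[index]) && index + 1 < s.size()
                    && isLowSurrogate(s[index + 1]);
        index += pair ? 2 : 1;
    }
    for (; delta < 0 && index > 0; ++delta) {
        bool pair = index >= 2 && isLowSurrogate(s[index - 1])
                    && isHighSurrogate(s[index - 2]);
        index -= pair ? 2 : 1;
    }
    return index;
}
```

For a string such as u"a\U0001D11Eb" (four code units, three code points),
codePointAt(s, 1) decodes the surrogate pair and moveIndex(s, 1, 1) skips
both of its code units.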

For a more generally usable OUString iterator, the methods getChar32Start
and getChar32Limit of icu::UnicodeString, and setIndex32, next32 and the
other ...32 methods of icu::UCharCharacterIterator, would come in handy; see
http://icu-project.org/apiref/icu4c/classUnicodeString.html
http://icu-project.org/apiref/icu4c/classUCharCharacterIterator.html

  Eike
Comment 1 Stephan Bergmann 2007-05-03 08:55:47 UTC
.
Comment 2 Stephan Bergmann 2007-05-09 13:14:23 UTC
Created attachment 44972
proposed solution
Comment 3 Stephan Bergmann 2007-05-09 13:23:35 UTC
See
<http://www.openoffice.org/servlets/BrowseList?list=interface-discuss&by=thread&from=1753959>
for a discussion.  The attached SRC680m211.patch implements the solution
proposed at
<http://www.openoffice.org/servlets/ReadMsg?list=interface-discuss&msgNo=853>,
keeping the relevant unit tests intact (sal/qa/rtl/ostring, extended
sal/qa/rtl/oustring, and sal/qa/rtl/uri).
Comment 4 Stephan Bergmann 2007-05-30 14:17:22 UTC
@khong:  How shall we proceed?  Either you have a CWS on which you need this and
I add it there.  Or I create a CWS for this issue and get it integrated and then
you can proceed.
Comment 5 karl.hong 2007-05-30 21:24:51 UTC
@sb, I have created a cws i18n31, please add the feature there. Thanks.
Comment 6 Stephan Bergmann 2007-05-31 13:02:50 UTC
@khong: done
Comment 7 Stephan Bergmann 2007-06-01 12:15:58 UTC
As discussed at
<http://www.openoffice.org/servlets/ReadMsg?list=interface-discuss&msgNo=863>,
changed postIncrementCodePoints to incrementCodePoints, which acts as a pre-
or post-increment depending on its sign.
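
A minimal sketch of what sign-dependent behaviour could look like, written
against std::u16string rather than OUString. The function name, signature
and exact semantics here are assumptions for illustration (read, then
advance, for a non-negative increment; step back, then read, for a negative
one); the attached patch and the linked discussion remain the authority on
the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool isLow(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// True if a well-formed surrogate pair starts at code-unit offset i.
bool pairAt(const std::u16string& s, std::size_t i) {
    return i + 1 < s.size() && isHigh(s[i]) && isLow(s[i + 1]);
}
}

// Hypothetical stand-in for the combined read-and-move operation.
// Assumed semantics: increment >= 0 reads at *index, then advances;
// increment < 0 moves back first, then reads at the new *index.
char32_t iterateCodePoints(const std::u16string& s, std::size_t* index,
                           int increment = 1) {
    assert(index != nullptr && *index <= s.size());
    std::size_t i = *index;
    // Negative increment: step backwards before reading.
    for (; increment < 0 && i > 0; ++increment) {
        i -= (i >= 2 && pairAt(s, i - 2)) ? 2 : 1;
    }
    // Read the code point at the current position; 0 at end of string.
    char32_t cp = 0;
    if (i < s.size()) {
        cp = pairAt(s, i)
            ? 0x10000 + ((char32_t(s[i]) - 0xD800) << 10)
                      + (char32_t(s[i + 1]) - 0xDC00)
            : char32_t(s[i]);
    }
    // Non-negative increment: step forwards after reading.
    for (; increment > 0 && i < s.size(); --increment) {
        i += pairAt(s, i) ? 2 : 1;
    }
    *index = i;
    return cp;
}
```

With this shape a forward loop and a backward loop over code points can use
the same function, passing 1 or -1 as the increment.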
Comment 8 Stephan Bergmann 2007-07-03 08:23:13 UTC
Also added rtl_uString_newFromCodePoints and corresponding rtl::OUString
constructor.  Relevant unit tests in sal/qa/rtl/oustring/rtl_OUString2.cxx:1.9.20.4.
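
As an illustration of the idea behind building a UTF-16 string from code
points, here is a small sketch on std::u16string. The helper name
fromCodePoints is hypothetical and this is not the code of
rtl_uString_newFromCodePoints; it assumes every input value is at most
U+10FFFF and does not reject lone surrogate values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode an array of code points as UTF-16.
std::u16string fromCodePoints(const char32_t* codePoints, std::size_t count) {
    assert(codePoints != nullptr || count == 0);
    std::u16string out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        char32_t cp = codePoints[i];
        // This sketch requires code points within the Unicode range.
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            // BMP code point: a single code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary code point: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```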
Comment 9 stefan.baltzer 2007-07-10 13:28:15 UTC
SBA->ER: As discussed, please verify in CWS i18n31.
Reassigned to ER.
Comment 10 ooo 2007-07-10 15:40:39 UTC
Verified presence in CWS.
Comment 11 ooo 2008-05-30 16:51:08 UTC
Present in master, closing.