Issue 47752 - Allow conversion to unicode beyond base plane
Summary: Allow conversion to unicode beyond base plane
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: OOo 2.0
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Depends on:
Reported: 2005-04-19 14:30 UTC by
Modified: 2017-05-20 11:13 UTC
CC List: 2 users

See Also:
Latest Confirmation in: ---
Developer Difficulty: ---


Description 2005-04-19 14:30:20 UTC
Currently, some encoding conversions to Unicode (e.g. from BIG5) map code points
that would lie outside the base plane into the base plane's private use area
(PUA). Once the rest of OOo can properly handle surrogate pairs, this workaround
is no longer needed.
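
For reference, a minimal sketch of what "handling surrogate pairs" means for a
code point outside the base plane in UTF-16. The function names toSurrogates and
fromSurrogates are made up for this illustration and are not part of sal; only
the arithmetic is the standard UTF-16 rule.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Illustrative helper (hypothetical name, not an OOo/sal function):
// split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair.
std::pair<char16_t, char16_t> toSurrogates(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);  // only non-BMP code points
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // low 10 bits
}

// Illustrative helper (hypothetical name): recombine a surrogate pair
// into the original non-BMP code point.
char32_t fromSurrogates(char16_t high, char16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}
```

With this rule, U+20000 is stored as the two UTF-16 code units 0xD840 0xDC00,
so code that assumes one code unit per character will mishandle it.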
Comment 1 Stephan Bergmann 2005-04-19 15:35:56 UTC
The encoding in question is Big5HKSCS, see $surrogates in
sal/textenc/generate/ 1.3.
Comment 2 2005-04-20 17:20:40 UTC
Maybe this should not be a compile-time constant. Consider a Big5 document that
was imported and then exported as PUA-encoded Unicode. To work with the real
Unicode encoding, one needs a converter from the "PUA unicode" version.
Comment 3 Stephan Bergmann 2005-04-21 08:00:37 UTC
The conversion from Unicode to Big5HKSCS handles both PUA and non-BMP,
regardless of $surrogates.  If you want two different conversions from Big5HKSCS
to Unicode at runtime, one using PUA, the other using non-BMP, then I think it
would be better to have two different RTL_TEXTENCODINGs for them (as the
sal/textcvt.h interface does not easily allow making this distinction).
Comment 4 2005-05-30 10:08:08 UTC
Yes, having two different target encodings is a better idea than the compile
time option. What would it look like though? RTL_TEXTENCODING_UCS2 and
RTL_TEXTENCODING_UNICODE? In this way other encodings than BIG5 could harvest
the benefits of runtime surrogate/non-surrogate encoding too.
Comment 5 Stephan Bergmann 2005-05-31 08:38:34 UTC
Clarified offline with hdu that the last three comments (April 20, 2005 to May
30, 2005) went in a wrong direction and should be ignored.

However, when we eventually do the $surrogates switch for Big5HKSCS (so that
some Big5HKSCS characters then map to Unicode non-BMP instead of PUA), the
following problem probably needs to be addressed:  According to hdu, some fonts
for Big5HKSCS have their glyphs ordered according to "Unicode PUA," not "Unicode
non-BMP."  To work correctly with those fonts, some function is then needed to
map "Unicode non-BMP" to "Big5HKSCS-specific Unicode PUA."
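
A rough sketch of what such a mapping function could look like, assuming a
simple lookup table. The name nonBmpToPua and both table entries are
placeholders invented for this illustration; the real pairs would have to come
from the Big5HKSCS mapping data, and nothing here reflects existing sal code.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical helper: map a non-BMP code point to the Big5HKSCS-specific
// PUA code point that PUA-ordered fonts expect. Code points without an
// entry are returned unchanged.
char32_t nonBmpToPua(char32_t cp) {
    // PLACEHOLDER entries only, NOT real Big5HKSCS assignments.
    static const std::unordered_map<char32_t, char32_t> table = {
        {0x20000, 0xE000},
        {0x20001, 0xE001},
    };
    const auto it = table.find(cp);
    if (it == table.end()) {
        return cp;  // not a remapped character: pass through
    }
    // Every target must lie in the BMP private use area U+E000..U+F8FF.
    assert(it->second >= 0xE000 && it->second <= 0xF8FF);
    return it->second;
}
```

Such a function would only be applied when rendering with a font known to use
the PUA ordering; the document text itself would keep the non-BMP code points.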
Comment 6 Marcus 2017-05-20 11:13:22 UTC
Reset assignee to the default "".