Issue 13089 - Paste of unicode characters is not honored.
Status: CLOSED FIXED
Alias: None
Product: Internationalization
Classification: Code
Component: ui
Version: OOo 1.1 Beta
Hardware: PC Linux, all
Importance: P3 Trivial
Target Milestone: ---
Assignee: mdxonefour
QA Contact: issues@l10n
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2003-04-06 16:25 UTC by Unknown
Modified: 2013-08-07 15:00 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
testcase, open in your browser and follow the instructions in the page (3.02 KB, text/html)
2003-08-19 07:35 UTC, bulbul

Description Unknown 2003-04-06 16:25:34 UTC
1. Type the following word into gedit or kedit:

     smörgåsbord

2. Mark it and paste it into OpenOffice word processor with the middle mouse button.

3. The result in OpenOffice is:

   sm\x{00F6}rg\x{00E5}sbord

I.e. OpenOffice doesn't honor unicode characters for pasting.
Comment 1 bulbul 2003-06-01 01:22:16 UTC
I confirm this, also for Linux with OOo 1.1beta2, but not for the word
smörgåsbord, which works fine for me. When i try to paste a character
which is beyond the usual extended ASCII characters, i can no longer
paste into OOo.

Another way to test this is to open kcharselect. This program lets you
choose not only a character, but also how it is written to the
clipboard: in a default way (current locale?), UTF-8, or HTML. (That's
all in the Edit menu.) Here are the three results i get if i try to
paste the ĉ character (small "c" with a circumflex "^") into OOo:

default:   ? (a question mark)
UTF-8:     Ä (upper case "A" with umlaut/diaeresis)
HTML:      ĉ

Note that the font in OOo has the required characters; i can insert
the character using the character selection dialog.

I think that this is a regression. I can no longer paste Esperanto
text into OOo, for which i had developed special tools. I think i'll
have to use another word processor for this project. :-( 

I can happily copy these characters between other programs such as
kcharselect, Mozilla, gedit, and Kate.
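
The three results above are consistent with the receiving side treating the clipboard bytes as a single-byte encoding. A minimal sketch, assuming that encoding is ISO-8859-1 and assuming the HTML mode writes a numeric character reference (neither is confirmed in this report):

```python
# Illustration only. Assumes the paste target decodes bytes as ISO-8859-1
# (Latin-1); the real encoding used by OOo here is not confirmed.
ch = "\u0109"  # ĉ, LATIN SMALL LETTER C WITH CIRCUMFLEX

# "default" mode: ĉ has no Latin-1 representation, so it degrades to "?".
as_default = ch.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

# "UTF-8" mode: the UTF-8 bytes C4 89 misread as Latin-1.
# C4 is "Ä"; 89 is an invisible C1 control character.
utf8_bytes = ch.encode("utf-8")
as_misread = utf8_bytes.decode("iso-8859-1")

# "HTML" mode (assumed): a numeric character reference is pure ASCII,
# so it passes through a single-byte encoding unchanged.
as_html = "&#{};".format(ord(ch))

print(repr(as_default), repr(as_misread), as_html)
```

The visible "Ä" is only the first of the two misread bytes; the second one does not render.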
Comment 2 dov 2003-06-26 14:37:34 UTC
I just verified that I get the behaviour that I described in
OpenOffice Beta2 if 

  LC_ALL=C

On the other hand, if LC_ALL is undefined, it behaves as
expected. Perhaps this is a feature...
Comment 3 bulbul 2003-08-16 23:35:40 UTC
For me LC_ALL is undefined, but i still get the bug. The other
variables starting with LC which are set are these:

   LC_COLLATE=en_US
   LC_CTYPE=en_US
   LC_MESSAGES=en_US
   LC_MONETARY=en_US
   LC_NUMERIC=en_US
   LC_TIME=en_US

(Say, what does "LC" stand for, anyway?)

I am still seeing this bug with OOo 1.1 RC3.

I don't see how this would be a feature, since pasting behaves
normally in other applications. I also think that this is a
regression. I'm pretty sure i was able to paste these characters in 1.0.
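
For reference, POSIX resolves the character-type locale from LC_ALL first, then LC_CTYPE, then LANG, ignoring empty values. A small sketch of that lookup follows; the helper names are hypothetical. With the settings listed above, the result is en_US, which names no codeset at all, so nothing here explicitly requests UTF-8 (what a bare en_US maps to depends on the system).

```python
def effective_ctype(env):
    """Return the locale name that governs character handling.

    Follows the POSIX precedence LC_ALL > LC_CTYPE > LANG; unset or
    empty variables are skipped, and "C" is the fallback.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset(locale_name):
    """Return the codeset part of a locale name (after the dot), or None."""
    base = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." in base:
        return base.split(".", 1)[1]
    return None


# The environment from this comment: LC_ALL unset, LC_CTYPE=en_US.
reported = {"LC_CTYPE": "en_US", "LC_MESSAGES": "en_US"}
print(effective_ctype(reported), codeset(effective_ctype(reported)))
```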
Comment 4 bulbul 2003-08-19 06:47:25 UTC
Hmmm... A few more subtleties. Using OOo RC3 on Linux. The problem now
seems to only be pasting unformatted text, whereas before the problem
occurred whether or not the text was formatted, i think. So, to
summarize, with RC3 the results are as follows:

1. Paste formatted text (such as in the body of a web page) 
   with Ctrl+v or middle button
     -> works fine
2. Paste unformatted text (such as from a text area in a browser)
   with Ctrl+v or middle button
     -> special characters are pasted as question marks
3. Take formatted text and paste using 
   Edit->Paste Special->Unformatted Text
     -> special characters are pasted as question marks

I will create a small test case and attach it.
Comment 5 bulbul 2003-08-19 07:35:03 UTC
Created attachment 8563
testcase, open in your browser and follow the instructions in the page
Comment 6 diane 2003-10-30 21:01:23 UTC
I cannot open the attachment. smörgåsbord copies into OOo ok for me. I
could not find the ĉ (&#265;) character on my character map, and do not
know what other character to pick that is not an ASCII character at
the moment... OOo 1.1.0 on RH9.
Comment 7 con.hennessy 2003-10-30 21:15:00 UTC
Following the attached link shows the problem when the chars are copied to OOo 1.1.0 
Comment 8 Joost Andrae 2003-10-30 21:20:44 UTC
JA->PL: please have a look at this issue. reassigned
Comment 9 diane 2003-10-30 21:33:02 UTC
Ok, I got the attached html file open...that was operator error. The
tests there work ok for me. I run gnome here. The font I used was
bitstream vera sans. My character codes for both mozilla and OOo are
set to UTF-8. (1.1.0 on rh9 linux.)
Comment 10 t8m 2003-10-30 22:30:21 UTC
It works for me fine too on RH Linux 7.3 with Ximian Desktop 2 and
OpenOffice.org 1.1.0 CZ

(My locale is set to UTF8)
Comment 11 t8m 2003-10-30 22:51:31 UTC
I've changed the locale environment of the OOo to non UTF8 and now it
doesn't work. 

I thought that the unformatted text in X clipboard doesn't contain any
information about encoding so OOo can't know that the text in it is
encoded in UTF8.

But when I tried copying from mozilla with UTF8 locale to gedit with
ISO8859-x locale it worked. Maybe gedit/gnome has some autodetect
mechanism to detect UTF8 characters in clipboard?

Then I tried to set locale on all apps incl. mozilla to ISO8859-x
locale and the unicode characters were pasted as \x{XXXX} codes.
So maybe OOo should properly detect these codes in clipboard and
convert them to appropriate Unicode characters.
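
The conversion suggested here could look roughly like the following sketch. The function name is hypothetical and this is not OOo code; it only shows how \x{XXXX} sequences map back to characters. Such a filter would also rewrite text that legitimately contains these sequences, so it is a workaround at best.

```python
import re

# Matches escapes such as \x{00F6}; 4 to 6 hex digits is an assumption
# based on the examples seen in this report.
_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{4,6})\}")


def decode_x_escapes(text):
    """Replace \\x{XXXX} escapes with the Unicode characters they name."""
    def replace(match):
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _ESCAPE.sub(replace, text)


print(decode_x_escapes(r"sm\x{00F6}rg\x{00E5}sbord"))
```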
Comment 12 philipp.lohmann 2003-10-31 09:45:38 UTC
Most likely this simply means that kedit/gedit transport data as
XA_STRING which has no encoding (it's essentially a byte sequence).
OOo interprets this as "use the encoding of your locale" - which for
the C locale or en_US is ASCII and hence cannot transport characters
outside the ASCII range.

I'll have a look whether that is really the case here.
Comment 13 philipp.lohmann 2003-11-10 17:08:18 UTC
The reason was a different one: the funny characters come out of Xlib
itself when the locale is not properly set. This happens when text is
transported as COMPOUND_TEXT, which is the default for multibyte
character sets. But since kedit and gedit transport UTF8_STRING anyway
(which is preferable since already converted to Unicode), OOo should
choose that target instead of COMPOUND_TEXT. I changed the code so
this is the case now (in CWS vcl7pp1r3).

Please note that there is still the possibility for these strange
conversions with applications that do not support UTF8_STRING but
COMPOUND_TEXT instead. Since the conversion fails inside Xlib itself
there's not much i can do about that.
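
The selection logic described above amounts to preferring UTF8_STRING whenever the selection owner offers it. A minimal sketch in Python rather than the actual vcl code; the function name and the fallback order after UTF8_STRING are assumptions for illustration:

```python
# Target names are the standard X selection atoms; the ordering after
# UTF8_STRING is an assumption, not taken from the OOo source.
PREFERRED_TEXT_TARGETS = ("UTF8_STRING", "COMPOUND_TEXT", "STRING")


def choose_text_target(offered):
    """Pick the best text target from those the selection owner offers.

    Returns None if the owner offers no text target we understand.
    """
    for target in PREFERRED_TEXT_TARGETS:
        if target in offered:
            return target
    return None


# An owner such as gedit that offers UTF8_STRING alongside older targets.
print(choose_text_target(["TARGETS", "STRING", "COMPOUND_TEXT", "UTF8_STRING"]))
# An owner that only supports COMPOUND_TEXT still hits the Xlib conversion.
print(choose_text_target(["TARGETS", "COMPOUND_TEXT"]))
```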
Comment 14 philipp.lohmann 2003-11-13 13:37:04 UTC
pl->md: please verify in CWS vcl7pp1r3
Comment 15 mdxonefour 2003-11-14 11:10:29 UTC
Verified on CWS vcl7pp1r3.
Comment 16 mdxonefour 2003-11-14 11:12:44 UTC
.
Comment 17 mdxonefour 2003-11-14 11:13:28 UTC
.
Comment 18 mdxonefour 2004-01-30 12:18:53 UTC
.