Apache OpenOffice (AOO) Bugzilla – Issue 13089
Paste of unicode characters is not honored.
Last modified: 2013-08-07 15:00:39 UTC
1. Type the following word into gedit or kedit: smörgåsbord
2. Mark it and paste it into the OpenOffice word processor with the middle mouse button.
3. The result in OpenOffice is: sm\x{00F6}rg\x{00E5}sbord

I.e. OpenOffice doesn't honor Unicode characters for pasting.
I confirm this, also for Linux with OOo 1.1beta2, but not for the word smörgåsbord, which works fine for me. When I try to paste a character which is beyond the usual extended ASCII characters, I can no longer paste into OOo.

Another way to test this is to open kcharselect. This program lets you choose not only a character, but also how it is written to the clipboard: in a default way (current locale?), UTF-8, or HTML. (That's all in the Edit menu.) Here are the three results I get if I try to paste the ĉ character (small "c" with a circumflex "^") into OOo:

default: ? (a question mark)
UTF-8: Ä (upper case "A" with umlaut/diaeresis)
HTML: ĉ

Note that the font in OOo has the required characters; I can insert the character using the character selection dialog.

I think that this is a regression. I can no longer paste Esperanto text into OOo, for which I had developed special tools. I think I'll have to use another word processor for this project. :-( I can happily copy these characters between other programs such as kcharselect, Mozilla, gedit, and Kate.
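
One possible explanation for the "default" and "UTF-8" results above, offered as an assumption rather than a confirmed diagnosis, is that the receiving side decodes the clipboard bytes with a single-byte locale encoding. The Python sketch below simulates that. The helper `simulate_paste` and the choice of Latin-1 as the reader's encoding are hypothetical and are not taken from OOo's code.

```python
def simulate_paste(text, clipboard_encoding, reader_encoding):
    """Encode text as the sender would, then decode it as the receiver would.

    Characters that cannot be encoded or decoded are replaced instead of
    raising an error, which mimics a lossy clipboard transfer.
    """
    raw = text.encode(clipboard_encoding, errors="replace")
    return raw.decode(reader_encoding, errors="replace")


c_circumflex = "\u0109"  # the character ĉ

# Sender writes UTF-8 (bytes C4 89), receiver assumes Latin-1:
# byte C4 becomes "Ä", byte 89 becomes an invisible C1 control character.
utf8_as_latin1 = simulate_paste(c_circumflex, "utf-8", "latin-1")

# Sender writes in a Latin-1 locale, which has no ĉ, so it degrades to "?".
default_as_latin1 = simulate_paste(c_circumflex, "latin-1", "latin-1")

print(repr(utf8_as_latin1), repr(default_as_latin1))
```

If this assumption holds, the "Ä" is the first byte of the UTF-8 sequence shown as a Latin-1 character, and the question mark is the usual replacement for a character the locale encoding cannot represent.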
I just verified that I get the behaviour that I described in OpenOffice Beta2 if LC_ALL=C. On the other hand, if LC_ALL is undefined, it behaves as expected. Perhaps this is a feature...
For me LC_ALL is undefined, but I still get the bug. The other variables starting with LC which are set are these:

LC_COLLATE=en_US
LC_CTYPE=en_US
LC_MESSAGES=en_US
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US

(Say, what does "LC" stand for, anyway?)

I am still seeing this bug with OOo 1.1 RC3. I don't see how this would be a feature, since pasting behaves normally in other applications. I also think that this is a regression. I'm pretty sure I was able to paste these characters in 1.0.
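
For reference, POSIX resolves the locale used for character handling by checking LC_ALL, then LC_CTYPE, then LANG, and using the first one that is set and non-empty. A minimal Python sketch of that lookup follows. `effective_ctype` is a hypothetical helper name, and the sketch models only the lookup order, not what OOo or Xlib do with the result.

```python
def effective_ctype(env):
    """Return (variable_name, value) for the locale that governs character
    encoding, following the POSIX precedence LC_ALL > LC_CTYPE > LANG.

    An unset or empty variable is skipped. If none is set, the "C" locale
    applies and the variable name is reported as None.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return name, value
    return None, "C"


# The environment reported in the comment above: LC_ALL unset, LC_CTYPE=en_US.
reported = {"LC_COLLATE": "en_US", "LC_CTYPE": "en_US", "LC_MESSAGES": "en_US"}
print(effective_ctype(reported))

# The earlier report: LC_ALL=C overrides everything else.
print(effective_ctype({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8"}))
```

With the variables listed above, LC_CTYPE=en_US is the one that applies. A locale name without a codeset suffix such as ".UTF-8" typically selects a non-UTF-8 encoding.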
Hmmm... A few more subtleties. Using OOo RC3 on Linux. The problem now seems to affect only the pasting of unformatted text, whereas before the problem occurred whether or not the text was formatted, I think. So, to summarize, with RC3 the results are as follows:

1. Paste formatted text (such as in the body of a web page) with Ctrl+v or middle button -> works fine
2. Paste unformatted text (such as from a text area in a browser) with Ctrl+v or middle button -> special characters are pasted as question marks
3. Take formatted text and paste using Edit->Paste Special->Unformatted Text -> special characters are pasted as question marks

I will create a small test case and attach it.
Created attachment 8563: testcase. Open it in your browser and follow the instructions in the page.
I cannot open the attachment. smörgåsbord copies into OOo ok for me. I could not find the ĉ character on my character map, and do not know what other character to pick that is not an ASCII character at the moment...OOo 1.1.0 on RH9.
Following the attached link shows the problem when the chars are copied to OOo 1.1.0
JA->PL: please have a look at this issue. reassigned
OK, I got the attached HTML file open... that was operator error. The tests there work OK for me. I run GNOME here. The font I used was Bitstream Vera Sans. My character codes for both Mozilla and OOo are set to UTF-8. (1.1.0 on RH9 Linux.)
It works fine for me too on RH Linux 7.3 with Ximian Desktop 2 and OpenOffice.org 1.1.0 CZ. (My locale is set to UTF-8.)
I changed OOo's locale environment to a non-UTF-8 locale, and now it doesn't work. I thought that the unformatted text in the X clipboard doesn't carry any information about its encoding, so OOo can't know that the text in it is encoded in UTF-8. But when I tried copying from Mozilla with a UTF-8 locale to gedit with an ISO8859-x locale, it worked. Maybe gedit/GNOME has some autodetection mechanism for UTF-8 characters in the clipboard? Then I set the locale of all apps, including Mozilla, to an ISO8859-x locale, and the Unicode characters were pasted as \x{XXXX} codes. So maybe OOo should detect these codes in the clipboard and convert them to the appropriate Unicode characters.
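
The conversion suggested above, turning \x{XXXX} codes back into Unicode characters, could look roughly like the Python sketch below. It only illustrates the suggestion and is not a description of anything OOo implements. The helper name `unescape_hex_codes` is hypothetical.

```python
import re

# Matches escapes of the form \x{XXXX} with 1 to 6 hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")


def unescape_hex_codes(text):
    """Replace every \\x{XXXX} escape with the code point it names.

    Escapes whose value lies outside the Unicode range are left untouched.
    """
    def replace(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _HEX_ESCAPE.sub(replace, text)


print(unescape_hex_codes(r"sm\x{00F6}rg\x{00E5}sbord"))
```

One drawback of this approach is that text which legitimately contains the literal sequence \x{...} would also be rewritten.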
Most likely this simply means that kedit/gedit transport data as XA_STRING, which has no encoding (it's essentially a byte sequence). OOo interprets this as "use the encoding of your locale", which for the C locale or EN-US is ASCII and hence cannot transport characters outside the ASCII range. I'll have a look at whether that is really the case here.
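
The idea behind this hypothesis, that text forced through the locale's encoding loses every character that encoding cannot represent, can be illustrated with a short Python sketch. The helper name `survives` is hypothetical, and Python codecs stand in for the locale encodings here.

```python
def survives(text, encoding):
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for word in ("smörgåsbord", "ĉ"):
    for encoding in ("ascii", "latin-1", "utf-8"):
        print(f"{word!r} in {encoding}: {survives(word, encoding)}")
```

Under this model, "smörgåsbord" fits into Latin-1 but not ASCII, while "ĉ" fits into neither and needs UTF-8. That would be consistent with reports in this issue where smörgåsbord pastes correctly but ĉ does not.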
The reason was a different one: the funny characters come out of Xlib itself when the locale is not properly set. This happens when text is transported as COMPOUND_TEXT, which is the default for multibyte character sets. But since kedit and gedit transport UTF8_STRING anyway (which is preferable since it is already converted to Unicode), OOo should choose that target instead of COMPOUND_TEXT. I changed the code so that this is the case now (in CWS vcl7pp1r3). Please note that these strange conversions can still occur with applications that do not support UTF8_STRING but offer COMPOUND_TEXT instead. Since the conversion fails inside Xlib itself, there's not much I can do about that.
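
The target selection described above can be sketched in Python as a simple preference list. This is an illustration only, not the code from CWS vcl7pp1r3. The function name `choose_text_target` is hypothetical, and the ordering after UTF8_STRING is an assumption.

```python
# Text targets in order of preference. UTF8_STRING comes first because it is
# already Unicode and needs no locale-dependent conversion inside Xlib.
# The relative order of the remaining two entries is an assumption.
PREFERRED_TEXT_TARGETS = ("UTF8_STRING", "COMPOUND_TEXT", "STRING")


def choose_text_target(offered_targets):
    """Pick the most preferred text target among those the selection owner
    offers, or return None if it offers no known text target."""
    offered = set(offered_targets)
    for target in PREFERRED_TEXT_TARGETS:
        if target in offered:
            return target
    return None


# An application that offers UTF8_STRING: it wins over COMPOUND_TEXT.
print(choose_text_target(["TARGETS", "COMPOUND_TEXT", "UTF8_STRING", "STRING"]))

# An application without UTF8_STRING: COMPOUND_TEXT is still used, so the
# locale-dependent conversion can still go wrong.
print(choose_text_target(["TARGETS", "COMPOUND_TEXT", "STRING"]))
```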
pl->md: please verify in CWS vcl7pp1r3
Verified on CWS vcl7pp1r3.