Apache OpenOffice (AOO) Bugzilla – Issue 19313
Scandinavian characters (æøå) broken in some situations.
Last modified: 2013-08-07 14:43:45 UTC
Using the latest OOo builds, I noticed that the special Scandinavian characters Æ, Ø, Å, æ, ø, å are broken on import from Word documents. When copying text from OOo to other applications (a generic Windows RTF control), the letters are broken in the same pattern. Sample text follows: the correct text first, then, after the second 'OBS:', the broken text. I'll leave it to others to assign a priority to this. The problem is consistent and happens at every import / cut&paste operation.

Sample text:

OBS: Installasjonen består av to deler. Først må du installere selve skriverdriveren. Det gjør du ved å klikke p? knappen Installer på denne siden. Deretter klikker du p? knappen Ekstra filer for å få installert de underliggende rutinene. Du skal ikke velge kommandoen Skriv ut til fil når du bruker pdf995, men bare klikke på Skriv ut som om du skrev ut på en vanlig papirskriver.

OBS: Installasjonen best?r av to deler. F?rst m? du installere selve skriverdriveren. Det gj?r du ved ? klikke p? knappen Installer p? denne siden. Deretter klikker du p? knappen Ekstra filer for ? f? installert de underliggende rutinene. Du skal ikke velge kommandoen Skriv ut til fil n?r du bruker pdf995, men bare klikke p? Skriv ut som om du skrev ut p? en vanlig papirskriver.
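For anyone who wants to check which letters are affected, here is a minimal Python sketch that compares a correct string against its broken counterpart and counts the characters that were replaced by '?'. The function name `replaced_chars` and the two constants are illustrative only; the strings are the first two sentences of the sample above. It assumes a one-for-one replacement, so both strings must have the same length.

```python
def replaced_chars(correct: str, broken: str, marker: str = "?") -> dict[str, int]:
    """Count which characters of `correct` show up as `marker` in `broken`.

    Assumes each lost character was replaced by exactly one marker,
    so the two strings line up position by position.
    """
    if len(correct) != len(broken):
        raise ValueError("strings must have the same length to be compared")
    counts: dict[str, int] = {}
    for good, bad in zip(correct, broken):
        if good != bad and bad == marker:
            counts[good] = counts.get(good, 0) + 1
    return counts


# First two sentences of the sample text from this report.
CORRECT = "Installasjonen består av to deler. Først må du installere selve skriverdriveren."
BROKEN = "Installasjonen best?r av to deler. F?rst m? du installere selve skriverdriveren."

if __name__ == "__main__":
    for char, count in sorted(replaced_chars(CORRECT, BROKEN).items()):
        print(f"{char!r} lost {count} time(s)")
```

Running it on a longer before/after pair gives a quick list of the letters that do not survive the import or paste.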
Created attachment 9122 [details] Sample text with repaired and broken 'æøå' chars. ä and ö are fine.
ä and ö, which are used in Swedish and Finnish, are fine, interestingly.
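One possible explanation for why ä and ö survive, offered as an assumption only (nothing in this report confirms the mechanism): if bytes written in the Western European code page cp1252 were read as the Central European code page cp1250, then ä and ö would come through unchanged, because both code pages place them at the same byte values, while æ, ø and å would not. The sketch below illustrates that with Python's standard codecs; the helper name `misdecode` is made up for this example. It does not reproduce the '?' characters seen in the sample, which would need a further lossy conversion step.

```python
# Hypothesis check: which Nordic letters share a byte value in cp1252 and cp1250?
LETTERS = "ÆØÅæøåÄÖäö"


def misdecode(text: str, written_as: str = "cp1252", read_as: str = "cp1250") -> str:
    """Encode text in one code page and decode the bytes in another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    for ch in LETTERS:
        wrong = misdecode(ch)
        status = "survives" if wrong == ch else f"becomes {wrong!r}"
        print(f"{ch!r} (byte 0x{ch.encode('cp1252')[0]:02X}) {status}")
```

Under this assumption ä, ö, Ä and Ö map to themselves, while æ, ø, å come back as ć, ř, ĺ, which matches the pattern of affected letters described here.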
jw: reassigned to jw
Hello henrikc, can you attach the Word document, please? The RTF looks the same in OOo 1.1 rc4 as in MS WordPad. Copying and pasting it from any other application into OOo does not change the Swedish letters in any way.
Created attachment 9124 [details] Word document showing broken chars
Created attachment 9125 [details] Word document showing broken chars
Created attachment 9126 [details] Word document showing broken chars
Created attachment 9127 [details] Here's an OOo version of the file.
Sorry for the triple .doc file - site was slow in responding. There's a good reason the original .RTF file looks like a WordPad file - WordPad uses the same RichText control that I created it with in my own application. BTW, how do I ascertain exactly which OOo I'm currently running? I've seen the bug in rc4, but am not sure if I'm running rc3 or rc4 here.
Hm, to clarify: The problem occurs when pasting from OOo to other apps. And now I had a case when it works. Damn, thought it was systematic.
Possibly related behaviour: When I paste from OOo to my RichEdit control, there are sometimes characters dropped from the text when format changes. For instance, if I have a word in Italics in the text, the space between the italicised word and the next gets lost in the paste. In the reverse direction, pasting from RichEdit app to OOo, I notice that line breaks frequently turn into page breaks.
FYI, I'm running a clipcache application that preserves every single cut&paste operation I've done for months. Feel free to request more examples.
for the record: not reproducible (on linux)
Sorry henrikc, I cannot reproduce this either. Can you tell me the name of the clipcache program? Maybe it only occurs in combination with this program?
The ClipCache (that's its name) will just do an after-the-copy paste into itself, probably not significant. Instead, I'll get my hands on some Word documents that have the identical problem - my neighbour has a drive full of these. I'll be back :)
Created attachment 9196 [details] MS Word 2000 file. Problem reproduced in OOo 1.1.0 rc8 Danish.
Created attachment 9198 [details] Another Word2000 file with the problem. OOo 1.1.0 rc8, Danish
WFM in the RC3 English build. I suspect the problem is caused by the Danish RC build, in which case this issue should be assigned to Pavel Janik.
@henrikc: Can you reproduce with original English RC4?
I get the bug on my clean Windows XP system too. It occurs consistently and apparently without exceptions. Someone suggested a "bad build"? This is the build used: ftp://ftp.linux.cz/pub/localization/OpenOffice.org/devel/1.1.0-RC4/build-8/OOo_11rc4_danish_Win32Intel_install.zip OS: Windows XP SP1 Pro Danish
We should separate two issues: bad cut&paste and bad import. We do not change anything in cut&paste, but our patch which changes old DOC import could be a potential problem. But without testing original RC4 English, I can not say more.
I've tested my files in English rc4, and the import bug does *not* exist there. Neither does the cut&paste problem, but that one has been a little more elusive; I'll try it out some more. Personally, I believe the two are the same issue. Sure looks like a build problem. Pavel?
So, I installed the version tazly linked to. With that version the error is reproducible. Even in the English version the error occurs once the Danish version has been installed. After uninstalling the Danish version and reinstalling the English one, the error is no longer reproducible. I think this is an error in the Danish build. Pavel, is this your issue then?
Then yes, it is a problem with the build. What is the fastest way to reproduce it? Opening No.RTF or Humor.doc in OOo RC4?
Any of the two files would do, they're both small and show the same problem.
Sorry, misunderstood the question. Take any of the last two files, those are verified to have the problem with rc4 build 8.
Reassign to me.
It is in fact caused by one of the patches we use in our version. I have made it conditional so that it applies to the Czech version only; Danish will therefore not be affected from build-9 on. Please verify in build-9, which will be produced during the weekend. This is only about opening the file Humor.doc, not about cut&paste. Please file a separate issue for that with a good explanation. But as I said, we do not patch anything about cut&paste...
Please verify.
Will verify when build 9 is there, on Monday at latest.
Midair collisions abound. Did the status end up correct?
Fixed build-9 for GNU/Linux is on its way to ftp.linux.cz.
Good. Since I found the problem under Windows, a Windows build would be nice as well, or I'll not feel safe that the issue is resolved.
Of course :-) Windows build will be a little bit later, approx. on Tuesday. Other people can at least confirm on Linux. But the final word is on you anyway :-)
Added myself to cc. Henrik, please reassign this issue to Pavel.
No, it is fixed from my side. I reassigned it back to Henrik for verification.
OK, I will wait for you to build a Windows version, since the Windows version caused the problem in the first place. Is the problem confirmed in a Danish Linux RC4 build?
Yes, it is a generic problem. I reproduced it with build-8 on GNU/Linux as well. build-9 is OK.
set target to 1.1.1
> set target to 1.1.1 Say what? It's a critical MS-Word import bug. I'm just waiting for the Windows build to declare it fixed. Pavel?
This problem is only present in our builds and is already solved in the Linux builds. Windows builds will follow next week because Josef is on vacation. The fix will be included in 1.1.0 Danish final.
I've verified the problem is fixed in RC5. Setting new status. Interestingly, that seems to be the case for cut&paste too: I had no problems during a few operations. A power outage prevented me from testing this intensely, but I'll make noise if it turns out there's still trouble. Now I'm wondering what that patch of our Czech friends was about. Will it still be applied to their version, leaving them with trouble importing the Scandinavian chars?
We have problems with Czech characters without that patch. With it, we are OK. This is a tradeoff: having Czech or not... It will be fixed in 2.0 for all languages. Thank you for verifying.