Issue 61927 - WW6: exporting a document twice replaces capital danish characters with squares
Summary: WW6: exporting a document twice replaces capital danish characters with squares
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: save-export
Version: OOo 2.0.1
Hardware: All All
Importance: P3 Trivial with 4 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: oooqa
Duplicates: 66388 69473 83861 125966
Depends on:
Blocks: 90439
 
Reported: 2006-02-11 08:35 UTC by stp
Modified: 2014-12-25 13:38 UTC (History)
CC List: 6 users

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
The best approach I reckon (1.31 KB, patch)
2009-04-09 13:03 UTC, caolanm

Description stp 2006-02-11 08:35:40 UTC
1. write "æ ø å Æ Ø Å" in a new document
2. save as word 6
3. close document
4. open saved document
5. write "æ ø å Æ Ø Å" on the next line
6. save again (overwrite)
7. close document
8. open saved document yet again
9. Notice that Æ Ø Å is now replaced by squares
Comment 1 lohmaier 2006-02-24 00:45:11 UTC
removing ms_interoperability keyword. Bugs in import/export filters don't
deserve the keyword which is meant for fundamental differences- things you can
achieve with one software, but not with the other.

Please don't add the keywords without giving a reason.
Comment 2 lohmaier 2006-02-24 00:51:51 UTC
confirming.
The letters are not replaced by "squares", but by katakana "ニ リ ナ" (ni ri na)
Comment 3 michael.ruess 2006-03-16 15:52:32 UTC
MRU->FLR: exactly perform the described steps. You can copy the mentioned text
via "unformatted text" into Writer so you needn't use the special char dialog.
When reopening the exported document for the second time, the letters are broken.
Comment 4 michael.ruess 2006-06-13 11:18:19 UTC
*** Issue 66388 has been marked as a duplicate of this issue. ***
Comment 5 michael.ruess 2006-09-12 09:46:32 UTC
*** Issue 69473 has been marked as a duplicate of this issue. ***
Comment 6 michael.ruess 2007-11-26 09:25:41 UTC
*** Issue 83861 has been marked as a duplicate of this issue. ***
Comment 7 exexcel 2008-04-28 23:42:40 UTC
This bug also appears in Word 95 format, and also with German characters (ÖÄÜß
are broken, öäü are ok).

It's quite a nasty bug because it breaks existing Word 95 documents just by
opening, changing something and saving them. And you don't see the problem
until you open the document again days or weeks later. The Word 95 export is
practically broken for German documents; I would prefer a warning message on
saving to just breaking the document.
Comment 8 flr 2008-04-29 08:56:10 UTC
So -- looks like an import problem in the end. When opening the "broken" doc in
Word the characters are displayed correctly. Can you confirm this?
Comment 9 exexcel 2008-04-29 11:18:12 UTC
Looks rather like an export problem because it works correctly when creating a
new document and only breaks after opening and saving again (see initial
description). So Writer is able to save it correctly, it just doesn't always do
so. In issue 23813 there's a comment by "cmc" which sounds like there is
mapping done:

"so
on export we break the unicode down into the equivalent windows codepages that
match (which is the bit of cleverness I implemented to improve this whole area
recently),"

... it just looks like it's incomplete because it works for öäü but not for ÖÄÜß
(same in Danish).

(I don't have Word so I can't confirm whether it opens the file correctly.)
Comment 10 flr 2008-04-29 11:26:19 UTC
well --- Writer definitely exports something it can't read again... And cmc is
right, it definitely has something to do with the codepages, since there is no
Unicode support in the WW6 format...

But: MS Word is able to import correctly what Writer saved. Even the "second
export". So it rather looks like an import problem to me...

Can anyone --- who has Word 95 --- check  whether Word 95 can import both export
1 and export 2? [I currently only have Word > 95 installed]....
Comment 11 caolanm 2008-04-29 12:14:09 UTC
FWIW, on export we use "GetPseudoCharRuns" in writerwordglue.cxx to split the
text of a paragraph into character runs, and for WW6 we break those runs down
so that they consist of characters that can all be rendered with the same 8bit
font with a mapping like..

... { UnicodeScript_kLatin1Supplement, 
UnicodeScript_kLatin1Supplement,
RTL_TEXTENCODING_MS_1252}, ...

So you can check if the exported characters are getting an appropriate export
TEXTENCODING as the meCharSet member of a CharRun in there.

For each WW8_SwAttrIter::OutAttr we force a "if not WW8 force the font encoding
to the run's charset" which should set the font for that range correctly for
ww6/95 if it is different from the underlying style

Meanwhile in importing we have SwWW8ImplReader::ReadPlainChars where for WW6 we
call GetCurrentCharSet to figure out the charset that the 8bit text is in. So if
e.g. the export is working but not the import then the right font *should* have
shown up during import in SwWW8ImplReader::SetNewFontAttr with the same
eSrcCharSet that we exported and should be at the top of maFontSrcCharSets and
available for use by ReadPlainChars
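
To make the run-splitting idea above concrete, here is a minimal sketch in Python. It is not the logic of GetPseudoCharRuns in writerwordglue.cxx, which maps Unicode script ranges to text encodings; instead it assumes a simpler rule (take the first codepage from a fixed candidate list that can encode the character). The names CANDIDATE_CODEPAGES, pick_codepage and split_into_runs are hypothetical and exist only for this illustration.

```python
from itertools import groupby

# Assumption: a small, ordered list of 8-bit/legacy codepages to try.
# The real exporter uses a Unicode-script-to-encoding table instead.
CANDIDATE_CODEPAGES = ["cp1252", "cp1250", "cp1251", "cp932"]


def pick_codepage(ch):
    """Return the first candidate codepage that can encode ch, else None."""
    for codepage in CANDIDATE_CODEPAGES:
        try:
            ch.encode(codepage)
            return codepage
        except UnicodeEncodeError:
            continue
    return None


def split_into_runs(text):
    """Group consecutive characters that share the same codepage into runs."""
    return [
        (codepage, "".join(chars))
        for codepage, chars in groupby(text, key=pick_codepage)
    ]


# The Danish test string fits into a single cp1252 run.
print(split_into_runs("æ ø å Æ Ø Å"))
# Mixed Latin and Cyrillic text is split into two runs.
print(split_into_runs("abc\u0416"))
```

Each run carries its own encoding, which corresponds to the role the meCharSet member plays for a CharRun; the importer then has to recover exactly that encoding for each run to read the bytes back correctly.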
Comment 12 flr 2008-04-29 12:22:34 UTC
cmc thanks for the update. The only thing I don't understand then is why MS Word
2003 can read it correctly ;-)
This led me to think that this is an import issue....
Comment 13 stp 2008-10-04 18:10:18 UTC
Please set target
Comment 14 fbonsch 2009-03-28 18:45:43 UTC
OO version 3.0.1 - A number of special characters get lost, when re-opening a 
text file in Word 6 format. They are changed to rectangles, "?" or other 
characters. In the German character set the characters Ä Ö Ü ß are concerned, 
but not ä ö ü. There is no problem with .rtf and .odt formats. This bug makes 
it impossible to use Word 6 files with OpenOffice in a business environment.
Thanks very much to all OpenOffice developers and contributors.
Comment 15 caolanm 2009-04-09 13:03:12 UTC
Created attachment 61485
The best approach I reckon
Comment 16 caolanm 2009-04-09 13:06:21 UTC
I reckon this is the best approach to take, given what I see with a native copy
of Word95. i.e. while Word97 flags a unicode font with a chs for shiftjis,
Word95 marks it as "ANSI"

Outside of that our export looks good, and our import *would* work, except for
the findings of #i52786# where we are forced to discard encoding information for
ANSI ranges and try at a higher level, which gets our importer looking at the
chs value for the font. i52786 might be wrong after all, but this fix here
should be ok whether that's correct or not and should solve the problem.
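
The katakana reported in comment 2 fit the shiftjis chs value mentioned here. The sketch below, in Python, encodes the affected letters as Windows-1252 and decodes the same bytes as Shift-JIS. It is only an illustration of the byte-level effect, not the Writer import code, and it assumes the exported 8-bit text is cp1252; misread_as_shift_jis and in_halfwidth_katakana_range are hypothetical helper names.

```python
import unicodedata


def misread_as_shift_jis(text):
    """Encode text as Windows-1252, then decode the bytes as Shift-JIS."""
    raw = text.encode("cp1252")
    return raw.decode("shift_jis", errors="replace")


def in_halfwidth_katakana_range(ch):
    """True if the cp1252 byte for ch lies in 0xA1..0xDF, the range that
    Shift-JIS treats as single-byte halfwidth katakana."""
    (byte,) = ch.encode("cp1252")
    return 0xA1 <= byte <= 0xDF


garbled = misread_as_shift_jis("Æ Ø Å")
# Halfwidth katakana NI, RI, NA (U+FF86, U+FF98, U+FF85).
print(garbled)
# NFKC folds them to the full-width forms quoted in comment 2.
print(unicodedata.normalize("NFKC", garbled))

# Upper-case letters and ß fall inside the range; lower-case ones do not.
for ch in "ÆØÅÖÄÜßæøåöäü":
    print(ch, hex(ch.encode("cp1252")[0]), in_halfwidth_katakana_range(ch))
```

Under that assumption, Æ Ø Å and Ö Ä Ü ß all map to bytes between 0xA1 and 0xDF, while æ ø å and ö ä ü map to bytes above 0xDF, which matches the pattern of broken and intact letters reported in this issue.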
Comment 17 thorsten.ziehm 2009-11-04 13:36:46 UTC
OOo 3.2 is in show-stopper stage. If this issue is critical for the release
please re-target it back. Otherwise this issue will be set to target 3.x now.
Comment 18 meywer 2009-11-26 16:02:29 UTC
exexcel wrote:
> It's quite a nasty bug because it breaks existing word95-documents just by
> opening, changing something and saving them. And you don't see the problem
> before you open the document again days or weeks later. The word95-export is
> practically broken for german documents, I would prefer a warning message at
> saving to just breaking the document.

A warning appears (if you don't suppress it), but the warning is misleading: one
thinks one only loses things that aren't implemented in Word 95, but one
actually loses things that should work!

Have a look at issue 107220 too.
Comment 19 jbf.faure 2010-03-25 09:51:49 UTC
Add me to cc.
Comment 20 majukr05 2010-08-28 00:45:22 UTC
[OOO330_m5 on WinXP]

That issue still occurs in OOo 3.2.1,
but I can't replicate it anymore in OOO330_m5.
So is it fixed now?
Comment 21 majukr05 2010-10-14 19:48:29 UTC
CWS OOO330/writerfilter08ooo330 [integrated in OOO330_m4]?
<https://tools.services.openoffice.org/EIS2/cws.ShowCWS?logon=false&Id=9642&Path=OOO330%2Fwriterfilter08ooo330>
Comment 22 Rob Weir 2013-03-11 15:04:04 UTC
I'm adding this comment to all open issues with Issue Type == PATCH.  We have 220 such issues, many of them quite old.  I apologize for that.  

We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0.

If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know.

On the other hand, if the patch is no longer relevant, please let us know that as well.

If you have any general questions or want to discuss this further, please send a note to our dev mailing list:  dev@openoffice.apache.org

Thanks!

-Rob
Comment 23 Rob Weir 2013-07-30 02:47:39 UTC
Reset assignee on issues not touched by assignee in more than 1000 days.
Comment 24 Regina Henschel 2014-12-25 13:35:34 UTC
*** Issue 125966 has been marked as a duplicate of this issue. ***
Comment 25 Regina Henschel 2014-12-25 13:38:33 UTC
It is only a guess that the attached patch solves issue 125966 too. When you apply the patch, please test whether it also solves issue 125966, and if not, set that issue to NEW so that another solution can be found.