Issue 31168 - Data Loss when saving Unicode File to OO File Format
Summary: Data Loss when saving Unicode File to OO File Format
Alias: None
Product: Writer
Classification: Application
Component: ui
Version: OOo 1.1.2
Hardware: PC Windows 2000
Importance: P3 Trivial
Target Milestone: ---
Assignee: openoffice
QA Contact: issues@sw
Depends on:
Reported: 2004-07-07 02:26 UTC by reinerg61
Modified: 2013-08-07 14:43 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

UTF8 Encoded File with BOM and 4 characters (19 bytes, text/txt)
2004-07-07 02:27 UTC, reinerg61

Description reinerg61 2004-07-07 02:26:01 UTC
Unicode files with code points U+1XXXX or greater cannot be saved in OpenOffice 
format. The characters are missing when the *.sxw file is reopened.


OS: Win2000 SP4 w/ registry modifications to support Unicode code points 
U+1XXXX and greater (see Also 
installed GB18030 support package ( and the 
free CODE2001 font (

Test Case:

Step 1: Copied 4 characters U+1D790 U+1D791 U+1D792 U+1D793 into OpenOffice 
using the BabelMap application and copy and paste (OO has no way to enter 
code points > U+FFFF). (Characters are invisible at this point since 
my default font does not have the glyphs for these characters.)

Step 2: Select "Edit:Select All"

Step 3: Set font to Code2001 (characters are now visible)

Step 4: Save as "Text Encoded" UTF8 (This creates a file with a BOM and the 4 
characters.)

Step 5: Close file and reopen. Edit:Select All and change font to Code2001 (OO 
locks up if Code2001 is used as the default font). Characters are visible, so 
this proves OO can successfully save as a plain UTF8 file. Also verified file 
correctness with hex editor.

Step 6: Now save as "OpenOffice.org 1.0 Text Document (.sxw)" (Again OO locks 
up for me, might be a Code2001 font problem)

Step 7: Reopen the new sxw file.  The characters are missing!!

Note: The lockup problem does not seem to occur when I use the commercial 
SimSun (Founder Extended) font, but the missing character problem still 
occurs.  These fonts both have glyphs at code points greater than U+FFFF. See 
my message in the users mail group "Unicode Plane 2 Questions"
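
For anyone reproducing the test case without BabelMap, the 19-byte attachment can be rebuilt from the code points listed in Step 1. The following is a minimal Python sketch, not part of the original report; the name build_test_file_bytes is made up for illustration. It assumes the file is exactly a UTF-8 BOM followed by the four characters, which matches the attachment's stated size (3 BOM bytes + 4 characters x 4 bytes = 19 bytes).

```python
import codecs

# The four code points from Step 1 of the test case.
CODE_POINTS = [0x1D790, 0x1D791, 0x1D792, 0x1D793]


def build_test_file_bytes(code_points=CODE_POINTS):
    """Return a UTF-8 BOM followed by the UTF-8 encoding of the code points."""
    text = "".join(chr(cp) for cp in code_points)
    return codecs.BOM_UTF8 + text.encode("utf-8")


data = build_test_file_bytes()
print(len(data))      # total size in bytes, BOM included
print(data.hex(" "))  # hex dump, comparable to what a hex editor shows
```

Writing `data` to a file opened in binary mode should give a file that can be loaded through the "Text Encoded" filter as in Step 5.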
Comment 1 reinerg61 2004-07-07 02:27:50 UTC
Created attachment 16293 [details]
UTF8 Encoded File with BOM and 4 characters
Comment 2 michael.ruess 2004-07-07 11:08:26 UTC
Reassigned to ES.
Comment 3 Stephan Bergmann 2004-07-07 14:22:16 UTC
It appears that the four characters >= U+10000 are completely missing from the
content.xml stream of the .sxw document generated in step 6.
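
A check of this kind can be scripted, since an .sxw file is a ZIP package with the document text in an XML stream. The following Python sketch is only an illustration and not the procedure used above; the function name supplementary_code_points is hypothetical, and it assumes the stream parses with the standard library parser without needing its external DTD.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def supplementary_code_points(sxw_file, member="content.xml"):
    """Return all code points >= U+10000 found in the text of one XML stream.

    sxw_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(sxw_file) as package:
        root = ET.fromstring(package.read(member))
    # itertext() yields element text with character references already resolved.
    text = "".join(root.itertext())
    return [ord(ch) for ch in text if ord(ch) >= 0x10000]


# Demo on a tiny in-memory package (a stand-in, not a real Writer document).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    xml = '<?xml version="1.0" encoding="UTF-8"?><doc><p>A\U0001D790&#x1D791;</p></doc>'
    package.writestr("content.xml", xml.encode("utf-8"))
demo.seek(0)
print([hex(cp) for cp in supplementary_code_points(demo)])
```

An empty list for the file saved in step 6 would be consistent with the characters having been dropped on save rather than on load.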
Comment 4 reinerg61 2004-09-26 18:30:19 UTC
I tried opening the test case UTF8 file on Debian Linux kernel 2.4.22 with
KDE desktop.  With the Linux version of OpenOffice 1.1.2 I only see square
symbols.  The proper glyphs are not being drawn.
Comment 5 openoffice 2004-11-18 20:28:47 UTC
Fixed saving of Unicode >= 0x010000 in XML. (Loading already worked.)

Fix is in CWS swqcore02; should make it into milestone SRC680 m65 or so.

dvo->reinerg61: Please test once this is available in a public build. Your
comment on display should go into a different issue (if it persists), because
load/save and the display code are rather different things.
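
For background: code points >= 0x010000 do not fit into a single 16-bit unit, so in UTF-16 each one is stored as a surrogate pair, and an XML writer has to combine the pair back into one character before output. The Python sketch below shows only that arithmetic. It is not the code from CWS swqcore02; the function names are invented for illustration, and it assumes the text is held as UTF-16 code units.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def xml_char_ref(high, low):
    """Format one surrogate pair as a single XML numeric character reference."""
    return "&#x{:X};".format(from_surrogate_pair(high, low))


high, low = to_surrogate_pair(0x1D790)
print(hex(high), hex(low))
print(xml_char_ref(high, low))
```

A writer that emits each surrogate separately, or skips units it does not recognize, would lose exactly the kind of characters described in this issue.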
Comment 6 openoffice 2004-12-10 12:44:13 UTC
reopen for QA
Comment 7 openoffice 2004-12-10 12:44:45 UTC
dvo->es: Please test.
Comment 8 eric.savary 2004-12-10 13:23:42 UTC
Comment 9 eric.savary 2004-12-13 23:49:26 UTC
Comment 10 eric.savary 2004-12-13 23:51:20 UTC
fixed but failed.
Now the text does not even display boxes, the document shows as empty.
No text in there
Comment 11 openoffice 2004-12-14 13:18:16 UTC
Strange. Works for me in swqcore02 build.
Comment 12 openoffice 2004-12-14 13:44:40 UTC
dvo->es: Visibility of the symbols may be a font problem. The bug is hopefully
fixed anyway. :-)
Comment 13 openoffice 2004-12-14 13:45:19 UTC
reopen for QA
Comment 14 openoffice 2004-12-14 13:45:30 UTC
dvo->es: Please test.
Comment 15 openoffice 2004-12-14 13:45:39 UTC
set fixed
Comment 16 eric.savary 2004-12-14 16:14:16 UTC
Verified in cws_swqcore02
Comment 17 eric.savary 2005-01-10 17:33:21 UTC
ES->DVO: as seen and discussed, the fix is not complete in the master. The
characters are there but invisible (not painted). Maybe any conflict with a
recent VCL CWS?
Comment 18 eric.savary 2005-01-10 17:33:42 UTC
Comment 19 reinerg61 2005-01-11 03:43:51 UTC
I repeated the test on build 680_m69 on a Windows XP machine. The characters no
longer display. But, the characters will display on this machine using version
1.1.4. I tried Uniscribe (usp10.dll) version 1.420.2600.2180 and 1.471.4030.0.
Uniscribe (usp10.dll) is used to render glyphs on a Windows machine.
Comment 20 openoffice 2005-01-11 15:10:45 UTC
dvo->reinerg61: Thanks for the report. I am seeing the same issue here.

dvo: I'm in a bit of a fix here, because:
1) The load/save part now works, and 
2) Unicode surrogate support seems broken elsewhere, particularly the display, and
3) surrogate support isn't even officially in the product.

What I want to do is this: I will consider this issue to be for load/save in XML
only. As such, it's a developer bug, fixed, and can now be closed. Dealing with
the surrogate display issues is another issue since it touches completely
different code. (And doesn't match this issue's description either.)

dvo: I pronounce this issue fixed. 'Fixed', in that any surrogate characters
should be loaded and saved correctly from/into the Writer XML formats (*.sxw,
*.odt). The fix has been integrated into milestone m70.

dvo->hdu: Issue #i40391# is a follow-on to this issue that deals with the display
of surrogate characters. I assign it to you. Examples from this issue can be used
to reproduce the problem.

dvo->reinerg61: Load/save should work in m70 (or later). The display problem
will be tracked using issue 40391.
Comment 21 Mathias_Bauer 2007-02-05 13:55:24 UTC
going to close ancient issues
Comment 22 Mathias_Bauer 2007-02-05 13:57:23 UTC
closing ancient issues