Issue 42909

Summary: Interoperability with MS Word - text language doesn't convert properly
Product: Internationalization
Reporter: samphan
Component: code
Assignee: flr <freuter>
Status: CLOSED DUPLICATE
QA Contact: issues@l10n <issues>
Severity: Trivial
Priority: P2
CC: arthit, hin.stone, issues, jjc, nusorn
Version: 680m84
Keywords: oooqa
Target Milestone: ---
Hardware: PC
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:
Issue Blocks: 41707
Attachments:
- The original Writer document for line-breaking test (Flags: none)
- The same document converted to MS Word XP format (Flags: none)
- Bugdoc with correct language settings, but still not working (Flags: none)
- Bugdoc exported with patch from james: attributes set correctly; however, the line break is not performed in my version of Word (Flags: none)
- Bugdoc saved from Word, after space/backspace with US keyboard, with bad breaks (Flags: none)
- Bugdoc saved from Word, after space/backspace with TH keyboard, with good breaks (Flags: none)
- Manually hacked version of flr's bugdoc with the applyBreakingRules flag set (Flags: none)
- Untested patch to unconditionally set the applyBreakRules flag on export (Flags: none)

Description samphan 2005-02-16 11:54:28 UTC
Both MS Word and OOo have a concept of text language that is a property of the
characters. In OOo, you set the text language in 'Format->Character->Fonts'. In
MS Word, you set the text language in 'Tools->Language->Set Language'.

This matters because some features only work correctly when the software knows
the language of the text. For example, Thai text has no spaces between words,
yet line breaking must still happen at word boundaries. In OOo you must set the
text language correctly (CTL font->Language = Thai) for Thai line breaking to
work.

However, even when the text language is set correctly in OOo, if you save the
OOo document as a Word XP document, Word does not recognize the language
information that OOo saves.

Test case:
- The attached Writer document contains a line with 16 copies of the
3-character Thai word 'การ', with a space in the middle. The line is formatted
as CTL=Thai.
การการการการการการการการ การการการการการการการการ

- I saved the Writer document as an MS Word XP .doc file (also attached).

1) Load the .doc file in MS Word XP/2003. The line could be broken after any of
the 3-character words, but Word breaks it at the space in the middle instead.
That's because Word doesn't recognize the text as Thai. You can verify this by
checking the current language in 'Tools->Language->Set Language'; it shows
'English (U.S.)'.

2) Worse, you can't even change the language to Thai to make the line breaking
behave correctly. Select (all) the text, open the 'Set Language' dialog box,
and choose Thai. Nothing changes: if you check the current language in the
'Set Language' dialog box again, it is still 'English (U.S.)'.

3) Load the .doc back into Writer: the language information is still there, and
'Format->Character->CTL Font' is still Thai. So the information must be saved,
but in a way that MS Word doesn't recognize or use.

This bug is very serious because it makes it impossible to convert Thai
documents from OOo to MS Word, or to create an MS Word document in Thai using
OOo. I don't know whether this happens with other languages too, but I suspect
it does.
Comment 1 samphan 2005-02-16 11:55:44 UTC
Created attachment 22701
The original Writer document for line-breaking test
Comment 2 samphan 2005-02-16 11:56:38 UTC
Created attachment 22702
The same document converted to MS Word XP format
Comment 3 jjc 2005-02-20 00:53:29 UTC
This looks similar to issue #23784, which sba concluded was a feature request
for more text alignment options, but I don't see what it has to do with text
alignment .  (From a user's perspective, it's a serious bug, not a missing feature.)
Comment 4 arthit 2005-03-21 04:53:03 UTC
Confirmed.

Raised Priority to P2
- data loss (language information)
- basic functionality is not working properly (export document)

Set Target Milestone to OOo 2.0;
please change this if you find it inappropriate.
Comment 5 arthit 2005-03-21 04:54:19 UTC
comment from james_clark:
"Issue 42909: this hasn't been analyzed yet;
it makes OOo's export to .doc format effectively
non-functional for Thai users.
It affects other languages as well, but the effects are much more serious for Thai:
text is not properly tagged with its language, and the language of the text cannot
be changed in Word;
*** this is critical for Thai because line breaking does not work in Word if
text is not properly tagged as Thai. ***
There's no known workaround."
Comment 6 falko.tesch 2005-03-21 11:38:00 UTC
FT: Andreas, please check. I consider this a serious issue, too.
Comment 7 andreas.martens 2005-03-21 17:18:14 UTC
Yes, I agree. It's a serious issue. At least for OOo 2.0.1 we have to find a
solution. BTW: if you save as RTF instead of .doc, Word recognizes the language
information.
Comment 8 jjc 2005-03-26 10:50:59 UTC
If within Word you save the document (e.g. LineBreakTest.doc) as XML, close the
document, and then open the XML version of the document, the text is tagged as
Thai.  This is in Word 2003, Thai edition.
Comment 9 andreas.martens 2005-03-29 13:28:48 UTC
AFAIK, FME and you have already done some investigation into this issue.
Comment 10 jjc 2005-03-30 01:17:40 UTC
The fix to issue #46087 will fix this issue as well. See that issue for more
details. Runs are in fact correctly tagged with the language.  The problem is
that runs are not marked as being complex script: Word evidently doesn't allow
something that's not complex script to be tagged with a complex script language.
Comment 11 andreas.martens 2005-03-31 14:28:41 UTC
We will investigate and try to find a solution for OOo 2.0.
Comment 12 flr 2005-04-01 16:26:20 UTC
Created attachment 24528
Bugdoc with correct language settings, but still not working
Comment 13 flr 2005-04-01 16:28:16 UTC
flr: The problem is *not* the language setting. I have attached a .DOC file,
generated with a modified Writer, whose language is correctly set to Thai.
However, WW (Word) still does not break the lines correctly.
I suspect a Unicode export problem. The .DOC format has a strange
"chp.idctHint" flag...
Comment 14 flr 2005-04-01 17:07:03 UTC
Solved with patch from james_clark for #i46087#.
Comment 15 flr 2005-04-01 17:08:18 UTC
Solved with patch from james_clark for #46087#.
Fixed in dvoqbfix2.


*** This issue has been marked as a duplicate of 46087 ***
Comment 16 flr 2005-04-01 18:47:18 UTC
flr: The patch from james produces correct language attributes. However, my
version of Word still does *not* perform the line break.
Can you try it with your Word version? Perhaps my complex-script settings
are incorrect.
The patch from james is applied in dvoqbfix2.
Comment 17 flr 2005-04-01 18:48:29 UTC
Created attachment 24531
Bugdoc exported with patch from james: attributes set correctly; however, the line break is not performed in my version of Word
Comment 18 jjc 2005-04-02 03:03:51 UTC
I can confirm that Word 2003 (Thai edition) does not perform correct
line-breaking on LineBreakTest_expored_with_patch_from_james.doc.

Some possible clues:

a) saving this to XML in Word 2003 and reopening it solves the problem; if the
file is then saved again as .doc, the reopened .doc file still works correctly

b) if in Word you change the keyboard layout to Thai, then type a space (with
the cursor still before the first character), Word performs correct
line-breaking; if you then do backspace (or Ctrl-Z), the correct line-breaking
remains

c) if you do b), but with US keyboard layout, Word doesn't do correct line-breaking

If, after b) and c) (using Backspace rather than Ctrl-Z), you resave each file
as .doc, you get two very similar .doc files: Word breaks lines correctly in
one and not in the other. Analyzing the difference between these files may
tell us what the problem is. Unfortunately, wv2 debug dumps show no difference.
Comment 19 jjc 2005-04-02 03:10:12 UTC
Created attachment 24534
Bugdoc saved from Word, after space/backspace with US keyboard, with bad breaks
Comment 20 jjc 2005-04-02 03:11:10 UTC
Created attachment 24535
Bugdoc saved from Word, after space/backspace with TH keyboard, with good breaks
Comment 21 jjc 2005-04-02 06:39:24 UTC
I think I've figured it out. The problem is a missing document property.

If you go to the Compatibility tab of the Options dialog, there should be an
option called something like "Apply breaking rules" (I've only got the Thai
language version, so I'm not sure what it's called in English).  The problem is
that OOo isn't setting this property, which causes Word not to apply Thai
breaking rules. Word is smart enough to set this property automatically when you
enter Thai text or open an XML file containing Thai, but it doesn't set it when
you open a .doc file with Thai.

In the Word XML format this corresponds to the <w:applyBreakingRules/> element.
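As a diagnostic aid, the following minimal sketch (not part of OOo or Word) checks whether a Word 2003 XML file carries the `<w:applyBreakingRules/>` element. It assumes the WordprocessingML 2003 namespace `http://schemas.microsoft.com/office/word/2003/wordml`; the function name `has_apply_breaking_rules` is hypothetical, and the whole tree is searched so that no particular parent element has to be assumed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI of Word 2003 XML (WordprocessingML 2003).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
APPLY_BREAKING_RULES_TAG = f"{{{W_NS}}}applyBreakingRules"


def has_apply_breaking_rules(xml_bytes: bytes) -> bool:
    """Return True if <w:applyBreakingRules/> occurs anywhere in the document."""
    root = ET.fromstring(xml_bytes)
    # iter() walks the entire tree, so the element's exact parent does not matter.
    return next(root.iter(APPLY_BREAKING_RULES_TAG), None) is not None


if __name__ == "__main__":
    # Hypothetical sample; placing the element under w:docPr/w:compat is an assumption.
    sample = (
        b'<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
        b"<w:docPr><w:compat><w:applyBreakingRules/></w:compat></w:docPr>"
        b"</w:wordDocument>"
    )
    print(has_apply_breaking_rules(sample))  # True
```

Running this on an XML file saved by Word after typing Thai text, and on one produced by converting an OOo .doc, should show whether the flag is what differs between them.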

In the .doc format, it is near the end of the DOP (document properties)
structure: bit 0x20 of the byte immediately following the byte that holds
fDontUseHTMLAutoSpacing (bit 0x04).
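The byte-level change described above can be sketched as follows. This is a minimal illustration, not the patch from attachment 24538: it assumes the DOP has already been read into a `bytearray` and that the caller knows the offset of the byte holding fDontUseHTMLAutoSpacing. Locating the DOP in the table stream and computing that offset are not shown. The names `dop_set_apply_breaking_rules`, `dop_has_apply_breaking_rules` and `spacing_offset` are hypothetical.

```python
from typing import Union

# Bit masks as described in comment 21; the offset is supplied by the caller (assumed).
F_DONT_USE_HTML_AUTO_SPACING = 0x04  # lives in the byte at spacing_offset
F_APPLY_BREAKING_RULES = 0x20        # lives in the byte right after it


def _flag_byte_index(dop: Union[bytes, bytearray], spacing_offset: int) -> int:
    """Index of the byte holding the applyBreakingRules bit (spacing_offset + 1)."""
    index = spacing_offset + 1
    if spacing_offset < 0 or index >= len(dop):
        raise IndexError("spacing_offset leaves no following byte in the DOP buffer")
    return index


def dop_set_apply_breaking_rules(dop: bytearray, spacing_offset: int) -> None:
    """Set bit 0x20 in place, leaving every other bit of that byte unchanged."""
    dop[_flag_byte_index(dop, spacing_offset)] |= F_APPLY_BREAKING_RULES


def dop_has_apply_breaking_rules(dop: Union[bytes, bytearray], spacing_offset: int) -> bool:
    """Report whether bit 0x20 is set in the byte after spacing_offset."""
    return bool(dop[_flag_byte_index(dop, spacing_offset)] & F_APPLY_BREAKING_RULES)


if __name__ == "__main__":
    dop = bytearray(8)                       # stand-in buffer, not a real DOP
    dop[3] = F_DONT_USE_HTML_AUTO_SPACING    # pretend byte 3 holds fDontUseHTMLAutoSpacing
    dop_set_apply_breaking_rules(dop, spacing_offset=3)
    print(hex(dop[4]))                       # 0x20
```

The bit is OR-ed in rather than assigned so that any other flags sharing that byte are preserved.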
Comment 22 jjc 2005-04-02 06:42:33 UTC
Created attachment 24537
Manually hacked version of flr's bugdoc with the applyBreakingRules flag set
Comment 23 jjc 2005-04-02 06:46:14 UTC
Created attachment 24538
Untested patch to unconditionally set the applyBreakRules flag on export
Comment 24 flr 2005-04-11 15:54:06 UTC
flr: Duplicate of #i46732#. Fixed in fr8fix1 (with appropriate language tests...).



*** This issue has been marked as a duplicate of 46732 ***
Comment 25 jjc 2005-04-19 16:09:22 UTC
*** Issue 23784 has been marked as a duplicate of this issue. ***
Comment 26 Mathias_Bauer 2006-08-30 14:18:51 UTC
closing