Issue 114482

Summary: Creating new Hebrew document in Word 95 format produces corrupted file (Attached)
Product: Writer
Reporter: gilboa <gilboad>
Component: editing
Assignee: AOO issues mailing list <issues>
Status: CONFIRMED ---
QA Contact:
Severity: Trivial
Priority: P2
CC: eric.savary, issues, jim, kaplanlior, knmc, netanel, tal
Version: OOo 3.2
Flags: tal: 4.1.4_release_blocker?
Target Milestone: ---   
Hardware: All   
OS: Linux, all   
Issue Type: DEFECT
Latest Confirmation in: 4.1.4
Developer Difficulty: ---
Attachments:
Description | Flags
Corrupted file. | none
Document - original | none
Document - Word2K | none
Document - Word95, corrupted. | none
screen shot of the problem in 3.2.1 | none
screen shot of the problem in 3.3RC1 | none
orig odt file | none
the file after exporting to word 6 format (same result for w95), 3.3-M15 (native) | none
File from AOO 4.1.4 | none

Description gilboa 2010-09-13 16:18:34 UTC
Hello,

Managed to reproduce this bug twice on my Fedora 13/x86_64 machine.
Steps:
1. Open new OOWriter document.
2. Write a Hebrew document.
3. Save file as Word/2K .doc file.
4. Open file using Writer.

- Gilboa
Comment 1 gilboa 2010-09-13 16:19:08 UTC
Created attachment 71667 [details]
Corrupted file.
Comment 2 eric.savary 2010-09-13 16:37:32 UTC
When trying to "save as" your document in OOo 3.3 the filter listbox proposed
Word 95 which indicates you have saved as Word 95 and not Word 2000.

Furthermore I couldn't notice a change in format when saving on SuSE with the
native OOo build.

Thus:
- Either you choose the wrong filter while saving
- or you are using a non native (from your distribution) OOo version which shows
Word 2000 but saves to Word 95 (?).

Anyway, not reproducible, i.e. invalid.
Comment 3 eric.savary 2010-09-13 16:37:50 UTC
Closed
Comment 4 gilboa 2010-09-14 11:29:45 UTC
Forgive me for reopening the issue.
It seems that I was mistaken, and I did in fact save the document as Word 95.

Nevertheless, I believe you misunderstood the bug report.

Take document A. (Minrav-tiles-initial.odt)
Save it as word 2K. (Minrav-tiles-initial-2000.doc)
Save it as word 95. (Minrav-tiles-initial-95.doc)

Even if you cannot read Hebrew, you should be able to notice that the
Word 95 version is completely corrupted and uses the wrong character set.

Now, it's entirely possible that Word 95 doesn't support UTF-8, hence
breaking non-English text completely. In such a case, saving any Unicode
document in this format should be disabled.
If Word 95 does support UTF-8, this is a bug.

- Gilboa
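
[Editor's illustration] The "wrong character set" symptom described above can be reproduced outside Writer. The sketch below is not the Writer export filter. It assumes only that the legacy .doc export stores text as 8-bit bytes in a Windows code page instead of Unicode; the sample word, the helper name roundtrip, and the choice of cp1255/cp1252 are illustrative assumptions.

```python
# Assumption: the legacy .doc export stores text as 8-bit bytes in a Windows
# code page rather than as Unicode. This only demonstrates the general
# mechanism; it does not touch any real .doc file.
HEBREW = "שלום"  # sample word, chosen for illustration


def roundtrip(text: str, write_codec: str, read_codec: str) -> str:
    """Encode text with one codec and decode the same bytes with another."""
    return text.encode(write_codec).decode(read_codec)


# Written and read with the Hebrew code page: the text survives.
same_codepage = roundtrip(HEBREW, "cp1255", "cp1255")

# Written with the Hebrew code page but read as Western European:
# every letter turns into an accented Latin character.
wrong_codepage = roundtrip(HEBREW, "cp1255", "cp1252")

print(same_codepage)
print(wrong_codepage)
```

If this assumption holds, Hebrew text does not need UTF-8 support in the target format to survive; it needs the writer and the reader to agree on the code page.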
Comment 5 gilboa 2010-09-14 11:30:33 UTC
Created attachment 71671 [details]
Document - original
Comment 6 gilboa 2010-09-14 11:30:53 UTC
Created attachment 71672 [details]
Document - Word2K
Comment 7 gilboa 2010-09-14 11:31:23 UTC
Created attachment 71673 [details]
Document - Word95, corrupted.
Comment 8 kaplanlior 2010-09-14 22:40:44 UTC
Reproducible on OpenOffice.org 3.2.1 OOO320m19 (Build:9505)

The corruption happens when exporting directly from ODT to Word 95. The Word 2K
file only shows that this is not a general ODT -> DOC problem.
Comment 9 eric.savary 2010-09-15 09:12:28 UTC
This only happens when exporting to Word 95.
Changing summary accordingly.

@hbrinkm: MS Office XP seems to be able to load and save Hebrew to Word 95
format. Hebrew text is lost when saving to Word 95 in OOo.

Don't know how much effort we can put into fixing this for a 15-year-old file
format...
Comment 10 kaplanlior 2010-10-19 13:00:00 UTC
This also happens with word 6 format, not only word 95. And the problem is in
the export. I could open my old documents which were created 15 years ago.

@es: it might be an old file format, but might be important for archives.
Comment 11 kaplanlior 2010-10-19 13:05:08 UTC
Created attachment 72104 [details]
screen shot of the problem in 3.2.1
Comment 12 kaplanlior 2010-10-19 13:07:04 UTC
Created attachment 72105 [details]
screen shot of the problem in 3.3RC1
Comment 13 kaplanlior 2010-10-19 13:09:26 UTC
There seems to be an improvement regarding this bug in 3.3RC1. The Word 95 and
Word 6 files now look like they use the wrong encoding instead of being completely
corrupted. I've attached two screen shots to better illustrate the change.
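
[Editor's illustration] The difference between "wrong encoding" and "completely corrupted" matters because the first is usually reversible and the second is not. The sketch below shows that distinction on a sample string. It is an assumption-laden illustration, not an analysis of the attached files: the helper name reinterpret and the code pages cp1252/cp1255 are hypothetical choices.

```python
# Assumption: "wrong encoding" means the bytes are intact but were decoded
# with the wrong code page; "corrupted" means the characters were replaced
# during export and the original bytes are gone.
ORIGINAL = "שלום"  # sample word, chosen for illustration


def reinterpret(garbled: str, read_as: str, written_as: str):
    """Undo a wrong decode: turn the text back into bytes and decode again."""
    try:
        return garbled.encode(read_as).decode(written_as)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Case 1: correct bytes, wrong code page on reading (recoverable).
wrong_encoding = ORIGINAL.encode("cp1255").decode("cp1252")

# Case 2: text forced into a code page that has no Hebrew letters, so each
# letter is replaced by '?' (information lost).
lost = ORIGINAL.encode("cp1252", errors="replace").decode("cp1252")

recovered = reinterpret(wrong_encoding, "cp1252", "cp1255")
not_recovered = reinterpret(lost, "cp1252", "cp1255")

print(recovered == ORIGINAL)      # the mis-decoded text can be repaired
print(not_recovered == ORIGINAL)  # the replaced text cannot
```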
Comment 14 netanel 2010-12-17 12:16:58 UTC
This issue affects not only Hebrew, but also Arabic and Persian (AR + FA), and
all East Asian languages (Japanese, Korean, Chinese...): export to Word 6 /
Word 95 format causes text to be unreadable.
(Note that the original file needs to be closed before loading the exported file.)
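
[Editor's illustration] One property the languages listed above share is that none of them can be written in the Western European code page, while each has its own legacy Windows code page. The sketch below checks this with Python's built-in codecs. Whether the export filter actually falls back to a Western code page is an assumption, not something confirmed in this issue; the sample phrases and the mapping of language to code page are illustrative.

```python
# Assumption: the pairing of each language with a legacy Windows code page
# below is illustrative; the real export filter may choose differently.
SAMPLES = {
    "Hebrew": ("שלום", "cp1255"),
    "Arabic": ("مرحبا", "cp1256"),
    "Persian": ("سلام", "cp1256"),
    "Japanese": ("こんにちは", "cp932"),
    "Korean": ("안녕하세요", "cp949"),
    "Chinese": ("你好", "cp936"),
}


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# For each language: (fits in Western cp1252?, fits in its own code page?)
report = {
    language: (encodable(text, "cp1252"), encodable(text, own_codec))
    for language, (text, own_codec) in SAMPLES.items()
}

for language, (western_ok, own_ok) in report.items():
    print(f"{language}: cp1252={western_ok}, own code page={own_ok}")
```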
Comment 15 netanel 2010-12-17 12:19:30 UTC
Created attachment 75362 [details]
orig odt file
Comment 16 netanel 2010-12-17 12:21:55 UTC
Created attachment 75363 [details]
the file after exporting to word 6 format (same result for w95), 3.3-M15 (native)
Comment 17 Tal 2017-04-15 19:47:30 UTC
Verified in v4.1.3 on Mac OS: creating a file, saving it as "Word 95 (doc)", and then reopening it shows corrupt characters.

Suggestion for a quick fix: remove the Word 95 format altogether, as the 97/2000/XP (doc) format works fine.
Comment 18 Tal 2017-04-15 19:50:19 UTC
Raised priority to P2 and requested release-blocker status. If Writer can't save and open simple text files in all its stated formats, then I think it's a show stopper. Either remove what isn't supported well (Word 95), or fix it.
Comment 19 Marcus 2017-05-20 11:05:09 UTC
Reset assignee to the default "issues@openoffice.apache.org".
Comment 20 Jim Jagielski 2017-05-23 13:51:06 UTC
We have 4.1.4-dev packages available... Can we get confirmation that it still exists in HEAD of aoo-414 ?
Comment 21 Keith N. McKenna 2017-12-04 23:17:58 UTC
Created attachment 86289 [details]
File from AOO 4.1.4

Confirmed in 4.1.4.
Steps:
1. Opened the original ODT document.
2. Saved it as a Word 95 document.
3. Closed AOO.
4. Opened the newly created Word 95 document in AOO.
Result: it appeared to be the same as the original corrupted document.