Issue 99031

Summary: Text Encoded with CR&LF: 0a or 0d alone next to text loses text
Product: Writer
Reporter: nicklevinson <nick_levinson>
Component: editing
Assignee: AOO issues mailing list <issues>
Status: CONFIRMED ---
QA Contact:
Severity: Trivial
Priority: P3
CC: issues
Version: OOo 3.0.1
Target Milestone: ---
Hardware: All
OS: All
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description: See last post for case re hex 0d.
Flags: none

Description nicklevinson 2009-02-09 06:39:43 UTC
To reproduce:

1. In Writer (without JRE (Java Runtime Environment)), create and save a file as
Text Encoded with charset 8859-1 and paragraphing as CR&LF.

2. Go somewhere to find qualifying strings. Suggested Web pages are mentioned
below and you might make and edit a file using a text editor like gedit 2.10.2
using UTF-8 and a hex editor like KHexEdit 0.8.6. You can paste into gedit and
verify with KHexEdit or, probably, you can type into gedit and, with KHexEdit,
delete an unwanted nonprinting character and move subsequent characters left one
position.

3. Find text strings that end in a hex 0A character or include in the middle a
hex 0D character but not both together.

4. Copy the string and paste into the Text Encoded document.

5. The text ending in a lone 0x0A will fail to paste.

6. Save. Close. Open. If the ASCII filter settings dialog asks if you want
paragraph breaking at CR, change to CR&LF (that being my default and I want
consistency). The text following the lone 0x0D will disappear.

7. Close (without saving). Reopen. If the ASCII filter settings dialog asks if
you want paragraph breaking at CR, leave it that way. The text following the
lone 0x0D will reappear.
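
As an alternative to the text editor and hex editor in step 2, a test file could be generated directly. This is a minimal sketch, assuming Python 3. The function names and the sample strings are hypothetical and are not taken from the attached file. It only produces bytes containing one lone 0x0D and one lone 0x0A among CR&LF paragraph breaks, encoded as ISO 8859-1.

```python
# Hypothetical helpers: build ISO 8859-1 bytes whose paragraphs are
# separated by CR&LF but which also contain one lone CR and one lone LF.
def build_test_bytes() -> bytes:
    paragraphs = [
        "Before lone CR:\rafter lone CR",        # 0x0D not followed by 0x0A
        "Text ending in space and lone LF \n",   # 0x0A not preceded by 0x0D
        "Ordinary paragraph",
    ]
    # Join with CR&LF and terminate the last paragraph the same way.
    return ("\r\n".join(paragraphs) + "\r\n").encode("iso-8859-1")


def write_test_file(path: str) -> None:
    # Binary mode so that Python does not translate the line endings.
    with open(path, "wb") as handle:
        handle.write(build_test_bytes())
```

Calling write_test_file("lone-cr-lf-test.txt") (the file name is only an example) would create a file that can then be opened in Writer with the CR&LF filter setting.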

I don't use JRE since it crashed my system, but, while I'm not able to use OOo
Base, I'm not sure it matters for this problem.

When closing and reopening a saved Text Encoded file that has been saved with
paragraph breaking at CR&LF and charset 8859-1, some content disappears. The
content lost is from a hex 0D to the end of the paragraph containing the
character; however, the loss is only if the hex 0D is not followed by hex 0A but
is followed by visible text. Preceding text is not affected. No warning of loss
is given by OOo; I only found out when I needed to reopen a file.
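
The symptom described above can be modelled in a few lines. This is a sketch in Python 3.9 or later of the observed result only. It is not Writer's import code, and the function name is hypothetical. It treats CR&LF as the paragraph break and drops everything from a lone 0x0D to the end of that paragraph.

```python
def simulate_reported_loss(data: bytes) -> list[str]:
    """Model of the reported symptom only, not of Writer's actual code."""
    visible = []
    # Paragraphs are delimited by CR&LF, as chosen in the filter dialog.
    for paragraph in data.split(b"\r\n"):
        # Any CR still present is a lone CR. Keep only the text before it,
        # which matches what remains visible after reopening.
        kept, _, _lost = paragraph.partition(b"\r")
        visible.append(kept.decode("iso-8859-1"))
    return visible
```

Text before the lone 0x0D survives in this model, matching the observation that preceding text is not affected.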

Closely related (and therefore in this report) is that when a text string is
followed by a space and the hex 0A character alone, the total string cannot be
pasted into the Text Encoded file. Pasting results in nothing appearing. It can,
however, be pasted into gedit, which is how I identified the hex character
involved (using a hex editor on the file saved in gedit), but copying from gedit
and into the Writer Text Encoded file fails. Thus, it doesn't matter from where
the string was originally copied (it was originally copied from
http://bugzilla.gnome.org/show_bug.cgi?id=570931 (the string "Add me to CC list"
but only when selected so as to include a trailing space)).

On the other hand, I discovered the hex-0D text-loss problem when copying text
from fields in the Bug Report Wizard at <https://bugs.opera.com/wizard/>, as
accessed Feb. 7, 2009. Their software apparently adds the 0x0d unsolicited when
I paste text into their field; I assume they need it for breaking paragraphs
within a field.

If, in Writer, a Save command is given without closing the file, text does not
disappear at the time of the save (it can be seen using a hex editor), although
it will still disappear later. I do not know if the disappearance relative to
Writer occurs upon closure or upon reopening, but all reopening was done without
having quit Writer and opening as read-only or read-write makes no difference.

If the file has been saved but has not been closed, it is possible to recover
the text that will be lost after closing by using Undo. Thus, Save per se does
not cause the loss.

In the hex 0D case, when text is lost while opening with CR&LF but recovered by
closing and reopening with only CR set for paragraph breaks, that is not a
permanent preservation. If the file is always opened as CR&LF and new text is
written at the place of loss, the file is saved and closed, the user quits OOo,
and Writer and the file are opened later, the lost content is permanently lost.

Expected behavior is that the Text Encoded format using CR&LF and 8859-1 will
ignore or delete the nonprinting characters, depending on the character, but not
delete any visible characters. It should not add hex 0A to a lonely hex 0D or
vice versa, lest that change the page layout by creating a new paragraph break
not in the original.
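
One reading of this expected behavior can be sketched as follows, assuming Python 3.9 or later. The function name is hypothetical, and dropping every lone 0x0D or 0x0A is an assumption, since the text above leaves open which characters should be ignored and which deleted. Paragraphs break only at CR&LF, and no visible character is removed.

```python
def split_paragraphs_expected(data: bytes) -> list[str]:
    """Sketch of the expected result; assumes lone CR and LF are dropped."""
    paragraphs = []
    # Only the CR&LF pair counts as a paragraph break.
    for raw in data.split(b"\r\n"):
        # Remove lone control characters without touching visible text
        # and without creating a new paragraph break.
        cleaned = raw.replace(b"\r", b"").replace(b"\n", b"")
        paragraphs.append(cleaned.decode("iso-8859-1"))
    return paragraphs
```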

(Recovering data: Tangential note: If you're losing data and want it back, try
closing without writing or saving, then reopening with the filter settings at
whatever the file wants, such as CR or LF. If that doesn't work, try the other
setting. If that displays what you want, save-as to a new file name, close, and
reopen. If that succeeded, delete the file you no longer need.)
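
Before choosing a filter setting, it may help to see which line endings the file actually contains. This is a minimal sketch, assuming Python 3. The function name and dictionary keys are hypothetical. It reads nothing from Writer and only counts bytes.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR&LF pairs and the CR and LF bytes that are not part of a pair."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CR&LF pair contains one CR and one LF, so subtract the pairs.
        "lone CR": data.count(b"\r") - crlf,
        "lone LF": data.count(b"\n") - crlf,
    }
```

The bytes can be obtained with open(path, "rb").read(). A nonzero "lone CR" or "lone LF" count suggests the file is at risk when opened with the CR&LF setting.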

Thank you.

-- 
Nick
Comment 1 eric.savary 2009-02-12 15:28:48 UTC
@nicklevinson: Feel free to reopen when:
- you have tested this again in a current version.
- you can give a precise and SHORT description of what you get and what you expect.
This last point applies not only to this issue but to all issues you write.

Please understand we don't have all the time needed to read saga-descriptions
which, for sure, may be reduced to:
"- I open this file
- copy that
- paste it there
- save as X with Y parameters.
-> reopening I get this

Should be: that"

Thank you!
Comment 2 eric.savary 2009-02-12 15:29:11 UTC
Closed
Comment 3 nicklevinson 2009-02-13 11:45:17 UTC
I replicated both cases in OOo 3.0.1.

To see the problem of hex 0a alone:

1. Create a new document, save as type Text Encoded, and set paragraph breaking
to CR&LF.

2. Log into http://bugzilla.gnome.org/show_bug.cgi?id=570931 or any similar page.

3. Drag to select "Add me to CC list" including the apparent trailing space
(thereby including a hex 0a).

4. Paste into the Writer document.

What happens is that pasting fails. It should have pasted.

To see the problem of hex 0d alone, use the file I'm going to upload to this
issue report. It is a Writer 3.0.1 Text Encoded file, named 0d-alone-in-test.txt.

1. Open it as read-only.

2. When the filter dialog appears, set paragraph breaking to CR&LF and charset
to Western Europe (ISO 8859-1).

The contents are self-explanatory. After the hex 0d, the balance of the
paragraph disappears.

The reason for making the file via a hex editor is that the effect is caused by
copying from certain Web pages under certain conditions, namely, when text has a
hex 0a but not a hex 0d0a and is pasted that way into a Writer Text Encoded
document with a CR&LF setting. The hex editing achieves the same effect. Since
the disappearance of text after pasting it in, especially when no warning is
given, means that users' files would often be corrupted in a way users don't
realize has afflicted them until too late to recover missing content, the
problem is serious.

I hope this is clearer. In general, I give more details because many of us have
somewhat different defaults and habits and I see the problem inherent in not
giving enough information. Both the 0a-alone and 0d-alone cases are in this one
issue because they're likely to have the same cause.

Thank you.

-- 
Nick
Comment 4 nicklevinson 2009-02-13 12:09:16 UTC
Created attachment 60138 [details]
See last post for case re hex 0d.
Comment 5 eric.savary 2009-02-24 17:17:14 UTC
@MBA:

- Open attached txt document
-> Text is missing after ":"
Comment 6 Marcus 2017-05-20 11:15:39 UTC
Reset assignee to the default "issues@openoffice.apache.org".