Apache OpenOffice (AOO) Bugzilla – Issue 99031
Text Encoded with CR&LF: 0a or 0d alone next to text loses text
Last modified: 2017-05-20 11:15:39 UTC
To reproduce:
1. In Writer (without JRE (Java Runtime Environment)), create and save a file as Text Encoded with charset 8859-1 and paragraphing as CR&LF.
2. Go somewhere to find qualifying strings. Suggested Web pages are mentioned below and you might make and edit a file using a text editor like gedit 2.10.2 using UTF-8 and a hex editor like KHexEdit 0.8.6. You can paste into gedit and verify with KHexEdit or, probably, you can type into gedit and, with KHexEdit, delete an unwanted nonprinting character and move subsequent characters left one position.
3. Find text strings that end in a hex 0A character or include in the middle a hex 0D character but not both together.
4. Copy the string and paste into the Text Encoded document.
5. The text ending in a lone 0x0A will fail to paste.
6. Save. Close. Open. If the ASCII filter settings dialog asks if you want paragraph breaking at CR, change to CR&LF (that being my default and I want consistency). The text following the lone 0x0D will disappear.
7. Close (without saving). Reopen. If the ASCII filter settings dialog asks if you want paragraph breaking at CR, leave it that way. The text following the lone 0x0D will reappear.

I don't use JRE since it crashed my system, but, while I'm not able to use OOo Base, I'm not sure it matters for this problem.

When closing and reopening a saved Text Encoded file that has been saved with paragraph breaking at CR&LF and charset 8859-1, some content disappears. The content lost is from a hex 0D to the end of the paragraph containing the character; however, the loss is only if the hex 0D is not followed by hex 0A but is followed by visible text. Preceding text is not affected. No warning of loss is given by OOo; I only found out when I needed to reopen a file.

Closely related (and therefore in this report) is that when a text string is followed by a space and the hex 0A character alone, the total string cannot be pasted into the Text Encoded file.
Pasting results in nothing appearing. It can, however, be pasted into gedit, which is how I identified the hex character involved (using a hex editor on the file saved in gedit), but copying from gedit and into the Writer Text Encoded file fails. Thus, it doesn't matter from where the string was originally copied (it was originally copied from http://bugzilla.gnome.org/show_bug.cgi?id=570931 (the string "Add me to CC list" but only when selected so as to include a trailing space)).

On the other hand, I discovered the hex-0D text-loss problem when copying text from fields in the Bug Report Wizard at <https://bugs.opera.com/wizard/>, as accessed Feb. 7, 2009. Their software apparently adds the 0x0d unsolicited when I paste text into their field; I assume they need it for breaking paragraphs within a field.

If, in Writer, a Save command is given without closing the file, text does not disappear at the time of the save (it can be seen using a hex editor), although it will still disappear later. I do not know if the disappearance relative to Writer occurs upon closure or upon reopening, but all reopening was done without having quit Writer and opening as read-only or read-write makes no difference. If the file has been saved but has not been closed, it is possible to recover the text that will be lost after closing by using Undo. Thus, Save per se does not cause the loss.

In the hex 0D case, when text is lost while opening with CR&LF but recovered by closing and reopening with only CR set for paragraph breaks, that is not a permanent preservation. If the file is always opened as CR&LF and new text is written at the place of loss, the file is saved and closed, the user quits OOo, and Writer and the file are opened later, the lost content is permanently lost.

Expected behavior is that the Text Encoded format using CR&LF and 8859-1 will ignore or delete the nonprinting characters, depending on the character, but not delete any visible characters.
It should not add hex 0A to a lonely hex 0D or vice versa, lest that change the page layout by creating a new paragraph break not in the original.

(Recovering data: Tangential note: If you're losing data and want it back, try closing without writing or saving, then reopening with the filter settings at whatever the file wants, such as CR or LF. If that doesn't work, try the other setting. If that displays what you want, save-as to a new file name, close, and reopen. If that succeeded, delete the file you no longer need.)

Thank you.
-- Nick
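The expected behavior described above can be sketched in a few lines of Python. This is only an illustration of the request, not OOo's import filter: the function name `split_crlf_paragraphs` is hypothetical, and choosing to delete (rather than ignore) a stray 0x0D or 0x0A is an assumption, since the report allows either.

```python
def split_crlf_paragraphs(data: bytes, encoding: str = "iso-8859-1") -> list[str]:
    """Split on CR LF only; drop stray CR or LF bytes without losing visible text."""
    paragraphs = []
    for raw in data.split(b"\r\n"):
        # Assumption: a lone 0x0D or 0x0A is deleted, never promoted to a
        # paragraph break and never used as a reason to discard what follows.
        cleaned = raw.replace(b"\r", b"").replace(b"\n", b"")
        paragraphs.append(cleaned.decode(encoding))
    # A trailing CR LF terminates the last paragraph; it does not start a new one.
    if paragraphs and paragraphs[-1] == "":
        paragraphs.pop()
    return paragraphs
```

With this model, a paragraph containing a lone 0x0D keeps all of its visible characters and the paragraph count stays the same as in the original bytes.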
@nicklevinson: Feel free to reopen when:
- you have tested this again in a current version.
- you can give a precise and SHORT description of what you get and what you expect.
The last comment applies not only to this issue but to all issues you write. Please understand we don't have the time needed to read saga-descriptions which, for sure, may be reduced to:
"- I open this file
- copy that
- paste it there
- save as X with Y parameters.
-> reopening I get this
Should be: that"
Thank you!
Closed
I replicated both cases in OOo 3.0.1.

To see the problem of hex 0a alone:
1. Create a new document, save as type Text Encoded, and set paragraph breaking to CR&LF.
2. Log into http://bugzilla.gnome.org/show_bug.cgi?id=570931 or any similar page.
3. Wipe to select "Add me to CC list" including the apparent trailing space (thereby including a hex 0a).
4. Paste into the Writer document.
What happens is that pasting fails. It should have pasted.

To see the problem of hex 0d alone, use the file I'm going to upload to this issue report. It is a Writer 3.0.1 Text Encoded file, named 0d-alone-in-test.txt.
1. Open it as read-only.
2. When the filter dialog appears, set paragraph breaking to CR&LF and charset to Western Europe (ISO 8859-1).
The contents are self-explanatory. After the hex 0d, the balance of the paragraph disappears.

The reason for making the file via a hex editor is that the effect is caused by copying from certain Web pages under certain conditions, namely, when text has a hex 0a but not a hex 0d0a and is pasted that way into a Writer Text Encoded document with a CR&LF setting. The hex editing achieves the same effect.

Since the disappearance of text after pasting it in, especially when no warning is given, means that users' files would often be corrupted in a way users don't realize has afflicted them until too late to recover missing content, the problem is serious.

I hope this is clearer. In general, I give more details because many of us have somewhat different defaults and habits and I see the problem inherent in not giving enough information. Both the 0a-alone and 0d-alone cases are in this one issue because they're likely to have the same cause.

Thank you.
-- Nick
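As an alternative to inspecting bytes in a hex editor, a short Python sketch can list every 0x0D that is not followed by 0x0A and every 0x0A that is not preceded by 0x0D. The function name `find_lone_breaks` is hypothetical and nothing here is part of OOo; it only reports where the at-risk bytes sit in a buffer.

```python
def find_lone_breaks(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, kind) for each CR or LF that is not part of a CR LF pair."""
    hits = []
    for i, byte in enumerate(data):
        if byte == 0x0D and data[i + 1:i + 2] != b"\n":
            # CR with no LF right after it (including CR at end of data).
            hits.append((i, "CR"))
        elif byte == 0x0A and (i == 0 or data[i - 1] != 0x0D):
            # LF with no CR right before it (including LF at start of data).
            hits.append((i, "LF"))
    return hits
```

To check a saved Text Encoded file, read it in binary mode and pass the bytes to the function; an empty result means every break is a full CR LF pair.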
Created attachment 60138
See last post for case re hex 0d.
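For anyone without the attachment or a hex editor, a comparable test file can be generated with Python. This is a sketch under assumptions: the sample wording and the function names are hypothetical and are not the bytes of attachment 60138. It only reproduces the condition described, a hex 0d followed by visible text inside a paragraph that ends in 0d0a, encoded as ISO 8859-1.

```python
from pathlib import Path


def build_lone_cr_sample() -> bytes:
    """Build ISO 8859-1 text with CR LF paragraph ends and one lone CR mid-paragraph."""
    lines = [
        # Hypothetical sample text; the "\r" here is the lone 0x0D.
        "Text before the lone CR:\rtext that goes missing on reopen",
        "A second paragraph, ended normally.",
    ]
    return "".join(line + "\r\n" for line in lines).encode("iso-8859-1")


def write_sample(path: str) -> int:
    """Write the sample in binary mode so no newline translation occurs."""
    return Path(path).write_bytes(build_lone_cr_sample())
```

Calling `write_sample("lone-cr-sample.txt")` (the file name is arbitrary) produces a file that can then be opened in Writer with paragraph breaking set to CR&LF and charset set to Western Europe (ISO 8859-1).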
@MBA: - Open attached txt document -> Text is missing after ":"
Reset assignee to the default "issues@openoffice.apache.org".