Apache OpenOffice (AOO) Bugzilla – Issue 59185
Search and Replace ignores max-character-limit of paragraphs (may cause data loss)
Last modified: 2017-05-20 11:15:12 UTC
Steps to reproduce the problem

* Open the attached doc "longissue.odt". It is a 33-page doc, with blank lines created using two paragraph breaks, and single paragraph breaks after single sentences. Note the line at the very end of the document: "This is the last line. In the original doc, the above three paragraphs were repeated 100 times." The 100 is counted using a Number range field.

Typically, a more advanced user may want to clean up this document, turning each pair of consecutive paragraph breaks into one paragraph break, and removing the single breaks between the lines to combine them into one paragraph. The following procedure works for short pieces of text, but on a moderately sized doc such as this one, it causes severe data loss.

* Find "$", replace with "!token!", regular expressions checked. As yet another issue with regex, not all paragraph marks end up being replaced by !token!. Repeat the find/replace operation, hitting "Replace All" again. The search now takes significantly longer, and the watchful eye already sees that the document is now missing its end and is corrupted, so we could already stop here. However, usually one would notice this only later.
* Now find "!Token!!Token!", replace with "!para!", regex not checked, so that all pairs of consecutive paragraph breaks can later be replaced with one.
* Replace "!Token!" with " " (space), regex not checked.
* Replace "!para!" with "\n", regex checked, to insert paragraph breaks again.

Note that we are left with 15 pages only, with the last line cut off. Note also that the fields have been corrupted: "Instance x" counts to 10, and further on the fields are removed.

This is an issue resulting in severe data loss and document corruption, which therefore should have a high priority.
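For reference, the intended outcome of the four replace steps can be sketched on a plain string. This is only a toy model of the procedure, not of Writer's search engine: the function name `clean_up` and the sample text are made up for illustration, paragraph breaks are modelled as "\n", and the tokens are spelled in lower case throughout.

```python
def clean_up(text):
    """Model the four find/replace steps on a string where '\n' is a paragraph break."""
    # Step 1: mark every paragraph end with a token. At this point the whole
    # document is a single run of text with no paragraph breaks left.
    text = text.replace("\n", "!token!")
    # Step 2: two consecutive tokens were a blank line; remember them as a real break.
    text = text.replace("!token!!token!", "!para!")
    # Step 3: the remaining single breaks become spaces.
    text = text.replace("!token!", " ")
    # Step 4: restore one paragraph break per remembered marker.
    return text.replace("!para!", "\n")


sample = "Line one.\nLine two.\n\nLine three.\nLine four.\n\nLast line."
print(clean_up(sample))
```

After step 1 the model holds the entire document in one paragraph, which is the point where a long document would be expected to run into a per-paragraph length limit.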
Created attachment 32271: Document to reproduce the document corruption issue with regex search
Confirmed on Windows XP Pro SP2 with OOo 2.0.1 RC4.
Confirmed on OOo 2.0, SUSE Linux 9.3. At the end, _all_ of my field numbers were corrupt, including the first 10.
Reassigned to SBA.
SBA: P2 is the correct Prio for data loss. The office itself is running and still usable. Prio set to P2.
Indeed, on oooforum.org, it was brought to my attention that OOo seems to have a limit of 64 K for a single paragraph. Therefore, this issue is a consequence of issue 17171. Issue 17171 has a priority of only 4, although in many different ways it may be the cause of data loss.
SBA: Reassigned to OS. Target set to OOo 2.0.3.
*** Issue 42899 has been marked as a duplicate of this issue. ***
Extended summary to match the real issue here. Original summary: "Regular expressions search replace may result in severe data loss". Search and replace concatenates paragraphs even when the result exceeds the maximum character limit of 65534 characters (see issue 17171). Search-and-replace should force paragraph breaks at that limit to avoid data loss (and maybe report an error that the limit was reached).
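A minimal sketch of the suggested behaviour, assuming paragraphs are plain strings: concatenate as requested, but start a new paragraph whenever the 65534-character limit is reached, and report how many breaks were forced. The function name `join_with_forced_breaks` and its return shape are hypothetical and do not correspond to any OOo code.

```python
MAX_PARA_LEN = 65534  # per-paragraph limit quoted in this issue (see issue 17171)


def join_with_forced_breaks(paragraphs, sep="", limit=MAX_PARA_LEN):
    """Concatenate paragraphs, forcing a paragraph break every `limit` characters.

    Returns the resulting paragraphs and the number of forced breaks, so the
    caller can warn the user instead of silently losing text.
    """
    joined = sep.join(paragraphs)
    pieces = [joined[i:i + limit] for i in range(0, len(joined), limit)]
    forced_breaks = max(len(pieces) - 1, 0)
    return pieces, forced_breaks


# Four paragraphs of 40000 characters each: 160000 characters in total.
pieces, forced = join_with_forced_breaks(["x" * 40000] * 4)
print([len(p) for p in pieces], forced)  # -> [65534, 65534, 28932] 2
```

No character is dropped in this model: joining the returned pieces gives back the full concatenated text, and a non-zero `forced_breaks` is the signal that an error or warning could be shown.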
I think the target 2.0.3 is still too optimistic; it looks like a pretty exotic problem.
According to http://www.openoffice.org/scdocs/ddIssues_EnterModify.html#priority, prio changed to P3. Target adjusted.
Moved target to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
Reset assignee to the default "issues@openoffice.apache.org".