Apache OpenOffice (AOO) Bugzilla – Issue 83289
Regex: empty Writer table cells not found with ^$
Last modified: 2013-02-07 22:37:07 UTC
From the regex wiki HowTo: "^$ will match an empty paragraph, which can be replaced by say nothing, in order to remove the empty paragraph. Note that ^red$ matches a paragraph with only 'red' in it - replacing this with nothing leaves an empty paragraph - the paragraph marks at either end are not replaced. It may help to regard ^$ on its own as a special syntax, unique to OOo. Unfortunately, because OOo has taken over this syntax, it seems you cannot use ^$ to find empty cells in a table (nor empty Calc cells)."

We ought to be able to find empty table cells, hence this enhancement request. I guess this might be destined for the Great Regex Rethink when and if that happens.

(Note that issue 44688 is about this behaviour in Calc. I raise this as a new issue, because the response to issue 44688 is that the behaviour is by design and otherwise might cause performance troubles - there is no such worry here with Writer table cells.)
Reassigned to SBA.
"Real" (= newly inserted) empty paragraphs in tables do get found. But "the last one" is a little tricky: an "empty" table cell still looks like it has an "empty paragraph" in it, but this one cannot be removed. Thus it makes no sense to select it via "Find and Replace" unless the replace string "brings the paragraph back" (i.e. in order to fill the empty table cells this way). This is not transparent for the user :-(

Confirmed. Reassigned to requirements.
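To make the distinction above concrete, here is a minimal sketch of how a replace on ^$ could behave if the last paragraph of a cell were matched as well. It is only a model in Python, not OOo code: the representation of a cell as a list of paragraph strings and the function name replace_empty_paragraphs are assumptions made for illustration.

```python
import re

# ^$ applied to a single paragraph's text; fullmatch ensures only a truly
# empty string qualifies (paragraph strings are assumed to hold no line breaks).
EMPTY_PARAGRAPH = re.compile(r"^$")


def replace_empty_paragraphs(cell, replacement):
    """Model a "Find and Replace" of ^$ inside one table cell.

    `cell` is a hypothetical representation: a list of paragraph texts,
    where a cell always holds at least one paragraph.
    """
    result = []
    for paragraph in cell:
        if EMPTY_PARAGRAPH.fullmatch(paragraph):
            if replacement:
                # A non-empty replacement "brings the paragraph back" with content.
                result.append(replacement)
            # An empty replacement removes the empty paragraph.
        else:
            result.append(paragraph)
    # The last paragraph of a cell cannot be removed, so keep one empty paragraph.
    return result or [""]
```

In this model, replace_empty_paragraphs(["a", "", "b"], "") drops the inserted empty paragraph, while replace_empty_paragraphs([""], "") leaves the cell unchanged and replace_empty_paragraphs([""], "n/a") fills it.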
This issue relates to the fact that regex in OOo cannot go across paragraphs. E.g. you cannot change something like ^[:space:]*$^[:space:]*([:print:]) to $1 if you want, for instance, to replace all manually formatted paragraphs with something else. This is a real example: sometimes we get texts that are totally manually formatted (from an OCR): all newlines are made using CRLF, and all paragraphs are made by double CRLF. If I want to correct the formatting, I need to use MS Word. Nor can you insert paragraphs as part of a replacement. Hence the need to treat $^ as the special case, because it's the only way to get rid of empty paragraphs.

Neither $ nor ^ should represent a symbol by itself: the "paragraph symbol" should stay outside of them, it should not be affected by any replacement of a search like ^.*$ or even ^$, and there should be a way to specify the "paragraph symbol" in a query just like any other ordinary symbol. This would make the handling of ^$ consistent with all the other cases, making ^$ match the empty cell in a table (as it would not be tied to the "paragraph symbol" anymore, just to the beginning and the end of a block of text).
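For comparison, this is roughly what the OCR clean-up looks like in an engine where the line break is an ordinary symbol that can be searched for and replaced. It is a sketch using Python's re module, not OOo's regex engine, and the function name reflow_ocr_text is made up for illustration. It assumes a blank line (double CRLF) ends a paragraph and a single CRLF is only a soft line wrap.

```python
import re


def reflow_ocr_text(text):
    """Turn manually formatted OCR text into a list of paragraph strings."""
    # Normalize CRLF (and bare CR) to LF so a single pattern covers all cases.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line, possibly containing spaces or tabs, separates paragraphs.
    blocks = re.split(r"\n[ \t]*\n\s*", text.strip())
    # Inside a paragraph, a single line break is a soft wrap: join with a space.
    return [re.sub(r"[ \t]*\n[ \t]*", " ", block) for block in blocks if block]


sample = "First line\r\nsecond line\r\n\r\nNext paragraph\r\n"
for paragraph in reflow_ocr_text(sample):
    print(paragraph)
```

Because the paragraph break is matched explicitly here, ^ and $ are not needed at all, which is the kind of separation between anchors and the "paragraph symbol" argued for above.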