Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Bad document correction -line breaks removal|
|Component:||editing||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Issue Type:||ENHANCEMENT||Latest Confirmation in:||---|
|Issue Depends on:||75524|
Description tuharsky 2007-03-19 15:49:40 UTC
This is related to Issue 75524. One option of the "Bad Document Correcting Tool" could offer intelligent removal of unnecessary line breaks, as a sub-option of the general BDC Tool.

Purpose: Some users type text on a PC the same way they did on mechanical typewriters: they insert a line break at the end of every line. Such a document is impossible to format; one must delete the line breaks manually. Moreover, if the document suffered some printer-aided reformatting, the situation is even worse: a single line of text continues on the next line (a single word or a few) and THEN suddenly there is a line break. The next line behaves similarly, and so on. I'm talking about the same effect as in mail clients that insert line breaks automatically. Then you open the mail in another mail client, forward it, etc. In the end, you have the ugly corrupted formatting described above.

So the option should offer a convenient way to automatically remove such mis-broken lines. An algorithm is needed for proper detection of mis-broken lines; for a start, a simple set of rules could do:

1. A text section should be considered "intended consistent" if it contains no empty line. In other words, even if the text contains line breaks, it is considered "should be consistent" as long as it doesn't contain an ENTIRELY EMPTY line; the text between two empty lines is considered a single consistent block. The "line mis-breaks" should be removed on this general basis, with more fine-tuned heuristic rules as follows:
2. A line is considered "intentionally ended" (with a line break that should remain untouched) if its length is less than, say, 3/4 of the full line length.
3. If the "intended consistent" block contains lines that are just a few (up to, say, 20) letters longer than the full line length (so that just a few characters end up on the next line, followed by a line break), this is considered a probable line mis-break.
4. Lines that begin with bullets or numbering are considered intentionally (regularly) ended, so the line break at the end of such a line should remain untouched.
5. If the whole line is set in a different font than the majority of the "intended consistent" block, the probability of a line mis-break is smaller; the line could also represent a kind of header.

Please add more rules if you wish. In general, the function would analyse the text block, or the whole document if selected, and remove the line breaks that are suspected of being "unintentional" or "mis-used".
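To make the proposal more concrete, here is a minimal sketch of rules 1, 2 and 4 on plain text. It is only an illustration, not an existing OpenOffice feature: the function names, the bullet pattern, and the idea of taking the longest line as the "full line length" are all assumptions, and rules 3 and 5 (wrapped overflow and font changes) are left out because they need layout information that plain text does not carry.

```python
import re

# Assumed bullet/numbering pattern: "-", "*", "•", or "1." / "1)" prefixes.
BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def split_blocks(text):
    """Rule 1: the text between entirely empty lines forms one block."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def join_block(lines, full_width, short_ratio=0.75):
    """Merge the lines of one block, keeping breaks that look intentional."""
    paragraphs = [lines[0].strip()]
    for prev, line in zip(lines, lines[1:]):
        # Rule 2: a line shorter than 3/4 of the full width ends intentionally.
        prev_short = len(prev.rstrip()) < short_ratio * full_width
        # Rule 4: a line starting with a bullet or number ends intentionally.
        prev_bullet = BULLET_RE.match(prev) is not None
        if prev_short or prev_bullet:
            paragraphs.append(line.strip())        # keep the line break
        else:
            paragraphs[-1] += " " + line.strip()   # remove the line break
    return paragraphs


def remove_misbreaks(text, full_width=None):
    """Apply rules 1, 2 and 4 to a whole text; blocks stay separated by an empty line."""
    blocks = split_blocks(text)
    if full_width is None:
        # Assumption: the longest line approximates the full line length.
        full_width = max((len(l.rstrip()) for b in blocks for l in b), default=0)
    return "\n\n".join("\n".join(join_block(b, full_width)) for b in blocks)
```

Calling `remove_misbreaks(text, full_width=40)` on a typewriter-style paragraph would join its long lines into one, while short lines and bulleted lines keep their breaks. The 3/4 threshold comes from rule 2; everything else is a guess that would need tuning on real documents.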
Comment 1 tuharsky 2007-03-19 16:05:23 UTC
6. The probability of a line mis-break is higher if the line starts with a lowercase letter.
7. The probability of a line mis-break is higher if the previous line doesn't end with a period or a similar punctuation mark.
8. The probability of a line mis-break is higher if the previous line ends with a comma.
9. The probability of a line mis-break is significantly lower if the majority of lines in the "intended consistent" block also end with a comma (because this could suggest some kind of bulleted list).
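Rules 6 to 9 read naturally as a score rather than a yes/no decision. The following is a rough sketch under that assumption; the function name `misbreak_score`, the weights (1.0 per rule, minus 3.0 for rule 9), and the set of sentence-ending characters are all made up for illustration and are not part of any existing tool.

```python
def misbreak_score(prev_line, next_line, block_lines):
    """Rough score: higher means the break after prev_line is more likely unintentional."""
    prev = prev_line.rstrip()
    nxt = next_line.lstrip()
    score = 0.0
    # Rule 6: the following line starts with a lowercase letter.
    if nxt[:1].islower():
        score += 1.0
    # Rule 7: the previous line does not end with sentence-ending punctuation.
    if prev and prev[-1] not in ".!?:;":
        score += 1.0
    # Rule 8: the previous line ends with a comma.
    if prev.endswith(","):
        score += 1.0
        # Rule 9: if most lines in the block end with a comma, the block
        # is probably a list, so the break is likely intentional.
        comma_lines = sum(1 for l in block_lines if l.rstrip().endswith(","))
        if comma_lines > len(block_lines) / 2:
            score -= 3.0
    return score
```

A caller would compare the score against some threshold before removing a break; where that threshold should sit is an open question that only testing on real documents can answer.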
Comment 2 michael.ruess 2007-03-19 16:44:44 UTC
Reassigned to requirements.
Comment 3 tuharsky 2007-03-20 08:07:04 UTC
Hi mru,
"Enhancement is an improvement to an existing feature. Feature is an addition to the software to add a piece of functionality that does not yet exist."
Do you mean that such functionality already exists and just needs to be improved? I'd like to see it, so that I could better cooperate on the improvements.