Issue 75550 - AutoCorrect option for recognizing typed pagenumber
Summary: AutoCorrect option for recognizing typed pagenumber
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: editing
Version: OOo 2.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on: 75524
Blocks:
Reported: 2007-03-20 11:30 UTC by tuharsky
Modified: 2013-02-07 22:36 UTC

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description tuharsky 2007-03-20 11:30:53 UTC
This is another option for the Bad Document Correction Tool (Issue 75524).
The purpose of the option is: detect and remove hardwired page numbering.

Sometimes the user doesn't know how to put proper page numbering into a
document, so he does it as on a mechanical typewriter: he places some -1- or
similar text at the bottom of every page.

Since page boundaries are not fixed, the "page numbering" gets corrupted after
intentional or printer-induced reformatting. If the pages are "shortened", the
"page numbers" move up on every page. If a page is "prolonged", the "page
numbers" leak onto the beginning of the next page. Either way, they're bad.

The option should detect them somehow and remove them completely, then let the
user create proper numbering or offer to call the Page Numbering Wizard
(issue 7065).


Detecting "hardwired page numbering" could be a difficult task if done fully
automatically, but much simpler if the user is asked to cooperate.

For example, the user could be asked to select the first sample of the
numbering. OOo could then normalize the sample (removing spaces before and
after the pattern, removing line breaks, etc.). Variations of the pattern
should also be considered, to allow for discrepancies and typos. For example,
if the user selected a "- 1 -" sample, then the variants "- 1-", "-1-",
"-1 -", etc. should be considered as well.
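
The sample normalization described above could be sketched roughly as follows.
This is only an illustration in Python, not OOo code: the function name
pattern_from_sample is hypothetical, and it assumes that the first run of
digits in the sample is the page number and that only spaces and tabs may vary
between the remaining characters.

```python
import re

def pattern_from_sample(sample: str) -> re.Pattern:
    """Build a whitespace-tolerant regex from a user-selected sample like "- 1 -"."""
    # Drop all whitespace so "- 1 -", "- 1-" and "-1-" normalize to the same form.
    compact = re.sub(r"\s+", "", sample)
    parts = []
    number_seen = False
    # Split into digit runs and single non-digit characters.
    for token in re.findall(r"\d+|\D", compact):
        if not number_seen and re.fullmatch(r"\d+", token):
            # Assumption: the first digit run is the page number; capture any number.
            parts.append(r"(\d+)")
            number_seen = True
        else:
            parts.append(re.escape(token))
    if not number_seen:
        raise ValueError("sample contains no page number")
    # Allow optional spaces/tabs around every token; the whole line must match.
    return re.compile(r"^[ \t]*" + r"[ \t]*".join(parts) + r"[ \t]*$")
```

With the sample "- 1 -", the resulting pattern accepts a line consisting only of
"-1-", "- 1-" or "-1 -" (with any page number), but not a line that has other
text around the number.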

The help of the user would make the search much easier for OpenOffice.org.
Not only could the patterns be detected more precisely than if they were
totally unknown; the distances between the "hardwired page numbers" could also
be guessed from the user-selected sample.

OpenOffice.org should then search for the patterns. All lines that contain only
a number, or otherwise resemble the searched pattern, are suspects. OOo should
then search for a similar pattern with the number increased. Then it could do
some statistical analysis: a Gaussian curve could be computed, representing how
often such numbered patterns repeat inside the document. If the curve is very
sharp and concentrated, meaning that the pattern repeats at quite predictable
intervals, then we have probably found the "hardwired numbering" and such lines
could be safely removed.
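
A minimal sketch of this search, in Python and under stated assumptions: the
CANDIDATE pattern, the function name find_hardwired_numbers and both thresholds
are made up for illustration. Instead of fitting a Gaussian curve explicitly,
it computes the mean and standard deviation of the distances between
consecutive hits (the two parameters of such a curve) and treats a small
relative spread as the "sharp and concentrated" case.

```python
import re
from statistics import mean, pstdev

# Hypothetical pattern for "-1-"-style lines; in practice it would be
# derived from the sample the user selected.
CANDIDATE = re.compile(r"^[ \t]*-?[ \t]*(\d+)[ \t]*-?[ \t]*$")

def find_hardwired_numbers(lines, max_spread=0.15, min_hits=3):
    """Return indices of lines that look like typed page numbers, else []."""
    # Every line that consists only of a (decorated) number is a suspect.
    hits = [(i, int(m.group(1))) for i, line in enumerate(lines)
            if (m := CANDIDATE.match(line))]
    # Longest run of suspects whose numbers increase by exactly one.
    best = {}  # number -> longest chain found so far that ends with it
    for i, n in hits:
        chain = best.get(n - 1, []) + [(i, n)]
        if len(chain) > len(best.get(n, [])):
            best[n] = chain
    chain = max(best.values(), key=len, default=[])
    if len(chain) < min_hits:
        return []
    # Distances (in lines) between consecutive typed page numbers.
    gaps = [b[0] - a[0] for a, b in zip(chain, chain[1:])]
    # Relative spread: small means the numbers recur at regular intervals.
    spread = pstdev(gaps) / mean(gaps)
    return [i for i, _ in chain] if spread <= max_spread else []
```

An empty result means the lines should be left alone; whether a relative spread
of 0.15 and a minimum of three hits are sensible limits would have to be tuned
on real documents.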

Please note:
This option of the Bad Document Correction Tool should run BEFORE the line
break correction (Issue 75549) and probably before any other options, because
the fewer corrections have already been applied to the document, the greater
the chance of precise "hardwired page number" detection.
Comment 1 michael.ruess 2007-03-20 13:03:51 UTC
Reassigned to requirements.