Issue 128492 - Buggy Index generator
Summary: Buggy Index generator
Status: UNCONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: 4.1.11
Hardware: All Windows 10
Importance: P5 (lowest) Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on: 23541 40930
Blocks:
Reported: 2021-10-31 22:40 UTC by Larry
Modified: 2021-11-14 10:53 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Shows the issues (264.78 KB, multipart/zip)
2021-11-14 10:49 UTC, Larry

Description Larry 2021-10-31 22:40:58 UTC
The generated Index is inconsistent, and some of it makes no sense.
There seem to be two underlying problems:

1. OO Writer seems to mark index terms inside the document being indexed. Even after terms are removed from the concordance file, the old index terms still appear in the Index. In addition, the new version of OO does not even show the places where a document has been marked. SUGGESTIONS: (A) If marking is necessary, it should be done in a temporary document with a known name, created each time the Index is rebuilt. (B) There should be an option in the Index definition panel to clear any markings left in a document that has been marked in the past.

2. Automatically inserted hyphenation marks should be ignored for the purpose of indexing. Thus COMPUTER, COM-PUTER, and COM-PU-TER should all be treated as COMPUTER. There should also be an option to remove all optional hyphenation markers.
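
To illustrate the behaviour requested in point 2, here is a minimal Python sketch. It is not OpenOffice code: the names `index_key` and `group_terms` are hypothetical, and it assumes that optional hyphens are stored as the soft hyphen character (U+00AD) and that visible hyphens should be ignored too.

```python
import re

# Soft hyphen (U+00AD) is the usual "optional hyphen" character. The plain
# hyphen-minus is included on the assumption that the visible break in
# COM-PUTER should be ignored as well.
HYPHEN_CHARS = re.compile("[\u00ad-]")


def index_key(term: str) -> str:
    """Return a comparison key with hyphenation marks removed."""
    return HYPHEN_CHARS.sub("", term).casefold()


def group_terms(terms):
    """Group term spellings that share the same hyphen-free key."""
    groups = {}
    for term in terms:
        groups.setdefault(index_key(term), []).append(term)
    return groups
```

With this key, COMPUTER, COM-PUTER and COM-PU-TER all fall into one group. Note that stripping visible hyphens would also merge genuinely hyphenated words with their unhyphenated look-alikes, so a real implementation might restrict itself to the soft hyphen.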

The above issues were noted in the past.
Comment 1 Larry 2021-11-01 15:44:04 UTC
PS. After posting the above, I saw that the marked terms (gray background) can be made visible via View > Field Shadings.
Comment 2 Peter 2021-11-01 21:08:24 UTC
What kind of Index did you try?

About:
"OO Writer seems to mark index terms inside the document to be indexed. Even after terms are removed from the concordance file past index terms appear in the Index."

If you look in the Navigator in the side pane, there should be an entry named Index. If you expand it, you get a list of all indexes. You can click on the Index in question and select Index -> Update from the menu. At that point OpenOffice will update all displayed indexes.
You can also click the outdated Index.


About:
"In addition the new version of OO does not even show places where a document has been marked. "

As you noted yourself, it depends on your Options.

About:
"(B) there should be an option in the Index definition panel to clear any markings in a document should it have been marked in the past"
You mean remove the Index? You can do that with right click, Index -> remove.

About:
"2. automatically inserted hyphenation marks should be ignored for purpose of indexing. Thus COMPUTER COM-PUTER COM-PU-TER should all be considered as COMPUTER. There should also be an option to remove all optional hyphenation markers."
Indeed, this is not recognized. A starting point for a developer would be the break iterator, I guess. See [1].

[1] https://wiki.openoffice.org/wiki/Writer/Text_Formatting
Comment 3 Peter 2021-11-01 21:16:54 UTC
Hyphen identification is covered in issue 23541.
One could argue that 40930 is similar in kind, but it is not the same, since it is about word boundaries. So if we add hyphen handling, we could perhaps also offer an option to define other non-boundary-breaking characters.
Comment 4 Larry 2021-11-01 22:23:46 UTC
“If you look into the Navigator in the sidepane, there should be an option Index. If you open the point, you get a list of all indexes. You can Click on the Index in question and in the menue select Index -> Update. In this moment OpenOffice will update all displayed indexes.
You can also Click the Outdated Index.”
Updating the Index, even if the Index parameters are edited, retains all the old “marked” terms (gray background, etc.), and the errors recur.

MY RESPONSE:

The indexer “marks” up the document based on the concordance file. Even if some terms are later dropped from the file and the Index is updated, the markings are retained in the document, and the obsolete terms still appear in the updated Index.
 
The best and maybe only solution seems to be an option to clear all marked text somehow, which should be simple to implement.
***


"(B) there should be an option in the Index definition panel to clear any
markings in a document should it have been marked in the past"
You mean remove the Index? You can do that with right click, Index -> remove. 

MY RESPONSE:

Removing the Index leaves the document marked up. That is the source of the problem.
An option is needed to remove all markings.
Comment 5 oooforum (fr) 2021-11-10 18:57:40 UTC
Please save us time and provide:
- a sample document
- two screenshots showing the observed and the expected result
- step-by-step instructions for reproducing the problem
Comment 6 Larry 2021-11-14 10:49:48 UTC
Created attachment 87073
Shows the issues
Comment 7 Larry 2021-11-14 10:53:10 UTC
Sent ZIP file containing:

Sample .odf file
Concordance file
Screen shots

ISSUES:

1. How can the gray shading be removed from ALL terms marked in gray (the marks set by the indexer) with a single command or macro?

2. "corporation" appears twice in the Index, once hyphenated.

3. "hyphenated" appears only once in the Index, although one of its occurrences contains a hyphen.
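
As an illustration of issue 1, the following Python sketch strips index marks from a document's XML rather than through Writer itself. It is not an OpenOffice macro. It assumes the marks are stored in the document's content.xml as the ODF elements `text:alphabetical-index-mark`, `text:alphabetical-index-mark-start` and `text:alphabetical-index-mark-end`, serialized as self-closing tags; the name `strip_index_marks` is hypothetical.

```python
import re

# Matches the empty alphabetical index-mark elements defined by ODF, e.g.
# <text:alphabetical-index-mark-start .../> and its -end counterpart.
# Assumption: the elements are written as self-closing tags.
INDEX_MARK = re.compile(
    r"<text:alphabetical-index-mark(?:-start|-end)?\b[^>]*/>"
)


def strip_index_marks(content_xml: str) -> str:
    """Return the content.xml text with all alphabetical index marks removed."""
    return INDEX_MARK.sub("", content_xml)
```

The text between a start and an end mark is left in place; only the mark elements themselves are dropped. Anyone trying this should work on a copy of the document, since content.xml lives inside the zipped document package and has to be repacked afterwards.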