Apache OpenOffice (AOO) Bugzilla – Issue 67649
concordance files + multilingual entries; utf-8 doesn't work
Last modified: 2019-02-01 06:05:37 UTC
For example: among the words I need for the alphabetical index, a few are in a different language than my document, e.g. in Polish, where diacritic marks are relevant (Michał Załęcki), but also in Russian etc. That's no problem in Writer and the document itself. But if I create the concordance file manually and save it in Unicode encoding (UTF-8), words like the example above aren't indexed. And if I open the file in the 'Edit concordance file' screen, the special characters are changed and become faulty. Also, if I create a concordance file entirely within the 'New concordance file' screen and type the words there, either a) with the keyboard or b) by choosing the character from the charset map and inserting it via double-click, I only see the correct characters until I click the OK button. Everything I tried ended in the same dilemma: the terms I wanted to include in the index aren't indexed. To me it seems that the "using a concordance file" feature cannot handle UTF-8 in general, or doesn't handle UTF-8 properly for all characters.
Created attachment 37942 [details] writer document
Created attachment 37943 [details] concordance file sdi utf-8
Reassigned to ES.
added myself to cc list
I cannot reproduce the problem. Differentiate between: a) entries in the document, b) entries in the sdi file. I see that some entries from a) are broken, but all entries of b) are OK. So something went wrong while generating the index (searching for and marking the entries). To cure the problem: - delete all entries in the doc - reapply the sdi. Any comments?
For the record, your experience is similar to mine. We use a concordance file to create the index for the User Guide. It is not complete because we lost our indexer, but no problems such as those described in this issue have shown up. Very strange, but worth monitoring progress.
ES wrote: To cure the problem: - delete all entries in the doc - reapply the sdi. This does not solve the problem of matching text with diacritic characters in OpenOffice against the concordance file. Have you ever tried to create a concordance file that includes diacritic characters? (It seems OpenOffice does not support, for example, Latin Extended-A when creating an alphabetical index from a concordance file.) Please create a Writer document with only two entries, Böhm and Chałasiński, to see whether it works. That would be helpful!
That's what I already did first, using your file (and I still have no problem creating a new doc and a new concordance file with those 2 names). We need to find out where the problem is: - your locales (system, document, OOo)? - the way you create the sdi: -> I have no problem using the UI (I copy/pasted the names from the issue into the sdi table) -> I have no problem exporting an edited sdi as utf-8 and loading it in the index dialog. So what do you do exactly? Please also describe your system.
PC: Windows XP Professional
Locales: System - German (default); additional keyboard input locales: English-US, Polish, Russian
OpenOffice 2.0.3 - language: German
Writer document: language German (general). (I also set the language for the document only, and tried English and Polish as well, without success regarding the index created from the concordance file.)
For the sdi file I tried:
a) creating the file in EmEditor and saving it as utf-8 with BOM checked (also tried with BOM unchecked)
b) creating the file with the GUI in OpenOffice Writer.
To enter special characters in the concordance file from within Writer ("edit file" or "new"), I
- typed them in using the Polish keyboard, or
- used right mouse click: insert > special characters, or
- copied and pasted as you did.
The index doesn't work.
One difference between cases a and b above: if I create the concordance file in the OO Writer GUI, the German umlaut is okay (but not the Polish diacritics); if I use the external sdi file, the German umlaut also turns into wrong characters. If I open the concordance file for editing in the Writer GUI, the encoding of the incorrectly recognized characters differs between a and b.
Case a - loaded external sdi file, created with EmEditor, I see: Cha?asi?ski and Böhm
Case b - reopened sdi file, created with the GUI within OO Writer, I see: ChaÅ‚asiÅ„sk and Böhm
What could be wrong?
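The two garbled forms reported above look like two different encoding failures. The following is only a sketch in Python, not the OpenOffice code path. It assumes the Windows "ANSI" codepage on the reporter's German system is cp1252, which the thread does not confirm. Under that assumption, reading UTF-8 bytes as cp1252 gives the case b pattern, and a lossy conversion to cp1252 gives the case a pattern.

```python
# Illustration only. The name comes from this issue; cp1252 as the
# Windows "ANSI" codepage is an assumption about the reporter's system.
name = "Chałasiński"

# Case b pattern: UTF-8 bytes misread as cp1252.
# Each Polish letter (two UTF-8 bytes) turns into two wrong characters.
mojibake = name.encode("utf-8").decode("cp1252")   # "ChaÅ‚asiÅ„ski"

# Case a pattern: lossy conversion to cp1252.
# Letters missing from the codepage are replaced by "?".
lossy = name.encode("cp1252", errors="replace").decode("cp1252")  # "Cha?asi?ski"

# "ö" exists in cp1252, so Böhm survives the lossy conversion unchanged.
boehm = "Böhm".encode("cp1252", errors="replace").decode("cp1252")  # "Böhm"

# The mojibake form can be reversed as long as no byte was dropped;
# the "?" form cannot, because the original letters are gone.
repaired = mojibake.encode("cp1252").decode("utf-8")  # "Chałasiński"
```

If this reading is right, the file from case b may still be recoverable, while the characters in case a are lost once the file is saved.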
ES->OS: as tested... It seems to be a Windows-only problem (no problem on Linux). The Concordance dialog and/or import/export is not Unicode compliant. Although (surprisingly) this never worked, not even in OOo 1.1.5, it prevents people who write text that is not strictly ASCII from using this function. -> 2.x.
I also encountered this problem on Windows ME when attempting to cut and paste Latin Extended-A and Latin Extended Additional to the concordance file dialogue. I can edit the file in Open Office, and type the extended characters OK, but the changes are not saved.
Moved target to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
I have this problem too; it is very important to solve it! Please! Thanks a lot, Massimo
any news?
Reset assignee to the default "issues@openoffice.apache.org".
*** Issue 128023 has been marked as a duplicate of this issue. ***
Hi, reproduced with the French UI. On Windows, your concordance file must be encoded in ANSI. On Linux, your concordance file must be encoded in UTF-8. It's not funny if the text document is opened on two different OSes...
I think this should be solved when we overhaul our String implementation. I set this bug to depend on the overhaul bug because both are closely related, but they do not have the same goal. Maybe it makes sense to fix this before the more complex overhaul?
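As a possible workaround for the behaviour described above, a UTF-8 concordance file could be re-encoded outside OpenOffice before it is loaded on Windows. This is a minimal Python sketch. The function name `reencode_concordance` is hypothetical, and the target codepage is an assumption, since "ANSI" depends on the Windows system locale (for example cp1252 for Western European, cp1250 for Central European).

```python
def reencode_concordance(data: bytes, target: str) -> bytes:
    """Re-encode UTF-8 concordance file content into a single-byte codepage.

    Hypothetical helper, not part of OpenOffice. Raises UnicodeEncodeError
    if an entry contains a character the target codepage cannot represent,
    rather than silently writing "?".
    """
    # "utf-8-sig" accepts input with or without a leading BOM.
    text = data.decode("utf-8-sig")
    return text.encode(target)


utf8_content = "Böhm\nChałasiński\n".encode("utf-8")

# A Central European codepage (assumed cp1250) can hold both names.
ansi_content = reencode_concordance(utf8_content, "cp1250")

# A Western European codepage (assumed cp1252) has no "ł" or "ń",
# so the conversion is refused instead of damaging the entries.
try:
    reencode_concordance(utf8_content, "cp1252")
    western_ok = True
except UnicodeEncodeError:
    western_ok = False
```

The sketch also shows the limit of the workaround: no single ANSI codepage covers German, Polish and Russian entries together, so a file mixing all three cannot be converted this way.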