Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||concordance files + multilingual entries; UTF-8 doesn't work|
|Component:||editing||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||gerry, issues, jeffooo, petko|
|Issue Type:||DEFECT||Latest Confirmation in:||---|
|Issue Depends on:||128019|
Description uko_571 2006-07-21 10:58:36 UTC
For example: among the words I need for the alphabetical index, a few are in a different language than my document, e.g. Polish, where diacritic marks are relevant (Michał Załęcki), but also Russian etc. That is no problem in Writer and the document itself. But if I create the concordance file manually and save it in Unicode encoding (UTF-8), words like the example above aren't indexed. And if I open the file in the 'Edit concordance file' screen, special characters are changed and become faulty. Also, if I create a concordance file entirely within the 'New concordance file' screen and type the words there, either a) typed in with the keyboard or b) by choosing characters from the charset map and inserting them via double click, I only see the correct characters until I click the OK button. Everything I tried ended in the same dilemma: the terms I wanted to include in the index aren't indexed. To me it seems that the feature "using a concordance file" cannot handle UTF-8 in general, or doesn't handle UTF-8 properly for all characters.
Comment 2 uko_571 2006-07-21 11:03:37 UTC
Created attachment 37943 [details] concordance file sdi utf-8
Comment 3 michael.ruess 2006-07-21 11:29:59 UTC
Reassigned to ES.
Comment 4 grsingleton 2006-07-21 13:09:56 UTC
added myself to cc list
Comment 5 eric.savary 2006-08-28 14:36:10 UTC
I cannot reproduce the problem. Differentiate: a) entries in the document b) entries in the sdi file. I see that some entries from a) are broken, but all entries of b) are OK. So something went wrong while generating the index (searching and marking the entries). To cure the problem: - delete all entries in the doc - reapply the sdi. Any comments?
Comment 6 grsingleton 2006-08-28 14:47:41 UTC
For the record, your experience is similar to mine. We use a concordance file to create the index for the User Guide. It is not complete because we lost our indexer, but no problems such as those described in this issue have shown up. Very strange, but worth monitoring.
Comment 7 uko_571 2006-08-29 12:22:27 UTC
ES wrote: To cure the problem: - delete all entries in the doc - reapply the sdi. This does not solve the problem of matching text with diacritic characters in OpenOffice against the concordance file. Have you ever tried to create a concordance file including diacritic characters? (It seems OpenOffice does not support, for example, Latin Extended-A when creating an alphabetical index from a concordance file.) Please create a Writer document with only two entries, Böhm and Chałasiński, to see whether it works. That would be helpful!
Comment 8 eric.savary 2006-08-29 12:44:29 UTC
That's what I did first, using your file (and I still have no problem creating a new doc and a new concordance file with those 2 names). We need to find out where the problem is: - your locales (system, document, OOo)? - the way you create the sdi: -> I have no problem using the UI (I copy/pasted the names from the issue into the sdi table) -> I have no problem exporting an edited sdi as UTF-8 and loading it in the index dialog. So what do you do exactly? Please also describe your system.
Comment 9 uko_571 2006-08-29 15:13:00 UTC
System: PC, Windows XP Professional.
Locales: system is German (default); additional keyboard input locales: English-US, Polish, Russian.
OpenOffice 2.0.3, language: German. Writer document language: German (general). (I also set the language for the document only, and tried English and Polish as well, without success for the index built from the concordance file.)
For the sdi file I tried:
a) creating the file in EmEditor and saving it as UTF-8 with BOM checked (also tried with BOM unchecked)
b) creating the file with the GUI in OpenOffice Writer.
For special characters in the concordance file, when using the "edit file" or "new" option inside Writer, I either
- typed them in using the Polish keyboard, or
- used right mouse click: insert > special characters, or
- copied and pasted them like you have done.
The index doesn't work.
One difference between cases a and b above: if I create the concordance file in the Writer GUI, the German umlaut is OK (but not the Polish diacritics); if I use the external sdi file, the German umlaut also turns into wrong characters.
If I open the concordance file for editing in the Writer GUI, the encoding of the incorrectly recognized characters differs between a and b.
Case a, loading the external sdi file created with EmEditor, I see: Cha?asi?ski and Böhm
Case b, reopening the sdi file created with the GUI within Writer, I see: ChaÅ‚asiÅ„sk and BÃ¶hm
What could be wrong?
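Editorial illustration: the two garbled forms quoted in this comment look like two well-known encoding mismatches. The Python sketch below reproduces them under the assumption that the Windows "ANSI" code page involved is Windows-1252 (the usual default on a German system). It is not taken from the OpenOffice source and does not show what the concordance dialog actually does internally; the function names are made up for this example.

```python
# Sketch only: reproduces the symptom strings, not the OpenOffice code path.
# Assumption: the ANSI code page on the reporter's system is Windows-1252.

def utf8_read_as_cp1252(text: str) -> str:
    """UTF-8 bytes misinterpreted as Windows-1252 (pattern seen in case b)."""
    return text.encode("utf-8").decode("cp1252")

def squeezed_into_cp1252(text: str) -> str:
    """Text forced into Windows-1252; characters it lacks become '?' (case a)."""
    return text.encode("cp1252", errors="replace").decode("cp1252")

for name in ("Chałasiński", "Böhm"):
    print(name, "|", utf8_read_as_cp1252(name), "|", squeezed_into_cp1252(name))
# Chałasiński | ChaÅ‚asiÅ„ski | Cha?asi?ski
# Böhm | BÃ¶hm | Böhm
```

Under this assumption, ö survives the second conversion because Windows-1252 contains it, while ł and ń do not exist in that code page and are lost.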
Comment 10 eric.savary 2006-10-12 12:49:15 UTC
ES->OS: as tested... It seems to be a Windows-only problem (no problem on Linux). The Concordance dialog and/or import/export is not Unicode compliant. Though this (surprisingly) never worked (not even in OOo 1.1.5), it prevents people writing texts that are not strictly ASCII from using this function. -> 2.x.
Comment 11 pesala 2006-10-22 22:18:41 UTC
I also encountered this problem on Windows ME when attempting to cut and paste Latin Extended-A and Latin Extended Additional to the concordance file dialogue. I can edit the file in Open Office, and type the extended characters OK, but the changes are not saved.
Comment 12 Martin Hollmichel 2007-09-10 13:36:13 UTC
Moved target to 3.x according to http://wiki.services.openoffice.org/wiki/Target_3x
Comment 13 soundspaces 2010-12-04 14:32:47 UTC
I have this problem too; it is very important to solve it! Please! Thanks a lot, Massimo
Comment 14 soundspaces 2011-01-03 16:17:45 UTC
Comment 15 Marcus 2017-05-20 11:15:41 UTC
Reset assignee to the default "email@example.com".
Comment 16 oooforum (fr) 2019-01-29 10:46:16 UTC
*** Issue 128023 has been marked as a duplicate of this issue. ***
Comment 17 jeffooo 2019-01-29 16:19:26 UTC
Hi, Reproduced with French UI. With Windows, your concordance file must be encoded in ANSI. With Linux, your concordance file must be encoded in UTF-8. It's not funny if the text document is opened with two different OSes...
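Editorial illustration: given the per-OS encodings described in this comment, one possible workaround is to keep a UTF-8 master copy of the concordance file and re-encode it for Windows. The sketch below is a minimal example under stated assumptions: the helper name `transcode` and the file names are hypothetical, and the right ANSI code page depends on the Windows locale (`cp1250` for Central European, `cp1252` for Western European are used here only as examples). A single ANSI code page cannot hold every character, so mixed Polish and Russian entries would still fail.

```python
from pathlib import Path

def transcode(src: Path, dst: Path, src_enc: str, dst_enc: str) -> None:
    """Re-encode a text file byte for byte (line endings are left untouched).

    Raises UnicodeEncodeError if dst_enc cannot represent a character,
    instead of silently replacing it with '?'.
    """
    text = src.read_bytes().decode(src_enc)
    dst.write_bytes(text.encode(dst_enc))

# Hypothetical usage; "utf-8-sig" also strips a leading BOM if one is present:
# transcode(Path("index_utf8.sdi"), Path("index_ansi.sdi"), "utf-8-sig", "cp1250")
```

Failing loudly on unrepresentable characters is a deliberate choice here, since silent replacement is exactly the symptom reported in this issue.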
Comment 18 Peter 2019-02-01 06:05:12 UTC
I think this should be solved when we overhaul our String implementation. I set this bug to depend on the overhaul bug because the two are closely related but do not have the same goal. Maybe it makes sense to fix this before the more complex overhaul?