Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||wrong sorting result for paragraphs in Writer (long text)|
|Component:||code||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||eric.savary, issues, rb.henschel|
|Issue Type:||DEFECT||Latest Confirmation in:||---|
Description hardy314 2009-10-21 21:16:19 UTC
I will attach a .txt document (a free dictionary word list) with a lot of text in it to demonstrate the wrong sorting. It happened on WinXP (SP3) with OO 3.1.1.

This is how to reproduce the misbehavior:
- Open the attached .txt file with OO-Writer (you might need to set the import options correctly; it should display a French - German word list).
- The text file is basically a dictionary. It consists of MANY paragraphs in the following structure:
  french word TAB german word NEW LINE
  french word TAB german word NEW LINE
  french word TAB german word NEW LINE
  ...
- Use the menu "edit" - "mark all" to mark the whole text.
- Use the menu "extras" - "sort".
- Pick the following (useful) options:
  key one = TRUE, column = 1, type = alphanum, order = ascending
  other keys = FALSE
  directions = lines
  separator = tab
  language = french (or german or english, doesn't matter for the bug)
  consider capitals = NO
- Now press OK to sort the whole file. What we want is many text lines that are sorted alphabetically, according to the text before the TAB.
- Result: we get a somehow sorted text file. It's difficult to see at first glance what's wrong, because the file is so HUGE. But after careful inspection, you can see that the result is totally random! :( For example, a properly sorted file should have the lines starting with Z at the end of it. Our result doesn't.

Additionally, the sorting procedure gives a different result each time if you just sort the result of the first sorting again and again. But it does not seem to converge to a usefully sorted file :(

I was impressed at how fast OO sorts such a big file. But the wrong results soon turned this feeling into frustration. To me it seems to be a problem of the sorting algorithm itself, which would be a very embarrassing thing, right?
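The "careful inspection" step above can also be done mechanically. What follows is a minimal Python sketch, not part of the original report. It assumes one `french word TAB german word` entry per line and compares keys by case-folded code point order, which only approximates the language-aware comparison Writer is asked to perform. The names `sort_key` and `first_unsorted_line` are hypothetical.

```python
from typing import Iterable, Optional


def sort_key(line: str) -> str:
    """Return the text before the first TAB, case-folded so capitals are ignored."""
    return line.rstrip("\n").split("\t", 1)[0].casefold()


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first line whose key is smaller than
    the key of the line before it, or None if the input is in ascending order."""
    previous = None
    for number, line in enumerate(lines, start=1):
        key = sort_key(line)
        if previous is not None and key < previous:
            return number
        previous = key
    return None
```

Passing an open text file (for example the Writer result saved back as .txt) to `first_unsorted_line` would report where the order first breaks, such as the point where the sort restarts at "a".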
Comment 1 hardy314 2009-10-21 21:18:22 UTC
Created attachment 65522: long txt file to show the sorting bug in Writer (dictionary lines, zipped)
Comment 2 Regina Henschel 2009-10-21 22:31:48 UTC
The text has more than 65536 lines. You should sort it in parts.
Comment 3 eric.savary 2009-10-21 23:08:49 UTC
Confirmed: after some text (about 10 pages) the sort restarts at "a".

My "nose" tells me that Regina is somehow right (the file is too big for the function), but I could not find an exact correlation between the word/line/character count and the sort restart. I also tested this in OOO320m2, and the effect is worse than in 3.1.1.

@MBA: please take over.

@hardy314: from what I could see, yes, we have a bug and we should fix it!

<support>
Now just to help you reach your goals, and apart from "we have a bug": manipulating a HUGE amount of *pure* text doesn't sound to me like a task for a word processor... You could get better (and faster?) results using dedicated tools like those in a Unix shell (if you work on Unix) or using "Cygwin" (please google for it! ;) ) on Windows. The answer to your needs would be:

sort -d Babel-Franz-Deutsch_2.txt > Babel-Franz-Deutsch_3.txt

If you asked for coffee, I'd advise you to use a coffee machine. Not OOo! ;)
</support>
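For readers without a Unix shell at hand, the same workaround can be sketched in Python. This is an illustration only, under stated assumptions: the word list fits in memory, each line has the form `french word TAB german word`, and case-folded code point order is an acceptable stand-in for French collation (accented letters will sort after unaccented ones). Unlike `sort -d`, which compares whole lines, this compares only the text before the TAB, as requested in the description. The function name `sort_by_first_column` is hypothetical.

```python
from typing import List


def sort_by_first_column(lines: List[str]) -> List[str]:
    """Return the lines sorted in ascending order by the text before the
    first TAB, ignoring capitals. sorted() is stable, so lines with equal
    keys keep their original relative order."""
    return sorted(lines, key=lambda line: line.split("\t", 1)[0].casefold())
```

Reading the file with `splitlines()`, passing the list through `sort_by_first_column`, and writing the result back with `"\n".join(...)` would produce the sorted word list.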
Comment 4 hardy314 2009-10-22 19:02:32 UTC
@es: Thank you for the <support> part. I will try your suggested alternatives. You are totally right with your coffee machine comparison, I was just giving OO a quick try :) since I am using it a lot otherwise.