Issue 106163 - wrong sorting result for paragraphs in Writer (long text)
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 3.1.1
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2009-10-21 21:16 UTC by hardy314
Modified: 2017-05-20 11:17 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
long txt file to show the sorting bug in writer (dictionary lines, zipped) (864.37 KB, application/x-gzip)
2009-10-21 21:18 UTC, hardy314

Description hardy314 2009-10-21 21:16:19 UTC
I will attach a .txt document (a free dictionary wordlist) with a lot of text in
it to demonstrate the wrong sorting. It happened on WinXP (SP3) with OO 3.1.1.

This is how to reproduce the misbehavior:

- Open the attached .txt file with OO-Writer (you might need to set the import
options correctly; it should display a French - German wordlist)

- The text file is basically a dictionary, it consists of MANY paragraphs in the
following structure: 
   french word TAB german word NEW LINE
   french word TAB german word NEW LINE
   french word TAB german word NEW LINE
   ...

- Use the menu "Edit" - "Select All" ("mark all") to select the whole text

- Use the menu "Tools" - "Sort" ("extras" - "sort")

- pick the following (useful) options: 
     key one = TRUE, column = 1, type = alphanum, order = ascending
     other keys = FALSE
     directions = lines
     separator = tab
     language = french  (or german or english, doesn't matter for the bug)
     consider capitals = NO

- Now press OK to sort the whole file. What we want is many text lines that are
sorted alphabetically, according to the text before the TAB.

- Result: we get a somehow sorted text file. It's difficult to see at first
glance what's wrong, because the file is so HUGE. But after careful inspection,
you can see that the result is totally random! :(

For example a properly sorted file should have lines starting with Z at the end
of it. Our result doesn't.

Additionally, sorting the result of the first sort again and again gives a
different result each time. But it does not seem to converge to a usefully
sorted file :(
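
For comparison, the expected behaviour can be sketched outside Writer. This is
only a minimal Python sketch, not what Writer does internally: it assumes the
sort key is the text before the first TAB, compared case-insensitively
("consider capitals = NO"), and it ignores language-specific collation. The
names sort_key and sort_wordlist and the sample lines are made up for
illustration.

```python
def sort_key(line: str) -> str:
    """Text before the first TAB, case-folded (assumed equivalent of
    'consider capitals = NO'; no locale-aware collation)."""
    return line.split("\t", 1)[0].casefold()


def sort_wordlist(lines: list[str]) -> list[str]:
    # sorted() is stable, so lines with equal keys keep their relative order.
    return sorted(lines, key=sort_key)


# Hypothetical sample in the same "french TAB german" layout.
sample = [
    "zone\tZone",
    "Maison\tHaus",
    "arbre\tBaum",
    "chat\tKatze",
]

once = sort_wordlist(sample)
twice = sort_wordlist(once)

print(once[-1])       # the last line should start with "z"
print(once == twice)  # sorting an already sorted list must not change it
```

A correct sort has both properties shown here: lines starting with Z end up at
the end, and sorting the result again leaves it unchanged.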

I was impressed at how fast OO sorts such a big file. But the wrong results soon
turned this feeling into frustration.

To me it seems to be a problem of the sorting algorithm itself, which would be
a very embarrassing thing, right?
Comment 1 hardy314 2009-10-21 21:18:22 UTC
Created attachment 65522
long txt file to show the sorting bug in writer (dictionary lines, zipped)
Comment 2 Regina Henschel 2009-10-21 22:31:48 UTC
The text has more than 65536 lines. You should sort it in parts.
Comment 3 eric.savary 2009-10-21 23:08:49 UTC
Confirmed: after some pages (about 10) the sort restarts at "a".

My "nose" tells me that Regina is somehow right (the file is too big for the
function) but I could not find an exact correlation between the
word/line/character count and the sort restart.
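
One way to look for such a correlation would be to save the sorted result as
text and list the line numbers where the order breaks. The following Python
sketch is only an assumption about how that check could be done: it compares
the case-folded text before the first TAB with plain string comparison (no
French collation), and first_column, find_order_breaks and the sample data are
hypothetical.

```python
def first_column(line: str) -> str:
    """Case-folded text before the first TAB (assumed sort key)."""
    return line.split("\t", 1)[0].casefold()


def find_order_breaks(lines: list[str]) -> list[int]:
    """Return the 1-based numbers of lines whose key sorts before the
    key of the line directly above them."""
    breaks = []
    for i in range(1, len(lines)):
        if first_column(lines[i]) < first_column(lines[i - 1]):
            breaks.append(i + 1)
    return breaks


# Hypothetical excerpt of a result where the sort "restarts" at "a".
result = ["abri\tX", "balle\tY", "zone\tZ", "aigle\tA", "bord\tB"]

print(len(result) > 65536)        # is the line count above the suspected limit?
print(find_order_breaks(result))  # line numbers where a restart happens
```

Comparing the reported line numbers with the total line count and with 65536
might show whether the restarts follow a fixed block size.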

I also tested this in OOO320m2 and the effect is worse than in 3.1.1.

@MBA: please take over.

@hardy314: from what I could see, yes, we have a bug and we should fix it!

<support>
Now just to help you reach your goals, and apart from "we have a bug":
manipulating a HUGE amount of *pure* text doesn't sound to me like a task for a
word processor...

You could get better (and faster?) results on this using dedicated tools like
those in a Unix shell (if you work on Unix) or using "Cygwin" (please google for
it! ;) ) on Windows.

The answer to your needs would be:

sort -d Babel-Franz-Deutsch_2.txt > Babel-Franz-Deutsch_3.txt

If you asked for coffee, I'd advise you to use a coffee machine. Not OOo! ;)
</support>
Comment 4 hardy314 2009-10-22 19:02:32 UTC
@es: Thank you for the <support> part. I will try your suggested alternatives.
You are totally right with your coffee machine comparison, I was just giving OO
a quick try :) since I am using it a lot otherwise.
Comment 5 Marcus 2017-05-20 11:17:54 UTC
Reset assignee to the default "issues@openoffice.apache.org".