Issue 53779 - querying DISTINCT values from a dBase table is incredibly slow with larger data sets
Summary: querying DISTINCT values from a dBase table is incredibly slow with larger da...
Alias: None
Product: Base
Classification: Application
Component: code (show other issues)
Version: 680m124
Hardware: All All
: P3 Trivial with 13 votes (vote)
Target Milestone: OOo 3.0
Assignee: marc.neumann
QA Contact: issues@dba
Keywords: performance
Depends on:
Reported: 2005-08-25 11:53 UTC by Frank Schönheit
Modified: 2008-09-21 16:01 UTC (History)
1 user (show)

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---

document(s) to reproduce the bug case (132.39 KB, application/x-compressed)
2005-08-25 11:56 UTC, Frank Schönheit
no flags Details
Proposed patch for testing (14.58 KB, text/plain)
2008-05-01 06:18 UTC, jcottrell
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description Frank Schönheit 2005-08-25 11:53:11 UTC
- extract the attached archive to a location of your choice. It contains a
  database document (.odb) and some dBase files
- open the database document
- Edit|Database|Properties -> adjust the path to the dBase files to the location
  you just extracted the archive to
- open the query "performance_distinct", which does a
    SELECT DISTINCT * FROM performance
  where the "performance" table contains 5 columns and 500 records
=> this takes about 5:30 minutes on a 3GHz Athlon XP

This is incredibly slow, IMO, and we should investigate where this can be improved
Comment 1 Frank Schönheit 2005-08-25 11:56:41 UTC
Created attachment 29052 [details]
document(s) to reproduce the bug case
Comment 2 Frank Schönheit 2005-08-25 11:58:26 UTC
this blocks a sensible fix for issue 51151, which would require that Calc moves
to SELECT DISTINCT when collecting data for the data pilot. At the moment, this
is not possible becausse our dBase implementation (actually, our file-base
implementation) is so slow with this.
Comment 3 Frank Schönheit 2008-01-18 14:10:28 UTC
Such a serious performance issue should be fixed earlier than "OOo Later", /me
Comment 4 jcottrell 2008-05-01 06:16:42 UTC
I've been taking a look at this, and I think I've got a suitable patch that I
will attach for testing.  I've removed the old O(n^2) code for finding distinct
values and rearranged things such that we: sort on all fields, make one pass
through the rows and eliminate duplicates (relying on the full sort), then
re-sort if necessary.

This is my first OOo patch, so it's likely that I've missed something important.
I'm certainly open to criticism.
Comment 5 jcottrell 2008-05-01 06:18:00 UTC
Created attachment 53295 [details]
Proposed patch for testing
Comment 6 Frank Schönheit 2008-05-01 20:13:38 UTC
fs->jcottrell: Thanks for the patch! Changing issue type to PATCH then.

Ocke, can you please review the patch, and work with jcottrell to improve it, if
necessary? Thanks.
Comment 7 ocke.janssen 2008-05-02 12:03:18 UTC
Thank you for the patch. No it only takes some seconds to show the result :-)
If you got more patches, they are very very welcome ;-)
Comment 8 ocke.janssen 2008-05-02 12:03:45 UTC
Fixed in cws dba30c
Comment 9 jcottrell 2008-05-03 03:27:57 UTC
Great, glad I could be of some help.  And thanks for the quick feedback; that's
a nice welcome to the project.

I'll be trying to get more familiar with this codebase and handle issues where
possible, so I'm looking forward to working with you guys.
Comment 10 ocke.janssen 2008-05-14 09:54:03 UTC
Please verify. Thanks.
Comment 11 jcottrell 2008-05-15 03:23:26 UTC
Sorry, is that directed at me or QA?
Comment 12 Frank Schönheit 2008-05-17 11:32:31 UTC
Comment 13 marc.neumann 2008-05-27 14:38:22 UTC
verified in CWS dba30c

find more information about this CWS, like when it is available in the master
builds, in EIS, the Environment Information System:
Comment 14 gibi 2008-09-21 12:06:31 UTC
Verified on 3.0.0rc2 Linux X64:
(result comes in less than one second on a standard laptop)
Comment 15 Frank Schönheit 2008-09-21 16:01:59 UTC
closing then. gibi, thanks for the feedback.