Issue 89976 - Calc Performance of computations
Summary: Calc Performance of computations
Status: CLOSED FIXED
Alias: None
Product: Calc
Classification: Application
Component: code (show other issues)
Version: OOo 2.4.0
Hardware: All All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: oc
QA Contact: issues@sc
URL:
Keywords: oooqa, performance
Depends on:
Blocks:
 
Reported: 2008-05-26 18:37 UTC by discoleo
Modified: 2013-08-07 15:14 UTC (History)
3 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
It takes ages to open the spreadsheet. It took me way longer than one hour to generate it in the first place. (2.05 MB, application/vnd.oasis.opendocument.spreadsheet)
2008-05-26 18:40 UTC, discoleo
no flags Details
Detailed Test Case: after opening (can be opened), fill column C to test the performance. (1.33 MB, application/vnd.oasis.opendocument.spreadsheet)
2008-05-27 21:46 UTC, discoleo
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description discoleo 2008-05-26 18:37:05 UTC
I will attach a test-spreadsheet to test the Calc-performance. See also Caolan's
message on the dev-list:
http://sc.openoffice.org/servlets/ReadMsg?list=dev&msgNo=3064

It took me way over one hour to generate the spreadsheet. Unfortunately, I did
close it and now when opening it again to finish the work, it seems that I need
another hour. I do not have the patience anymore.

The spreadsheet contains in the first column 65535 random numbers. The 2nd
column (B) contains = 10*A + 1. This is to move away from 0 and use slightly
larger numbers. The true median on this sample is somewhere around 0.96.

Then I performed a pseudo-jackknife procedure on this population and calculated
the median for every smaller sample. (Actually, I took the opposite way - and
the pseudo-jackknife is biased, because I dropped values at the same end.)

Then I computed the percentiles for the resulting 65535 medians.

BUGS:

1.) When reopening the spreadsheet, it takes a *HUGE* amount of time to
recompute the spreadsheet

2.) The function to compute the median could be improved (there exist
fast-alternatives)

3.) But I believe that the performance-penalty comes from somewhere else in the
code. It is just a gut-feeling. [Somewhere in the formatting...]

I intended to build in a variable to change the first value in the series, BUT I
did NOT do it in the beginning, and now - because it takes so long to open the
spreadsheet - I am not able to do it anymore. That part will be non-functional
(for everyone who manages to open the spreadsheet).
Comment 1 discoleo 2008-05-26 18:40:02 UTC
Created attachment 53952 [details]
It takes ages to open the spreadsheet. It took me way longer than one hour to generate it in the first place.
Comment 2 kyoshida 2008-05-27 14:44:32 UTC
confirmed.  This is indeed very slow.
Comment 3 discoleo 2008-05-27 21:42:58 UTC
In the meantime I am pretty convinced that it is not very likely to open and
work with the initially attached spreadsheet. IF Calc uses a naive algorithm to
update downstream cells that have changed, than the worst-case scenario involves
sorting a number of 6.6 million times a column of 65,536 elements. This will
take a huge amount of time.

Therefore, I will attach a 2nd spreadsheet, with the C-column unfilled (except
for the first 3 cells). Everyone can now fill as many cells in the C-column as
he wishes. Filling the first 1,000 is likely to happen in real-time, filling the
first 10,000 will take some time on slower computers (like mine), filling 30,000
cells will likely take quite long even on a fast PC, and filling ALL 65,535
cells will take some ages. ;-)

[I anticipate that the operation behaves like O(n^2) or worse, so for every
additional cell in column C, the time increases quite significantly.]

I added further comments in the spreadsheet. Please read them carefully.
Especially interesting is the fact that Calc will display at some point in time:
"Adapt row height" and this takes quite some time, even ore than probably the
sort proper.
Comment 4 discoleo 2008-05-27 21:46:21 UTC
Created attachment 53999 [details]
Detailed Test Case: after opening (can be opened), fill column C to test the performance.
Comment 5 ooo 2008-09-04 16:09:08 UTC
Grabbing issue.
Comment 6 ooo 2008-09-19 13:38:14 UTC
The comments made in the second attached document are not correct. While it is
true that the values are sorted and not ranked in-place and the result maybe
could be cached, the assumption that recalculation starts when the first cell
is inserted and then restarts over and over again for every cell is plain
wrong. Cells are notified before calculation after all cells are inserted. I'm
profiling to see where the bottle neck is.
Comment 7 ooo 2009-01-09 18:11:10 UTC
In cws calcperf03:

revision 266105
sc/source/core/inc/interpre.hxx
sc/source/core/tool/interpr3.cxx

The functions MEDIAN, PERCENTILE, QUARTILE, LARGE, SMALL now use
::std::nth_element() instead of a fully sorted array.

Filling entire column C as described in the second attachment on my machine
needed around 12 minutes, which I think isn't that bad, given that Excel2007
needed 15 minutes to accomplish the task :-)
Comment 8 ooo 2009-01-25 21:54:59 UTC
Reassigning to QA for verification.
Comment 9 oc 2009-02-06 14:07:56 UTC
verified in internal build cws_calcperf03
Comment 10 vladimir_hitekschool 2009-04-19 19:58:14 UTC
Issue 89976 has been fixed in master version OOo-dev 3.1 .0 (OOO310m9 
Build:9396) for Windows XP and can be closed.	
Comment 11 jirkaz 2009-06-21 01:11:58 UTC
Hello Mr. Vladimir Hitekschool:

I have checked this issue in Go-oo 3.1 with czech langpack
(OpenOffice.org 3.1.0
OOO310m11
build 9399
build 2009-05-27).
Although this version is newer than described "OOO310m9 Build:9396", the problem
with very slow opening of test file
(http://www.openoffice.org/nonav/issues/showattachment.cgi/53952/test-huge%20calculations-Median.ods)
is here again...

Thank you for your answer.
Comment 12 thorsten.ziehm 2009-07-20 14:55:50 UTC
This issue is closed automatically and wasn't rechecked in a current version of
OOo. The fixed issue should be integrated in OOo since more than half a year. If
you think this issue isn't fixed in a current version (OOo 3.1), please reopen
it and change the field 'Target Milestone' accordingly.

If you want to download a current version of OOo =>
http://download.openoffice.org/index.html
If you want to know more about the handling of fixed/verified issues =>
http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues
Comment 13 thorsten.ziehm 2009-07-20 15:36:14 UTC
Sorry this issue was wrongly closed. This issue will be reopened automatically.
And will be set after that back to fixed/verified.
Comment 14 thorsten.ziehm 2009-07-20 15:40:41 UTC
Set to state 'fixed'.
Comment 15 thorsten.ziehm 2009-07-20 15:44:39 UTC
Set back to state 'verified/fixed'.

Again. Sorry for the mass of mails.
Comment 16 thorsten.ziehm 2010-02-22 14:48:57 UTC
This issue is closed automatically. It should be fixed in a version with is
available for longer than half a year (OOo 3.1). If you think this issue isn't
fixed in the current version (OOo 3.2) please reopen it. But then please pay
attention about the field 'target milestone'.
The closure was approved by the Release Status Meeting at 22nd of February 2010
and it is based on the issue handling guideline for fixed/verified issues  :
http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues
Comment 17 jirkaz 2010-03-22 22:37:14 UTC
thorstenziehm:

Hmmm....I would like to reopen this bug, but I can't (I haven't privileges for
it)...