Apache OpenOffice (AOO) Bugzilla – Issue 89976
Calc Performance of computations
Last modified: 2013-08-07 15:14:39 UTC
I will attach a test-spreadsheet to test the Calc-performance. See also Caolan's message on the dev-list: http://sc.openoffice.org/servlets/ReadMsg?list=dev&msgNo=3064 It took me way over one hour to generate the spreadsheet. Unfortunately, I did close it and now when opening it again to finish the work, it seems that I need another hour. I do not have the patience anymore. The spreadsheet contains in the first column 65535 random numbers. The 2nd column (B) contains = 10*A + 1. This is to move away from 0 and use slightly larger numbers. The true median on this sample is somewhere around 0.96. Then I performed a pseudo-jackknife procedure on this population and calculated the median for every smaller sample. (Actually, I took the opposite way - and the pseudo-jackknife is biased, because I dropped values at the same end.) Then I computed the percentiles for the resulting 65535 medians. BUGS: 1.) When reopening the spreadsheet, it takes a *HUGE* amount of time to recompute the spreadsheet 2.) The function to compute the median could be improved (there exist fast-alternatives) 3.) But I believe that the performance-penalty comes from somewhere else in the code. It is just a gut-feeling. [Somewhere in the formatting...] I intended to build in a variable to change the first value in the series, BUT I did NOT do it in the beginning, and now - because it takes so long to open the spreadsheet - I am not able to do it anymore. That part will be non-functional (for everyone who manages to open the spreadsheet).
Created attachment 53952 [details] It takes ages to open the spreadsheet. It took me way longer than one hour to generate it in the first place.
confirmed. This is indeed very slow.
In the meantime I am pretty convinced that it is not very likely to open and work with the initially attached spreadsheet. IF Calc uses a naive algorithm to update downstream cells that have changed, than the worst-case scenario involves sorting a number of 6.6 million times a column of 65,536 elements. This will take a huge amount of time. Therefore, I will attach a 2nd spreadsheet, with the C-column unfilled (except for the first 3 cells). Everyone can now fill as many cells in the C-column as he wishes. Filling the first 1,000 is likely to happen in real-time, filling the first 10,000 will take some time on slower computers (like mine), filling 30,000 cells will likely take quite long even on a fast PC, and filling ALL 65,535 cells will take some ages. ;-) [I anticipate that the operation behaves like O(n^2) or worse, so for every additional cell in column C, the time increases quite significantly.] I added further comments in the spreadsheet. Please read them carefully. Especially interesting is the fact that Calc will display at some point in time: "Adapt row height" and this takes quite some time, even ore than probably the sort proper.
Created attachment 53999 [details] Detailed Test Case: after opening (can be opened), fill column C to test the performance.
Grabbing issue.
The comments made in the second attached document are not correct. While it is true that the values are sorted and not ranked in-place and the result maybe could be cached, the assumption that recalculation starts when the first cell is inserted and then restarts over and over again for every cell is plain wrong. Cells are notified before calculation after all cells are inserted. I'm profiling to see where the bottle neck is.
In cws calcperf03: revision 266105 sc/source/core/inc/interpre.hxx sc/source/core/tool/interpr3.cxx The functions MEDIAN, PERCENTILE, QUARTILE, LARGE, SMALL now use ::std::nth_element() instead of a fully sorted array. Filling entire column C as described in the second attachment on my machine needed around 12 minutes, which I think isn't that bad, given that Excel2007 needed 15 minutes to accomplish the task :-)
Reassigning to QA for verification.
verified in internal build cws_calcperf03
Issue 89976 has been fixed in master version OOo-dev 3.1 .0 (OOO310m9 Build:9396) for Windows XP and can be closed.
Hello Mr. Vladimir Hitekschool: I have checked this issue in Go-oo 3.1 with czech langpack (OpenOffice.org 3.1.0 OOO310m11 build 9399 build 2009-05-27). Although this version is newer than described "OOO310m9 Build:9396", the problem with very slow opening of test file (http://www.openoffice.org/nonav/issues/showattachment.cgi/53952/test-huge%20calculations-Median.ods) is here again... Thank you for your answer.
This issue is closed automatically and wasn't rechecked in a current version of OOo. The fixed issue should be integrated in OOo since more than half a year. If you think this issue isn't fixed in a current version (OOo 3.1), please reopen it and change the field 'Target Milestone' accordingly. If you want to download a current version of OOo => http://download.openoffice.org/index.html If you want to know more about the handling of fixed/verified issues => http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues
Sorry this issue was wrongly closed. This issue will be reopened automatically. And will be set after that back to fixed/verified.
Set to state 'fixed'.
Set back to state 'verified/fixed'. Again. Sorry for the mass of mails.
This issue is closed automatically. It should be fixed in a version with is available for longer than half a year (OOo 3.1). If you think this issue isn't fixed in the current version (OOo 3.2) please reopen it. But then please pay attention about the field 'target milestone'. The closure was approved by the Release Status Meeting at 22nd of February 2010 and it is based on the issue handling guideline for fixed/verified issues : http://wiki.services.openoffice.org/wiki/Handle_fixed_verified_issues
thorstenziehm: Hmmm....I would like to reopen this bug, but I can't (I haven't privileges for it)...