Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing 
Summary:  statistics functions need review  

Product:  Calc  Reporter:  philmperry <philperry>  
Component:  code  Assignee:  AOO issues mailing list <issues>  
Status:  ACCEPTED   QA Contact:  
Severity:  Trivial  
Priority:  P3  CC:  994phij, issues, khirano, pagalmes.lists, rb.henschel  
Version:  OOo 1.1 RC  Keywords:  needhelp  
Target Milestone:    
Hardware:  All  
OS:  All  
Issue Type:  TASK  Latest Confirmation in:    
Developer Difficulty:    
Issue Depends on:  27597, 32833, 26022, 26836, 47296, 69069, 74704, 78250, 86294, 89691, 90703, 90759, 101316  
Issue Blocks:  
Attachments: 

Description
philmperry
20030827 03:18:39 UTC
You opened a task, so it's your turn. Frank OK, Frank, so what do I do now? What I'm trying to do is to bring the issue of numerical accuracy and stability to the attention of those who work on Calc functions. Perhaps I should change it from a task to something else? I took a guess that a "task" would be the right category. It's been almost 20 years (gaah!) since I took a numerical analysis course, so to say that I'm a bit rusty on this would be an understatement! I'm willing to help look for test suites, reports, books, and articles on this sort of thing, but I'm not really qualified to meddle with the code. I just want to make sure that someone is listening and putting to good use whatever information I find, rather than talking to myself (I do too much of that already!). Hi Eike, can you have a look at this ? I think it's your construction site so you can give him better feedback than I. Frank First of all, the history of OpenOffice.org is that it arose from StarOffice that wasn't developed by volunteers at all but by contracted employees instead. Even today, about 90% of developers are not volunteers but still paid by Sun. Most Calc statistical functions were programmed by a "doctor of mathematics", which of course doesn't imply that there are no errors or that they are failsafe and have no significant roundoff errors in specific situations. A good example is the standard deviation of 90000001, 90000002, 90000003 mentioned in http://www.agresearch.cri.nz/Science/Statistics/exceluse1.htm where we seem to use the same "naive" algorithm as the competitive product does, I guess. Thanks for those URLs anyways. I would happily accept patches of better algorithms if someone would take the burdon and review current implementations. For Calc's rehabilitation I must say that I don't think we're doing worse than any other spreadsheet application ;) However, AFAIK it is common knowledge amongst statisticians that you just can't use a spreadsheet for doing real analysis. Source code is in sc/source/core/tool/interpr*.cxx, most statistical functions in interpr3.cxx and interpr5.cxx, if anyone wants to start on it? I'm just a user, but also an actuary. I would be glad to help with a review of some of the financial calculations (I have written code in BASIC  several versions  and, years ago, FORTRAN). Actuarial emphasis was on difference equations which translates into computational formulas much easier than analytical statistical functions (for example, the complex Tschebischev functions are replaceable, with no loss of accuracy, by the much simpler Jenkens fifthdifference equations). If you want my input, please let me know which functions you would want me to review. Note that I recognize that I am not perfect, that looking at something too long makes for overlooking mistakes and I would want someone else to review anything I posted. Robert, Thanks for your offer. As I already wrote, sc/source/core/tool/interpr[35].cxx contain most of the statistical funtions, the method names (ScInterpreter::Sc...) are mostly "selfspeaking", unfortunately not all of them and comments are rare. Some basic functions like computing the variance are in interpr1.cxx. Of course an obvious glitch like the standard deviation of 90000001, 90000002, 90000003 would be nice to have fixed by using a better algorithm. The standard deviation is computed in interpr1.cxx method ScInterpreter::ScStDev(), which in turn makes use of ScInterpreter::GetStVarParams(). Additionally, the result is affected by ::rtl::math::approxSub(). Those approx...() functions are used to eliminate roundoff errors of finite precision representation of infinite binary digits periods. For example, 0.1+0.20.3 normally doesn't result in exactly 0.0 if not computed using such approx functions. The approx functions may be found in sal/inc/rtl/math.hxx Adjusting target milestone Having the "not determined" milestone target was on purpose here, since it isn't clear when and by whom all the work will be done. Definitely not a "I will do this for OOo2.0" task. Added the needhelp keyword, and reset target to "not determined". *** Issue 27002 has been marked as a duplicate of this issue. *** Set PleaseHelp target. Note: Standard deviation and related functions are fixed with issue 22811. CORREL, COVAR, PEARSON, RSQ, STEYX, SLOPE, INTERCEPT, FORECAST are fixed with issue 78250. *** Issue 79944 has been marked as a duplicate of this issue. *** Created attachment 52850 [details]
patches for gammadist, chidist and inverse dist, details in the readme inside the zip
add me to CC Note to myself: content of patches.zip attached is obsoleted by further work received by mail. Taken over by issue 90703. GAMMADIST, CHIDIST, GAMMAINV, CHIINV fixed with issue 90703. Much improvement for BETAINV, FINV, TINV as well. Reset assigne to the default "issues@openoffice.apache.org". 