Issue 92042

Summary: Provide Non-parametric Correlation Functions
Product: Calc Reporter: dsimcha <dsimcha>
Component: programming Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues, rb.henschel
Version: OOo 3.0 Beta 2   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
Partial patch. none

Description dsimcha 2008-07-23 15:08:35 UTC
As someone who deals with non-normally distributed data on a daily basis, I
would find it very useful if OOo Calc provided some non-parametric correlation
functions such as Spearman Rho or Kendall Tau correlation, in addition to
Pearson correlation.  

If it is agreed that these functions should be included, I'll look into creating
them and submitting a patch.
Comment 1 Regina Henschel 2008-07-23 17:05:19 UTC
This is a valid enhancement request, so I will set it to new.

But I think you should go another way. I have looked through the draft of the
OpenDocument Formula document
(http://www.oasis-open.org/committees/documents.php?wg_abbrev=office-formula). I
cannot find anything that sounds like the mentioned correlations. Please have a
look yourself (chapter 2.1). Or do you know any spreadsheet application that
provides these functions? If it is not yet in that document, it will take a
long, long time to get it into the standard, if at all.

That does not mean that OOo should not provide such functions, but why as core
functions? A quicker and, from my point of view, better way is to provide an
extension with those statistical functions. You might want to have a look at
http://wiki.services.openoffice.org/wiki/Extending_Chart_by_external_components
and especially at the "R and Calc Project"
http://wiki.services.openoffice.org/wiki/R_and_Calc
Comment 2 dsimcha 2008-07-23 17:54:31 UTC
R and Calc is definitely a useful project, but non-parametric correlation is a
relatively simple thing that should be in any basic stats library.  Furthermore,
the R and Calc functions don't integrate as well as built-ins.
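
To illustrate how small the core of such a function can be, here is a minimal sketch of Spearman's rho in Python. It is only an illustration, not proposed Calc code: the names average_ranks, pearson and spearman_rho are hypothetical, and it assumes tied values share the average of their ranks and that neither input is constant.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # assumes neither input is constant


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

For example, spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]) is 1 up to rounding, because the relationship is monotonic, whereas the Pearson correlation of the same data is below 1.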
Comment 3 dsimcha 2008-08-18 00:38:56 UTC
Here is a partial patch for Kendall correlation.  Since a column in OOo Calc can
hold up to 65536 elements, reasonable performance requires a non-straightforward
implementation of the Kendall correlation algorithm.  The straightforward
implementation runs in O(N^2); this one runs in O(N log N).  As I am not
familiar with the code base as a whole, and setting up a proper build
environment looks fairly difficult, I was not able to create a complete patch
that integrates this function into the interpreter, or to test the integration
of this function into OOo as a whole, but the core functionality for Kendall
correlation is there.  I would guess that from this point, integrating this
functionality would be trivial for someone who is familiar with the code base
and has a proper build environment set up already.
Comment 4 dsimcha 2008-08-18 00:40:11 UTC
Created attachment 55815 [details]
Partial patch.