Issue 66589 - R Statistical Software
Summary: R Statistical Software
Alias: None
Product: Calc
Classification: Application
Component: code
Version: OOo 2.0.2
Hardware: All All
Importance: P3 Trivial with 4 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Depends on:
Reported: 2006-06-20 13:12 UTC by discoleo
Modified: 2013-08-07 15:12 UTC
CC List: 2 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Description discoleo 2006-06-20 13:12:16 UTC
This feature request will deal with the OOo-Calc program, more specifically with
its statistical functions.

I sometimes have the feeling that OOo tries to reinvent the wheel. Sometimes we
may need a square wheel, but more often the time-proven round wheel is the
better choice.

How does my remark apply to Calc? While the statistical functions offered by
Calc are reasonable, they cannot compete with dedicated statistical programmes.
(I will post a separate, similar request for gawk support.)

Fortunately, there is an open-source, state-of-the-art statistical program called
R. Therefore I think it is reasonable to integrate R
with Calc, instead of writing new statistical functions.

My 2 wishes are:
1. the OOo help file should mention the existence of R (and provide the web
address for download)
2. Calc should be integrated tightly with R: I'll detail this in the next paragraph.

When I perform some serious statistical analysis, I first have to export a file
from Calc and then import it into R. This is time-consuming. I would like to
open a pipeline to R and perform the calculations within Calc, bypassing the
need for an intermediate file. (This should be possible, because R is open
source, too.)
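
The pipeline idea above can be sketched as follows. This is only a minimal illustration in Python, not an existing Calc feature: it assumes an `Rscript` executable is available on the PATH, and the names `R_EXPR`, `column_to_stdin` and `mean_sd_via_r` are hypothetical. The data travels over standard input, so no intermediate file is written.

```python
import shutil
import subprocess

# R one-liner (an assumption of this sketch): read numbers from stdin,
# then print their mean and standard deviation on a single line.
R_EXPR = 'x <- scan(file("stdin"), quiet = TRUE); cat(mean(x), sd(x))'


def column_to_stdin(values):
    """Serialize one spreadsheet column as newline-separated text."""
    return "\n".join(repr(float(v)) for v in values) + "\n"


def mean_sd_via_r(values, rscript="Rscript"):
    """Pipe the values to R over stdin and return (mean, sd).

    Returns None when the R executable cannot be found, so that R stays
    an optional dependency. Assumes at least two finite values.
    """
    exe = shutil.which(rscript)
    if exe is None:
        return None
    done = subprocess.run(
        [exe, "-e", R_EXPR],
        input=column_to_stdin(values),
        capture_output=True,
        text=True,
        check=True,  # raise if R exits with an error
    )
    mean, sd = (float(token) for token in done.stdout.split())
    return mean, sd
```

A caller would pass the cell values of one column, for example `mean_sd_via_r([1, 2, 3, 4])`, and fall back to Calc's own functions when the result is `None`.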

I will briefly mention some advantages of R:
 - over 100 additional packages available
   e.g. Fisher exact-test, bootstrap procedure and other non-parametric tests
 - linear regression models (including glm, generalized linear models), as
well as non-linear models
 - it is actually a statistical programming language, so one can write
one's own statistics
 - complex statistical tests: ROC-curves, Cox-proportional hazards;
 - availability for both UNIX and MS Windows systems
Comment 1 kyoshida 2006-06-21 23:07:00 UTC

I can't speak for the entire OO.o dev team, but I can tell you this.  R is
released under GPL, and we are LGPL with a copyright assignment requirement. 
What this means is that we cannot integrate R into our build officially, because
their license does not allow it.

Open-source doesn't necessarily mean that we can all share code.  There are
various (IMO way too many) open-source licenses out there, and many of them are
not compatible with each other.  So, in this case, we cannot do a full
integration with R, but perhaps we could write something that would optionally
call R by fork and exec if available.

Just my 2 cents.

Comment 2 discoleo 2006-06-22 10:17:30 UTC
[QUOTE] we can not do a full integration with R [/QUOTE]

There is a small misunderstanding: I did NOT mean to include the code of R. Instead:

1. mention in the help the download site
2. implement in Calc a mechanism to pipeline the data into R (if necessary
contact the R-team - I could contact the R-team myself - to implement the necessary
changes to support such pipelining, but most UNIX applications actually have
mechanisms in place for such a feature, and I wonder whether R already has
such a mechanism)

Many thanks,

Leonard Mada
Comment 3 kyoshida 2006-06-24 00:29:08 UTC
If R consists of a shared library that OO.o can dlopen to reference its internal
functions, then R integration can be done somewhat painlessly.  If not, then
we'd need to do "fork" & "exec" and do text-pipelining to the R executable and parse
the output.  It's a bit of a pain parsing the output, but it's doable.
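
To give an idea of the parsing half of this approach, here is a minimal sketch in Python. It assumes the text captured from the R process is the default printed form of an unnamed numeric vector, where each line starts with an index such as `[1]`; the function name `parse_r_numeric_vector` is hypothetical, and other R output (data frames, named vectors, model summaries) would need its own handling.

```python
import re

# Index prefix that R's print() puts at the start of each output line, e.g. "[1]".
_INDEX_PREFIX = re.compile(r"^\s*\[\d+\]")


def parse_r_numeric_vector(text):
    """Parse the default printed form of an unnamed R numeric vector."""
    values = []
    for line in text.splitlines():
        match = _INDEX_PREFIX.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not vector output
        for token in line[match.end():].split():
            # R prints missing values as NA; map them to Python's None.
            values.append(None if token == "NA" else float(token))
    return values
```

For example, the two lines ` [1]  1.5  2.5  3.5` and ` [4] 10.0 NA` would be read back as five values, the last one missing.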

I would see it implemented as an external UNO component with its own separate
dialog.  Calling R for built-in cell functions is probably not desirable because
of the large overhead involved (imagine if statistical functions in 1000 cells
need to be re-calculated all at the same time!), but calling R from a separate
tool dialog should be fine, I think.
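
One way to keep the per-call overhead down, sketched here in Python under stated assumptions, is to collect many requests and hand them to a single R process as one script instead of starting R once per cell. The names `ALLOWED`, `r_vector` and `build_batch_script` are hypothetical, the allowlist is only an example, and the values are assumed to be finite numbers.

```python
# Hypothetical allowlist: only these R function names may appear in a batch.
ALLOWED = {"mean", "median", "sd", "var"}


def r_vector(values):
    """Render Python numbers as an R c(...) literal."""
    return "c(" + ", ".join(repr(float(v)) for v in values) + ")"


def build_batch_script(requests):
    """Turn [(function_name, values), ...] into one R script.

    Each request becomes one cat() call, so the R process prints
    exactly one output line per request, in request order.
    """
    lines = []
    for func, values in requests:
        if func not in ALLOWED:
            raise ValueError(f"unsupported R function: {func}")
        lines.append(f"cat({func}({r_vector(values)}), '\\n')")
    return "\n".join(lines) + "\n"
```

The resulting script could then be sent to one R process in a single call, and the output lines matched back to the requesting cells by position.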

Comment 4 kyoshida 2006-06-24 01:28:19 UTC
@discoleo: we have this wiki page:

to outline what we need in a statistical analysis package & how it could be
implemented.  It would be nice to get your ideas on this page.

Comment 5 frank 2006-06-30 10:45:43 UTC
one for requirements