Apache OpenOffice (AOO) Bugzilla – Issue 66589

R Statistical Software

Last modified: 2013-08-07 15:12:27 UTC

This feature request deals with the OOo Calc program, more specifically with its statistical functions. I sometimes have the feeling that OOo tries to reinvent the wheel. Sometimes we may need a square wheel, but more often the time-proven round wheel is the better choice.

How does my remark apply to Calc? While the statistical functions offered by Calc are reasonable, they cannot compete with dedicated statistical programs. (I will post a separate, similar request for gawk support.) Fortunately, there is an open-source, state-of-the-art statistical program called R (http://www.R-project.org). Therefore I think it is reasonable to integrate R with Calc instead of writing new statistical functions.

My two wishes are:
1. the OOo help file should mention the existence of R (and provide the web address for download);
2. Calc should be tightly integrated with R, as detailed in the next paragraph.

When I perform serious statistical analysis, I first have to export a file from Calc and then import it into R. This is time-consuming. I would like to open a pipeline to R and perform the calculations from within Calc, bypassing the need for an intermediate file. (This should be possible, because R is open source, too.)

Some advantages of R:
- over 100 additional packages, e.g. Fisher's exact test, bootstrap procedures and other non-parametric tests;
- linear regression models (including glm, generalized linear models), as well as non-linear models;
- it is actually a statistical programming language, so users can write their own statistics;
- complex statistical analyses: ROC curves, Cox proportional hazards models;
- availability for both UNIX and MS Windows systems.
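To illustrate the "no intermediate file" idea, here is a minimal Python sketch (not part of OOo or R) that serializes a Calc-like range, modelled here as a header plus a list of rows, into CSV text held in memory. The function name `range_to_csv_text` is hypothetical; the resulting text could be written directly to a child process's standard input instead of to a temporary file.

```python
import csv
import io
from typing import Sequence


def range_to_csv_text(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Serialize a spreadsheet-like range into CSV text held in memory.

    The text can be written straight to another process's stdin,
    so no intermediate file ever touches the disk.
    """
    buf = io.StringIO()
    # lineterminator="\n" keeps the output identical on every platform
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    text = range_to_csv_text(["dose", "response"], [[1, 2.5], [2, 3.9], [4, 7.1]])
    print(text, end="")
```

The `csv` module takes care of quoting, so cell values that contain commas or quotes survive the trip intact.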

Hi, I can't speak for the entire OO.o dev team, but I can tell you this. R is released under the GPL, whereas we are LGPL with a copyright assignment requirement. This means that we cannot officially integrate R into our build, because their license does not allow it. Open source doesn't necessarily mean that we can all share code. There are various (IMO way too many) open-source licenses out there, and many of them are not compatible with each other. So, in this case, we cannot do a full integration with R, but perhaps we could write something that optionally calls R by fork and exec if it is available. Just my 2 cents. Kohei
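A minimal sketch of the "optionally call R if available" idea, written in Python rather than OOo's own code for brevity: it looks up the `Rscript` front end on the PATH and, only if it is found, runs an R expression in a separate process. The function `run_r_if_available` and its parameters are hypothetical; `Rscript -e` is R's standard way to evaluate an expression from the command line.

```python
import shutil
import subprocess
from typing import Optional


def run_r_if_available(r_code: str, stdin_text: str = "",
                       executable: str = "Rscript",
                       timeout: float = 60.0) -> Optional[str]:
    """Run an R snippet in a separate process if R is installed.

    Returns R's standard output, or None when the executable cannot be
    found, so the caller can fall back to the built-in functions.
    """
    path = shutil.which(executable)
    if path is None:
        return None  # R is optional: no R installed, no external call
    # subprocess starts a separate child process (fork/exec or posix_spawn
    # on POSIX); R runs in its own address space and is not linked in.
    result = subprocess.run(
        [path, "-e", r_code],
        input=stdin_text,     # data piped to R's standard input
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError if R exits non-zero
    )
    return result.stdout
```

For example, `run_r_if_available('print(mean(c(1, 2, 3)))')` returns R's printed output as a string, or `None` on a machine without R, letting the caller fall back to Calc's own functions.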

[quote] we cannot do a full integration with R [/quote] There is a small misunderstanding: I did NOT mean including the code of R, but rather:
1. mentioning the download site in the help;
2. implementing a mechanism in Calc to pipe the data into R (if necessary, the R team could be asked to implement the changes needed to support such pipelining, and I could contact them myself; but most UNIX applications already have mechanisms in place for this, and I wonder whether R already has one).
Many thanks, Leonard Mada

If R consists of a shared library that OO.o can dlopen to reference its internal functions, then R integration can be done somewhat painlessly. If not, then we'd need to fork and exec the R executable, pipe text to it and parse its output. Parsing the output is a bit of a pain, but it's doable. I would see it implemented as an external UNO component with its own separate dialog. Calling R for built-in cell functions is probably not desirable because of the large overhead involved (imagine statistical functions in 1000 cells all needing to be recalculated at the same time!), but calling R from a separate tool dialog should be fine, I think. Kohei
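The text-pipelining route can be sketched in Python as follows, assuming `Rscript` is on the PATH and that R reads the piped CSV via `read.csv(file("stdin"))`. Two choices address the points above: the R snippet prints one `name=value` line per statistic, which is much easier to parse than R's default `print()` output, and all statistics for a column are computed in a single R process, so the start-up overhead is paid once per request rather than once per cell. `BATCH_R_CODE`, `parse_name_value_output` and `summarise_column` are hypothetical names.

```python
import shutil
import subprocess
from typing import Dict, Iterable, Optional

# Hypothetical R snippet: read CSV from stdin, take the first column and
# print one "name=value" line per statistic (easy to parse, unlike print()).
BATCH_R_CODE = (
    'x <- read.csv(file("stdin")); v <- x[[1]]; '
    'stats <- c(mean = mean(v), sd = sd(v), median = median(v)); '
    'cat(sprintf("%s=%.17g", names(stats), stats), sep = "\\n"); cat("\\n")'
)


def parse_name_value_output(stdout: str) -> Dict[str, float]:
    """Turn lines like 'mean=4.5' into {'mean': 4.5}; skip other lines."""
    results: Dict[str, float] = {}
    for line in stdout.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # blank lines or unrelated output from R
        name, _, value = line.partition("=")
        value = value.strip()
        if value == "NA":
            results[name.strip()] = float("nan")  # R's missing value
            continue
        try:
            results[name.strip()] = float(value)
        except ValueError:
            continue  # contains "=" but is not one of our numeric lines
    return results


def summarise_column(values: Iterable[float], executable: str = "Rscript",
                     timeout: float = 60.0) -> Optional[Dict[str, float]]:
    """Compute all statistics for one column in a single R process."""
    path = shutil.which(executable)
    if path is None:
        return None  # R not installed: caller falls back to Calc functions
    csv_text = "v\n" + "".join(f"{float(v)!r}\n" for v in values)
    result = subprocess.run(
        [path, "-e", BATCH_R_CODE],
        input=csv_text, capture_output=True, text=True,
        timeout=timeout, check=True,
    )
    return parse_name_value_output(result.stdout)
```

R writes missing values as `NA`, which Python's `float()` does not accept, so the parser maps it to NaN explicitly (for instance, `sd` of a single value is `NA` in R).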

@discoleo: we have this wiki page: http://wiki.services.openoffice.org/wiki/Statistical_Data_Analysis_Tool to outline what we need in a statistical analysis package and how it could be implemented. It would be nice to get your ideas on this page. Thanks, Kohei

one for requirements