Apache OpenOffice (AOO) Bugzilla – Issue 20819

add polynomial regression type

Last modified: 2017-11-30 22:15:53 UTC

I sat down to do my Physics homework, and was disapointed to find that I had to restart the dreaded excel in order to use a second degree polynomial trendline. I would like you to add the feature enabling the use of x degree polynomial trendlines.

duplicate *** This issue has been marked as a duplicate of 5289 ***

closed

reopen this one because the duplicate is a much to huge container bug - this one can better be handled because it concentrates on a single feature.

->bm: Please take care on this feature.

*** Issue 21592 has been marked as a duplicate of this issue. ***

Set platform and OS to "All" as this applies to all.

*** Issue 77930 has been marked as a duplicate of this issue. ***

Hello, I don't if it could help, but I would like to indicate the (old) contribution of Laurent Godard, which can be found here : http://oooconv.free.fr/fitoo/download/CorelPoly.sxc On opposite to FitOO which deals with non linear regression through an optimization method, CorelPoly find the polynomial coefficients, with positive AND negative power terms. I suggest to begin with only second order polynomial equation : Y = a + b.X + c.X² which should be the more needed. Please note that Issue 77930 gives clear details about this feature. Best regards Laurent BP

The macro with CorelPoly has bad correlation coefficient. Please find enclosed another version which try to correct the problem. I did also my best to translate it in english.

Created attachment 48443 [details] New version of CorelPoly which correct correlation coefficient

This is absolutely a basic feature, i don't understand why there are more advanced regression types such as logaritmepic or poware when a polynomial regression is lacking. Besides, the rest of the regression are very limited. There are no options to manually adjust regressions. This features must be added in the next realeases, it is basic.

Hello, @hugovazquez : For me it's must have feature, as well for engineering. But saying that would never help programmers. At least you should vote for this issue. I think this issue should have more than 9 votes. To help going a little forward: this feature requires to modify dialog box of Regression Curve and Statistics, as for "moving average" regression type (issue 66819). These two issues may be resolved simultaneously. The minimum to do (to be compatible with MS-Excel), is to have the order of the polynom (default should be 2) as explained in issue 77930. To go farther, minimum (default 0) and maximum orders should be given, as in the macro of Laurent Godard (see attachment). Exponential terms are useless for me, but may be Laurent Godard has examples where it is necessary. All these values require to be added in the dialog boxes. One problem which may occur during calculation of coefficients, is a non inversible matrix. I don't know which conditions are necessary to avoid this problem. Another problem in case of negative orders, will be with null values of x. Hope it could help resolving this feature... for OOo 3.0? Laurent BP

I try to understand why this type of regression was not developped previously. I noticed that all other types only need two variables to be calculated. Polynomial regression requires (n+1) variables. Is it the stopper point of this issue ?

I had thought that OOo 2.4 would finally fix this, and so was a bit disappointed (did I miss something?). For people who use spreadsheets for 'everyday' quantitative correlations, the lack of polynomial regression equations in OOo Chart is an obvious deficit.

*** Issue 88559 has been marked as a duplicate of this issue. ***

Add me to CC

*** Issue 90758 has been marked as a duplicate of this issue. ***

change owner

*** Issue 91187 has been marked as a duplicate of this issue. ***

People interested in this "defect" or "feature" should put a vote... or 2 so that the developers realize how fundamental this is to spreadsheets. I'm now convinced that Sun should send their programmers off for a week to work with users so that they can see for themselves how spreadsheets are used. Hopefully, this will motivate them to work on things that are fundamentally missing.

@enmane, please refrain from bashing. This is not the only issue that is most important to users. It is just a matter of fact that I cannot solve them all at the same time. If you have developing skills, you might want to help? Also a specification or a file format change request would be helpful. And yes, please use your votes! It makes arguing easier.

iha, I'm sorry - that _really_ wasn't meant as bashing at all. I just fundamentally believe that many _core_ features that users need are missing and I have to ask myself why that might be. I'm aware that SUN has _many_ developers working on OO but yet the core features are missing. This can only mean they are working on things that the "users" don't value as much. In product development, it is imperative that the design team do market studies and test group studies to find out what is important to the customers. I'm just suggesting that this might be missing - it was _not_ intended as a bash but more of a recommendation that SUN do some serious shadowing of their users to see how they use a spreadsheet. 1) Speed is still an issues 2) Humans can interpret information through graphics quickly. This is the reason Calc _needs_ a good graphics package. 3) Data is just data. Scientists, engineers, managers, and most others want to learn from that data which means applying statistics and trendlines to that data. This is a core feature, no exceptions. I hope to graduate in about 6-8 weeks. Let me know if you are serious about your willingness to work with me on this. I can help with the curve fitting theories but probably not the actual implementation of the code. I've offered my help before on trendlines and it wasn't taken. I'm more of a engineer/mathematician than a programmer although I do quite a bit of programming for my research - I don't know how the OO is put together. If the theory is missing or unknown, then that is where I can help. Again, I'm tied up for about 6-8 weeks but can point a programmer in the right direction for learning about curve fitting to minimize error...

@enmane, thanks for clarification. An algorithm would be a first step. Input would be a sequence of x and y double values and the requested order (if the user wants to decide that, maybe in addition a minimum exponent?). Output would be a sequence of coefficients and exponents somehow. Some of the input values might be NAN or not finite, that must be taken into account also. The algorithm must be independent from spreadsheet access as it must work also for charts from writer tables or charts with own data for example in impress. Maybe you can have a look at the attached proposal and prepare a c++ algorithm that is independent from spreadsheet API? Thanks! And good luck for your examination.

Hello, In order to not invent again the wheel, there is a macro in the attachment of this issue. It is written in OOo Basic, and algorithm is quite straightforward.

Please could this be brought forward? The maths involved in implementing this feature is really not that hard, and it's a very useful feature provided by Excel that many scientific users will immediately notice.

Please could this be brought forward? The maths involved in implementing this feature is really not that hard, and it's a very useful feature provided by Excel that many scientific users will immediately notice. Also, I'd add that this feature needs to permit the use the select the order (maximum power of 'x') that they want to have in their polynomial regression. Implementing just quadratic regression would be a much less useful/practical solution.

Hello, I propose a new version of CorelPoly as a Calc template with few enhancement from previous version.

Created attachment 57685 [details] New version of CorelPoly as template

@jumbo444, thanks! I am sorry to say, that I still have not the time to look at this issue more deeply. And still there is the problem with the file format! This feature cannot be saved with ODF1.2. Best would be to implement this feature, thus it is known what fileformat changes are really needed. Then propose the necessary file format changes to the OASIS TC and wait for their approval. Then correct the implementation dependent on what the OASIS TC has decided. And then integrate the feature into the master codeline. If someone out there wants to implement this feature just drop me a note and start! Help is welcome!

FWIW I noticed recently that someone (not me) has notes this as a key 'missing feature' in OpenOffice Calc: http://en.wikipedia.org/wiki/OpenOffice.org_Calc Please could this somehow be arranged to have elevated priority, in recognition of the large number of education and scientific users that OpenOffice has?

As a mechanics engineer, I often use polynomial trendlines (degree 2 is generally enough) on MS XL at work, to modelize xy data. Trying to work on my files at home with OOo Calc, I realize that OOo 3.0 does not support this! I consider that this feature is necessary!

@aveldro : as mentioned above, it is useless to ask for a feature if you do not vote for it. With only 36 votes, it is less than 41 votes for bubble charts for instance. So developers may focus only on what is more asked...

I thought I did vote. Obviously, it didn't work. It is done now. I am surprised that this is not considered as an important feature. Gnumeric has it for instance.

Hello, Gathering information about polynomial trend line, I noticed that (again) solution is in front our eyes. An easy way to get polynomial trend line equation is given in Help of trend line dialog box : simply use LINEST function, with proper construction of X data. From Help : The polynomial regression equation A polynomial regression curve cannot be added automatically. You must calculate this curve manually. Create a table with the columns x, x², x³, … , xⁿ, y up to the desired degree n. Use the formula =LINEST(Data_Y,Data_X) with the complete range x to xⁿ (without headings) as Data_X. The first row of the LINEST output contains the coefficients of the regression polynomial, with the coefficient of xⁿ at the leftmost position.

Thanks to jumbo444 for pointing out the work-around, but it does not, in my opinion, solve the problem. The method is far more complicated and slower than clicking a few boxes in a dialog, is prone to error (how many times will I forget that the coefficients are listed in reverse order!?), and requires there be space in the spreadsheet to add the the trend line values. In my workplace, the technicians who work up data would find this a poor substitute for a polynomial option in the Add Trend Line dialog. It will, however, be a real help for many people to have this method available.

Hello, I am looking for help about user experience with polynomial regression curve. As said previously by Ingrid, we need file format modification prior been able to solve this issue. Please join graphics mailing list to discuss about what is needed. A first draft has been written for a file format change in ODF, in order to be able to add polynomial regression curve. It is obvious that asking a file format change only for polynomial regression curve is unproductive. That's why the proposition include other important issues for regression curve: - moving average trend line (issue 66819) - force intercept (issue 34093) - extrapolate (issue 5085) Do you see some other features that are missing to trend lines? I made the comparison with MS-Excel only, but other spreadsheets may have interesting features. We need argument to obtain file format modification. Since how long do other spreadsheets have polynomial trend lines? Which concrete applications do you have with polynomial trend line, not only in maths and physics? ... With this four first features, there are things that should be discussed before asking to OASIS TC a file format change: - polynomial trend line: exponential terms is an option, but is it necessary? - moving average trend line: by default it is prior moving average. What about adding central moving average? Is it necessary, or will there be too many parameters, most of them useless? Many thanks in advance for your answers on graphics mailing list.

This is the last feature that still retains me and my collegues in Excel at work. Please, do this file format modification. Thanks

I absolutely need polynomial fits for my engineering analyses, most recently to display device repeatability during temperature cycling. For this case, I need access to the polynomial, as I subtracted the polynomial from the data to produce a graph of the cyclic errors. It's been suggested I could write a programme to create polynomial fits - and in principle I could. But if it really is a job that can be done by a user in a sensible time-scale, why is it taking so long to find its way into OpenOffice? So, in spite of everything, it's back to EXCEL. BTW, I'd really like to be able to use minimax algorithms instead of regression. It would also be good to be able to choose the weighting that I apply to the fit (for example data density versus uniform along the line). But the regression technique used by EXCEL does the job (with only a modicum of poet-adjustment), so I'll happily settle for that.

I don't get it how this missing feature could have been overlooked for so long time. This is no rocket science. The math behind is trivial to implement and indeed it's needed by many, many users. So, please, developers wake up and try to listen to needs of the actual users. Thank you.

I'm puzzled by the comment about "the problem with the file format". Is the problem in extracting the data, or saving it? In either case, what is the difference between the requirements for polynomials and those for linear regression? While on the subject of the existing fits, although exponential and logarithmic fits are useful in principle, they are only of use if the expression fits both constants (e.g c*e^(a*x). It's also often necessary to use the fitted equation: - it's probably possible at the moment, and I've failed to find it.

hedleyrokos: I believe the format issue is related to the number of parameters, linear, log, exponential regression only need no more than 2 parameters, with polynomial regression you need more than 2, and you cannot determine the maximun parameters you need theoretically. Maybe the odf format are not designed with this cases in mind, so it needs to be extended to save polynomial regression results.

hephooey wrote: "polynomial regression you need more than 2, and you cannot determine the maximum parameters you need theoretically" Maybe I'm misunderstanding what you wrote? My understanding is that generally the user defines the order of fit that is required, so the number of parameters should be fixed at that point. Maybe this is the reason that EXCEL (*$£%&) only allows orders up to 6? BTW, my personal preference is an orthogonally-generated Chebyshev fit (see for example http://mathworld.wolfram.com/ChebyshevPolynomialoftheFirstKind.html equation 24). This has the potential advantage that adding an order would not change the values in previous orders, so these can be generated one-at-a-time if required (i.e. only one output variable per order).

I'm a physics student as well and have found it completely impossible to do any of my homework in OpenOffice because of this issue. As a cheapskate, I cringe at the thought of possibly being forced to buy microsoft office just to get my homework done without all the hassle I'm going through now (which include mooching off of my friend's computers and the school computers to use excel). I would appreciate it if the issue was given some more attention and, yes, I did vote or it.

For people threatening to use non-open-source software to perform their curve-fitting tasks, note the excellent Fityk software by Marcin Wojdyr. http://www.unipress.waw.pl/fityk/. It's got more curve-fitting options that both Excel and OpenOffice will ever have, but is still quite simple to use.

Unfortunately, my use of non-open-source software is anything but a threat. I find myself compelled to use this in order to operate reasonably effectively. I expect that jdpipe is correct that fityk (in common with other singe purpose mathematical tools) is capable, flexible, and easy to use in isolation. The rub is that, however technically capable such isolated tools may be, any need for multiple interfaces for different purposes means that they are in no way a substitute for an integrated tool - provided that the integrated tool is adequate for the requirement. (Please fixyt).

I think there is a lot of fitting functions that could be needed for engineers, in Calc could be a good option add new capabilities for this specific topic. An opensource software used to make plots and regressions is Gnuplot that has Gnufit, and furthermore export graphs as svg or eps files. http://portal.ictp.it/icts/ictpguide/graphics/gnuplot.html Gnuplot make nonlinear regression for a wide variety of functions involving more than 2 unknown parameters.

Come'on guys, time to get on this issue. It has been under consideration since 2003. I just heard from a guy who would love to switch to OOo, but cannot because of this issue.

The addition of this feature would allow me to pass definitively to OOo with my students

Still waiting, but using LINEST meanwhile. The problem with this is that using it for real work clutters up the spreadsheet horribly. However, as I often want to plot the difference between the derived formula and the data, it does have some advantages. The optimum for this would be to be able to add the fitted equation directly into a single column of the spreadsheet. Would this be subject to the same sort of issues as creating a higher-order trendline directly on a graph?

Poly fitting still needed. This is still blocking leaving M$ Excel.

I'm now in the midst of trying to find not-too-onerous workarounds for one of my students because OOo still doesn't have even 2nd order polynomial fitting.

Hello, As exposed above, workarounds already exist: LINEST function, template file attached to this issue. As resolution inside OOo code will take a long time, an extension is in preparation to propose a new solution. However, it is not yet ready due to inversion of matrix in particular cases. To see what I mean, just test this example with corelpoly template (or with http://www.xuru.org/rt/PR.asp#CopyPaste ) with a 3rd order polynomial: X Y 2000 2577 2002 2870 2004 3197 2006 3049 2008 3422 2010 4007 Fitted values are negative! :-( Just be patient, I will post here when the extension will be publicly available.

add me to cc

I do not get any negative fitted value, neither with calculating from the LINEST results nor with the TREND function. If you draw the fitted line into the chart, it look correct.

Actually, LINEST gives positive values for fitted values. It is then the best solution in this case. CorelPoly macro from Laurent Godard (see template attached to this issue) is more user friendly than LINEST but is useless in this case. However, with less severe values (decrease 2000 to all X values in my previous example), it works very well. The goal of the future extension is to gather the benefit of both workarounds. But I agree, it is only a workaround, not the solution of this issue.

@jumbo444 a) You need upwards of 9 SF in many of the coefficients to achieve a decent fit. Realistically, we need 11 digits for this case, which is a lot when inverting a poorly-conditioned matrix using brute-force methods. b) The fact that even the algorithm used at XUXU* works reasonably** if you transpose X to be symmetrical about zero (i.e. -5, -3, -1, 1, 3, 5) may be indicative of other possible useful techniques. *I notice that for moderately large zero order coefficients, XUXU truncates zero-order coefficient at the decimal point even when the resolution is much lower than for the other coefficients - is this indicative of other strange behaviour? **But even here adding third-order fits to successive recorded errors provides successively better fits - I assume that this indicates significant residual issues with the algorithms?

Seems like Octaves pinv does a good job in this case, i.e. ap= pinv(A'*A)*(A'*Y) where A is the expanded X. Could that be a starting point?

OK, I'll take it back pinv didn't really do the job. Linest does ones you understand how to use it.

Well, that wasn't true either. Open Office Linest delivers a very poor result compared to Excel or Octave's Polyfit, see Attachment Polynomial_test.ppt. As you can see Excels Linest gives exactly the same result as the polynomial trend line. And Octave's Polyfit returns the same polynome as Excel. I suppose it would be possible to call Octave from an OO function to produce the polynome (perhaps a bit slow though).

Hello, I am pleased to inform you that Marcin Gutman has build an extension based on Laurent Godard's macro CorelPoly. It can be downloaded at http://extensions.services.openoffice.org/node/4387? It is not the solution of this issue but it looks, from my point of view, has a good workaround. Please add your comments on this extension NOT HERE, but on extension web-page. Comments and users feedback are welcome. Next versions will include more robust method thanks to LINEST function, and some translation.

Created attachment 71853 [details] Polynomial test case

Created attachment 71988 [details] Polyfit returns coefficient of a polynome.

So, I made some functions, e.g. Polyfit, polyval, that gives the same result as Excel and Octave. It uses QR factorisation and Householder transformations. Not to many error checking though. It is available in the testpoly.ods attachment.

@m95vebj: very good point for your functions! Much better than LINEST function. I noticed a slight difference of the coefficients from the 8th figure, of compared with MS-Excel calculation. I hope it could be integrated in CorelPolyGUI extension.

Why not, All though putting e.g. "{=POLYVAL(POLYFIT(C9:C14;D9:D14;3);C9:C14)}" in the column next to your Y's will give you nicely fitted values (as requested earlier in this issue). But it is still early development and needs more effort on error handling and bug testing. Noticed also that result from Octaves polyfit deviates slightly from Excel Linest (while complaining that inverse matrix is singular to machine precision). But the difference is small. I suppose one should have a number of different methods of solving the problem, and to choose the most adequate.

Hello, CorelPolyGUI extension has been updated http://extensions.services.openoffice.org/en/node/4634 It contains more robust method (QR decomposition, with help of m95vebj), a simpler way to use "force intercept" in case of linear problem, and translation in French and Polish.

Sorry for the broken link. It is better to use: http://extensions.services.openoffice.org/en/project/CorelPolyGUI Latest version 0.3.2 corrects minor bugs.

Hello, Thanks to Marcin, new version of CorelPoly (v 1.0!) available on http://extensions-test.libreoffice.org/extension-center/improved-trend-lines/ and soon on OOo repository. Now all classic regression are included.

Ouch! Installed LibreOffice calc and announced in my lab that we are going to use Open office from now on instead of proprietary MS Office 2007 for our work, and was regretting my announcement within 5 minutes when started to draw a Lab Report's polynomial graph. Polynomial type curve is the one which is mostly used in our test reports, so unless Oo developers can find time to accommodate this feature, our kind is unable to use it as their primary productivity tool... I wish I had skills to fix this issue at my own

(In reply to comment #70) While waiting for a final solution, there are ways to overcome this shortcomming e.g. http://extensions.services.openoffice.org/en/project/CorelPolyGUI and http://extensions.services.openoffice.org/en/project/SLAMLib

(In reply to comment #72) Thank you Björn for prompt response, I tried CorelPolyGUI and its a cool workaround. Anyway, will keep waiting until this feature gets included in its proper place, because these workarounds require a serious amount of extra work when it comes to generating and modifying 50-200 sheets per day. Good thing is, Oo can do it one way or the other :) Cheerz

*** Issue 123707 has been marked as a duplicate of this issue. ***

*** Issue 87299 has been marked as a duplicate of this issue. ***

Oh, no poly regression - looks like I will be using WPS spreadsheets instead, seeing as open office cannot be considered as a serious spreadsheet application until it has a poly regression for dummies style feature ;-) Sorry this is not good enough when consumers have so much choice elsewhere these days, shame because I love the open office interface much more.

(In reply to Joseph from comment #76) > Oh, no poly regression You should try CorelPolyGUI extension to get all you need. http://extensions.services.openoffice.org/en/project/CorelPolyGUI

Let me mention that LibreOffice has got polynomial regression since 2014, see https://wiki.documentfoundation.org/ReleaseNotes/4.2#Chart.