Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Force linear regression curve to go through zero point (origin) or other Y-value defined|
|Component:||chart||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||alek.email, belegdol, issues, jbf.faure, jumbo4444, kamataki, robert.pollak, stefan.weigel, tony.galmiche.ooo|
|Version:||3.3.0 or older (OOo)||Keywords:||ms_interoperability, oooqa, rfe_eval_ok|
|Issue Type:||FEATURE||Latest Confirmation in:||---|
|Issue Depends on:|
Description denisl 2004-09-12 20:22:00 UTC
Whith 2 columns data such as X Y 0 0 1 1 2 2 3 3 Select XY Chart as plot type, then Insert -> Statistics -> Linear Regression There is no option as in the LINEST function to force a regression through the zero point and graph the corresponding line. Issue 33883 is somewhat related.
Comment 1 denisl 2004-09-12 20:24:52 UTC
Oups, The numerical example is not very good, X Y 0 0 1 1 2 2.5 3 3 Would be better
Comment 2 kla 2004-09-17 13:48:54 UTC
HI Bettina, one for you. tk
Comment 3 kla 2004-09-17 13:50:21 UTC
Comment 4 bettina.haberer 2004-11-29 18:23:29 UTC
Reassigned to Ingrid.
Comment 5 Matthias Basler 2005-03-29 12:59:28 UTC
I support this request. As scientifically working student I know that some (linear!) regressions need to go through the origin, by definition. Currently it is not possible to visualize these in charts. Is this already incorporated into the proposal for the new chart engine? One very simple solution is to just add an additional regression type "Linear regression through origin" to the list of available regressions in the data series dialog.
Comment 6 IngridvdM 2005-04-14 15:34:08 UTC
We should introduce this missing feature. Even more it should be possible to force the regression line to go through a customizable value - not limitting us to 0. ->bm: Please take care of this issue.
Comment 7 oooer 2006-07-11 21:00:39 UTC
"As scientifically working student" you should question if this a valid thing to do. If a physical relationship is known to go through zero and your data does not, you should analyse why that is and write an appropriate conclusion not falsify your results by arbitarily forcing a false fit. One common reason for this effect (apart from experimental error) is that least squares fit assumes that the y residuals are far greater than the x residuals. ie. x is a controlled variable with negligable error s.t. rx/ry -> 0 If this is not the case, the derivation of the least squares line fit is NOT mathematically valid. I have seen examples if dispersed data where the least squares fit was "visibly" wrong by a significant degree. The data did not fit the above requirement and the fit was garbage. Most people are quite ignorant of this important limitation in using linear regression. Least squares can also be strongly influenced by an erroneous data point (a flyer). One further cause is that most implementations of least squares minimise the squares of the y residuals not the perpendiculars to the fitted line. I see little use of this "feature" as suggested. If a user wants to add a fictive slope to his data he can simply add a line close to the lin. regression fit then remove the true line. Adding this as a function will simply mislead non-technical users into thinking that this is an alternative VALID regression fit. It is not. That would seem undersirable.
Comment 8 bjoern.milcke 2006-07-12 09:29:13 UTC
->oooer: Thanks for you comments. You are probably right, that forcing a regression curve to go through a certain point might lead to wrong results without the user noticing this. Well, we have one vote, one comment here FOR having the feature and one AGAINST. I must admit I do not have too much practical experience with regression, so I cannot really vote for or against this feature. The only point that is a FOR is, that MS has this feature in Excel (that's why this issue has the interoperability keyword), thus we will need this feature for improving our import filters. Of course, we still have to decide about a UI for it. And maybe we can show a warning dialog that tells a user that what he does might not be what he wants. ->oooer: Is there a way to analyze the data and find out if limiting the curve to a certain point makes sense or not? I mean, I could imagine that if your curve passes the point only by a very small amount (relative error), it might not completely fake the result. Of course, the question still remains, why you would want to adjust the curve to pass a certain point. It is only a regression or trend curve, not something where you can read data of.
Comment 9 Matthias Basler 2006-07-12 18:01:53 UTC
Thanks for your thoughtful comment, oooer. May I suggest following compromise: There SHOULD be a feature to have a graph going through the point of origin, but it should NOT be termed "linear regression", since it is none, as you pointed out. Maybe it could be termed "Best linear fit though point of origin" or anything similar that makes a quick point without being confused with linear regression or other inappropriate statistical terms. (Suggestions welcome.) Adding a line by hand is not a solution at all, since the whole "idea" of the feature is, that the slope is calculated from the data. When drawing the line by hand one would have to calculate the line's angle manually beforehand. Beside that, "drawing" on diagrams is imho never a good idea, since the diagram may change in size and content, so the drawn line wouldn't fit any more after any change.
Comment 10 oooer 2006-07-14 19:38:53 UTC
@bm hmm, I did not realise this was a cloning issue. I recognise the idea of maximising compatability of import filters but I would have hoped that rather than being a partial clone of a not-too-recent MSO, things like this would be where OpenOffice could distinguish itself as a superior product. The other problem with copying MSO is that users will expect the _same_ false results or else they will complain the OpenOffice does not work correctly, so you will need to guess how they force the false zero. There seem to be too different senarios: the import filter and if/how to add this functionality to Calc UI. In the case of the import filter the idea of a dlg is good. I would favour quite explicit warning that this is included for compatability only and will produce an invalid fit and incorrect slope. Preferably with a help button taking them to some text that explains the limitations of lin.regression and the implications of cheating the fit. Once aware of the choice the user can decide what to do. As a feature for Calc UI, I think ANY attempt at lin. regression should post a warning as to the limitations of the method that will be used and a link to more info (I see that as the duty of the program since the user cannot know how it is done and will assume in good faith that the methodology is valid). @matthias >>Adding a line by hand is not a solution at all, since the whole "idea" of the feature is, that the slope is calculated from the data. That's just the problem, calculating a slope that comes out WROMG is not calculating from the data but gives the impression it is. It's worse than having to do a fit by eye because it has a undue bona fide of having been calculated by a computer. Maybe it would make more sence to provide an easy means to fit a line by eye that will follow any resizing etc. in the same way as an applied regression formula apparently does and then show the formula. I'm thinking of the way that gimp or blender let you create a line or bezier curve and pull it around until it fits. At least for the linear case this would be easy to program. Link it to the chart so that is scales and moves with it. This will allow those interested in bending the truth to fit thier data to do what they need to do and also allow a valid means of fitting a line to data that do not fit the criteria for applying least squares analysis. I'm not suggesting we make the use get out pen and paper here. The human brain is much better at seeing the "best fit" than an incorrectly applied bit of maths. I cannot even imagine the ammount of innocently derived yet spurious "scientific results" caused by inappropriate use of linear regression. One major cause of this (appart from the quality of science education) is spreadsheet software that makes it as easy as clicking a button. I think OpenOffice has a chance to innovate rather than follow here. Thanks for your interest in my comments.
Comment 11 oooer 2006-07-14 20:41:32 UTC
>> since the diagram may change in size and content, so the drawn line wouldn't >> fit any more after any change. @matthias sorry, I did not understand your point until I reread this. If a user fits a line by eye , he will know it is no longer valid if he changes the data. If he sets an automatic regression to inappropriate data it will never fit but he will believe it does. The former would seem to be the better choice.
Comment 12 Matthias Basler 2006-07-16 22:28:26 UTC
@oooer: You see this from a strict scientifically point of view when you say "bend the truth". I may point out that Office suites are used (and are to be used) by all sorts of people, some of which might not even care about scientific statistics, just want a nice fitting for visualization. Take f.e. a businessman that wants to visualize the mean growth of their company's profits and which knows that it had 0 profits in, lets say, 2000, because it the company was just founded this year. He/she wants a diagram with a line going through the "(2000, 0)" point simply for the look of it. Also sometimes constraints such as "fixed points" simply exist. Sometimes it is better for a computer model to prefer a slightly wrong relationship formula over the correct regression, simply to avoid computation side effects and artefacts, or to simplify calculation. You'd maybe call this unscientifically, but such tradeoffs are imho justified in science - as long as they are documented. (I won't go deeper in this - this mailing list is the wrong place for such debates.) Specifically to your comments: - Having warning dialogs if the user does something "unscienfically" would be very appropriate in a scientific software like an actual statistics program. For a "general" office suite I find this out of scope. Imho OOo cannot (and should not even try) to find out when, lets say, diagramm types are inappropriate, diagram colors are hard to tell apart, formulae are nonsense or anything of this sort. > - "The human brain is much better at seeing the "best fit" than an incorrectly > applied bit of maths." You seem to have a much better brain for this, since with my brain I would not dare to fit a line manually, except for a rough sketch. ;-) But seriously, people don't use OOo's diagram features to "paint". And I don't like this "penalty" approach: "If you are not doing strict statistics, use OOo Draw and paint your line by hand."
Comment 13 IngridvdM 2006-11-06 21:34:41 UTC
*** Issue 71191 has been marked as a duplicate of this issue. ***
Comment 14 belegdol 2007-11-06 19:34:43 UTC
Any updates on this? REGLINP already supports a fixed intercept, so...
Comment 15 belegdol 2007-11-06 20:51:05 UTC
Umm, I mean LINEST. /me loves translating the functions.
Comment 16 IngridvdM 2008-07-03 11:39:22 UTC
Comment 17 paul8paul 2008-07-03 13:20:47 UTC
I agree. This is very important function for me as a science student. I can't stop using Microsoft Office until I can do this in Open Office - at least for linear regression. In fact in the interest of flexibility I think you should be able to choose exactly where it intersects either axis. I know there are scientific arguments for and against this function but what difference does it make who is scientifically right or wrong. If this is a function that users want to use then Open Office should be able to do it as simply and quickly as possible.
Comment 18 paul8paul 2008-07-03 13:26:44 UTC
Its also necessary to be able to extrapolate the line as much as desired in either direction.
Comment 19 IngridvdM 2008-07-22 14:21:34 UTC
reset to new
Comment 20 Regina Henschel 2008-11-23 01:13:54 UTC
*** Issue 96461 has been marked as a duplicate of this issue. ***
Comment 21 dskene 2008-11-23 03:21:16 UTC
There are many comments about this not being important as it is "faking data". That simply isn't the case. In the real world and not just on paper, there are error tolerances in what can be measured and they are often not exact. This is why you need a trendline in the first place. If you have only a small number of points there is a higher probability of the fit not meeting reality. Now, those who suggest that the data really does go through a different y intercept there are examples where it simply must be through a given point, such as zero. For example, if you plot concentration versus light absorbance then zero concentration must have zero absorbance. Extension of a spring might be another example. If there is no mass on the spring it needs to have a zero extension. The data in these examples would be MORE incorrect to have them pass through a y intercept other than zero.
Comment 22 Stefan Weigel 2008-12-12 20:12:46 UTC
We should not only be able to force the regression curve to go through zero, but also through any other definable y axis intercept. I will attach a screenshot from main competitor as an example (sorry only in german).
Comment 23 Stefan Weigel 2008-12-12 20:14:05 UTC
Created attachment 58788 [details] Screenshot from competitor. Define y axis intercept in options dialog of regression curve.
Comment 24 shamran 2009-03-01 14:41:43 UTC
I have used OO for a number of years, and this feature is one I Keep missing every so often. So please find the resources to implement this feature.
Comment 25 lka 2010-05-13 10:21:59 UTC
Same as shamran (and totally agree with dskene's post). So UP!
Comment 26 jumbo444 2010-09-27 08:55:00 UTC
Hello, I am pleased to inform you that Marcin Gutman has build an extension based on Laurent Godard's macro CorelPoly. It can be downloaded at http://extensions.services.openoffice.org/node/4387? It is not the solution of this issue but it looks, from my point of view, has a good workaround. In addition to polynomial regression, it also enable "force intercept" option. Please add your comments on this extension NOT HERE, but on extension web-page. Comments and users feedback are welcome. Next versions will include more robust method thanks to LINEST function, and some translation.
Comment 27 jumbo444 2010-12-22 08:36:47 UTC
Hello, CorelPolyGUI extension has been updated http://extensions.services.openoffice.org/en/node/4634 It contains more robust method (QR decomposition), a simpler way to use "force intercept" in case of linear problem, and translation in French and Polish.
Comment 28 jumbo444 2010-12-23 21:06:08 UTC
Sorry for the broken link. It is better to use: http://extensions.services.openoffice.org/en/project/CorelPolyGUI Latest version 0.3.2 corrects minor bugs.
Comment 29 Laurent BP 2011-10-04 08:55:13 UTC
Hello, New version of CorelPoly (v 1.0!) available on http://extensions-test.libreoffice.org/extension-center/improved-trend-lines/ and soon on OOo repository. Now all classic regression are included, with force intercept option.