Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing |
Summary: | normsinv() inverse standard normal distribution law lacks accuracy | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Calc | Reporter: | bindusara <bindu2> | ||||||
Component: | ui | Assignee: | AOO issues mailing list <issues> | ||||||
Status: | ACCEPTED --- | QA Contact: | |||||||
Severity: | Trivial | ||||||||
Priority: | P3 | CC: | issues | ||||||
Version: | OOo 1.0.2 | Keywords: | needhelp | ||||||
Target Milestone: | --- | ||||||||
Hardware: | All | ||||||||
OS: | All | ||||||||
Issue Type: | ENHANCEMENT | Latest Confirmation in: | --- | ||||||
Developer Difficulty: | --- | ||||||||
Issue Depends on: | |||||||||
Issue Blocks: | 18704 | ||||||||
Attachments: |
|
Description
bindusara
2004-08-11 09:40:03 UTC
Hi Eike, I think this is your construction site. Frank Accepted. Frank already set the PleaseHelp target. So if you're familiar with this topic and can come up with proper algorithms we would be glad to include your contribution. The current implementation may be found in file sc/source/core/tool/interpr3.cxx method ScInterpreter::ScSNormInv() which uses gaussinv() as 2-3 other functions do as well. Before submitting code please sign the JCA, see http://www.openoffice.org/contributing.html Thanks Eike I don't know C (I did some a long time ago) but I can help by explaining a way of doing it. ScSNormInv uses gaussinv has you said, but gaussinv is not a "real" function, it's an approximation. The results I found shows that the approximation made is bad (for ScSNormInv at least) . I don't for the others. Maybe I'll check if I have the time. As normsdist() has a good accuracy ,I suggest using it. The mathematical being behind this stuf is : if i know x, i can calculate y = normsdist (x) (simple... my function is good enough in Ooo) . But if I know "y", how finding "x"? The usual way is to use newton's algorithm. It's very efficient, but i choose a simpler one: dichotomia. Let's say we want x with 1.10E-8 accuracy. the algorithm looks like that : y = NormInv(x) { /* We have normsdist(-5,99) = 0,000000001049 normsdist(5,99) = 0,999999998951 We set our searching segment. */ min = -5.99 max = 5.99 // same beginning double x = GetDouble(); if (x < 0.0 || x > 1.0) SetIllegalArgument(); else if (x == 0.0 || x == 1.0) SetNoValue(); /* outside the external borders of our searching segment , we use the old method. We know the result will be with an accuracy better than 1.10E-8.*/ else if (x <= 0,000000001049 || x >= 0,999999998951 ) { PushDouble(gaussinv(x)); } else { while abs(min-max) > 1.10E-8 { middle = (min + max) / 2 my_y = normsdist (middle) if ( y = my_y ) --> jackpot, we're lucky, we stop the algorithm return middle else { if (y < my_y ) max = middle else min = middle } } /* End of loop --> we know that y is in the segment [ y - min-1.10E-8, y + min-1.10E-8] (with our definition of the normal standard law normsdist , not the real law). It means that we're sure that normsdist(NormInv(x)) and x have the same 7 decimals after coma, and probably more. */ return middle } } I hope it will helps .. and that it is accurate. If you want me to test this, you can send me the results in a spreadsheet with the following datas for example x NormInv(x) 0.0000000001 0.000000001 *10 ... up to 0.1 0.0000001 . + 0.0002677 up to 1 Hi bindusara, Your algorithm looks viable. Would you mind filling in the JCA mentioned above (see http://www.openoffice.org/contributing.html ), and state your real name here in this issue, so I can look it up when it appears in http://www.openoffice.org/copyright/copyrightapproved.html Yes, it sounds silly that I request you doing so for 10 lines or so of code, but that's what we're required to do, I can't help it. Thanks Eike I won't fax it (I don't have any fax nearby) but I mail it to you.Due to time travel between France and USA, It won't reach its destination before 4 ou 5 days I think. My real name : Benoit Altenbourger Hi Benoit, Sorry, this issue went out of my sight until I rescanned for issues with the PleaseHelp target.. I implemented a change according to your algorithm, using it for all functions where gaussinv was involved, as all are related and shouldn't produce different results: NORMSINV(), NORMINV(), LOGINV() and CONFIDENCE(). I'll attach a testcase document with 4 sections, each having the results of these functions for the values you requested, and a column NORMSDIST(NORMSINV(x)) that should produce results similar to x. The sections are: 1. results of old algorithm 2. results of new algorithm with a precision of fabs(max-min)<=1E-8 3. results of new algorithm with a precision of fabs(max-min)<=1E-12 4. the function as it calculates in the version where it is currently loaded, may also be used to export to other products' file formats You'll notice the difference between #2 and #3 easily in rows 15-24 for the NORMSDIST(NORMSINV(x)) columns. Note that the displayed precision is rounded by the number format used, you'll see the real value in the edit line if you position the cell cursor on some value. To me the values look much better now, and also compare to the SciLab values you mentioned. Please let me know if everything is allright. Thanks Eike Created attachment 28147 [details]
testcase for algorithm change
Created attachment 28148 [details]
the diff of the code change
With the patch from issue 26836, we have a much improved gaussinv approximation. In the NORMSDIST(NORMSINV(x)) test, sometimes the new gaussinv seems to be better, sometimes the iteration from above. We'll have to take a closer look if this change is still needed at all. Reset assigne to the default "issues@openoffice.apache.org". |