Apache OpenOffice (AOO) Bugzilla – Issue 32833

normsinv() inverse standard normal distribution law lacks accuracy

Last modified: 2017-05-20 11:11:20 UTC

Here are some exemple values of the inverse standard normal distribution law. normsinv norminv X Ooo Value Scilab Value Abs(Scilab-Ooo) 0,000949697119 -3,105814092945 -3,10552833 0,000285763 0,161041497179 -0,990146864303 -0,99018645 3,95857E-05 0,161177168196 -0,989591316210 -0,989631359 4,00428E-05 0,161312839213 -0,989036073018 -0,989076573 4,05E-05 0,195908948548 -0,856173183693 -0,856325252 0,000152068 0,323982388596 -0,456163497224 -0,456591379 0,000427882 0,485837911877 -0,035408970385 -0,03550655 9,75793E-05 etc. These value are not very good... in fact, they are worse than the results i've found with my own algorithms, and with Excel (which is not a reference for calculating). The implementation of standard normal distribution law normsdist() seems good (i find results approximating yours with 10E-8 precision), so normsinv should be too, because it should uses theses values. I personnally uses a dichotomia finding process, and I reach a good result with values for normsdist similar to yours (accuracy to SCILAB is 10E-8) My conclusion : the algorithm finding normsinv is bad. I did not try seeing how your formulas are written .. maybe it's an API that is used? In that case, it's not a good one for this function. PS : i'm not an expert in mathematics, but i think i can help for this (on a theorical point of view) cause i had to write these 2 functions ex-nihilo in a professional application.

Hi Eike, I think this is your construction site. Frank

Accepted. Frank already set the PleaseHelp target. So if you're familiar with this topic and can come up with proper algorithms we would be glad to include your contribution. The current implementation may be found in file sc/source/core/tool/interpr3.cxx method ScInterpreter::ScSNormInv() which uses gaussinv() as 2-3 other functions do as well. Before submitting code please sign the JCA, see http://www.openoffice.org/contributing.html Thanks Eike

I don't know C (I did some a long time ago) but I can help by explaining a way of doing it. ScSNormInv uses gaussinv has you said, but gaussinv is not a "real" function, it's an approximation. The results I found shows that the approximation made is bad (for ScSNormInv at least) . I don't for the others. Maybe I'll check if I have the time. As normsdist() has a good accuracy ,I suggest using it. The mathematical being behind this stuf is : if i know x, i can calculate y = normsdist (x) (simple... my function is good enough in Ooo) . But if I know "y", how finding "x"? The usual way is to use newton's algorithm. It's very efficient, but i choose a simpler one: dichotomia. Let's say we want x with 1.10E-8 accuracy. the algorithm looks like that : y = NormInv(x) { /* We have normsdist(-5,99) = 0,000000001049 normsdist(5,99) = 0,999999998951 We set our searching segment. */ min = -5.99 max = 5.99 // same beginning double x = GetDouble(); if (x < 0.0 || x > 1.0) SetIllegalArgument(); else if (x == 0.0 || x == 1.0) SetNoValue(); /* outside the external borders of our searching segment , we use the old method. We know the result will be with an accuracy better than 1.10E-8.*/ else if (x <= 0,000000001049 || x >= 0,999999998951 ) { PushDouble(gaussinv(x)); } else { while abs(min-max) > 1.10E-8 { middle = (min + max) / 2 my_y = normsdist (middle) if ( y = my_y ) --> jackpot, we're lucky, we stop the algorithm return middle else { if (y < my_y ) max = middle else min = middle } } /* End of loop --> we know that y is in the segment [ y - min-1.10E-8, y + min-1.10E-8] (with our definition of the normal standard law normsdist , not the real law). It means that we're sure that normsdist(NormInv(x)) and x have the same 7 decimals after coma, and probably more. */ return middle } } I hope it will helps .. and that it is accurate. If you want me to test this, you can send me the results in a spreadsheet with the following datas for example x NormInv(x) 0.0000000001 0.000000001 *10 ... up to 0.1 0.0000001 . + 0.0002677 up to 1

Hi bindusara, Your algorithm looks viable. Would you mind filling in the JCA mentioned above (see http://www.openoffice.org/contributing.html ), and state your real name here in this issue, so I can look it up when it appears in http://www.openoffice.org/copyright/copyrightapproved.html Yes, it sounds silly that I request you doing so for 10 lines or so of code, but that's what we're required to do, I can't help it. Thanks Eike

I won't fax it (I don't have any fax nearby) but I mail it to you.Due to time travel between France and USA, It won't reach its destination before 4 ou 5 days I think. My real name : Benoit Altenbourger

Hi Benoit, Sorry, this issue went out of my sight until I rescanned for issues with the PleaseHelp target.. I implemented a change according to your algorithm, using it for all functions where gaussinv was involved, as all are related and shouldn't produce different results: NORMSINV(), NORMINV(), LOGINV() and CONFIDENCE(). I'll attach a testcase document with 4 sections, each having the results of these functions for the values you requested, and a column NORMSDIST(NORMSINV(x)) that should produce results similar to x. The sections are: 1. results of old algorithm 2. results of new algorithm with a precision of fabs(max-min)<=1E-8 3. results of new algorithm with a precision of fabs(max-min)<=1E-12 4. the function as it calculates in the version where it is currently loaded, may also be used to export to other products' file formats You'll notice the difference between #2 and #3 easily in rows 15-24 for the NORMSDIST(NORMSINV(x)) columns. Note that the displayed precision is rounded by the number format used, you'll see the real value in the edit line if you position the cell cursor on some value. To me the values look much better now, and also compare to the SciLab values you mentioned. Please let me know if everything is allright. Thanks Eike

Created attachment 28147 [details] testcase for algorithm change

Created attachment 28148 [details] the diff of the code change

With the patch from issue 26836, we have a much improved gaussinv approximation. In the NORMSDIST(NORMSINV(x)) test, sometimes the new gaussinv seems to be better, sometimes the iteration from above. We'll have to take a closer look if this change is still needed at all.

Reset assigne to the default "issues@openoffice.apache.org".