Issue 32833

Summary: normsinv() inverse standard normal distribution law lacks accuracy
Product: Calc Reporter: bindusara <bindu2>
Component: uiAssignee: AOO issues mailing list <issues>
Status: ACCEPTED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: issues
Version: OOo 1.0.2Keywords: needhelp
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: ENHANCEMENT Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 18704    
Attachments:
Description Flags
testcase for algorithm change
none
the diff of the code change none

Description bindusara 2004-08-11 09:40:03 UTC
Here are some exemple values of the inverse standard normal distribution law. 

                 normsinv        norminv        
X	         Ooo Value	 Scilab Value	Abs(Scilab-Ooo)
0,000949697119	-3,105814092945	-3,10552833	0,000285763
0,161041497179	-0,990146864303	-0,99018645	3,95857E-05
0,161177168196	-0,989591316210	-0,989631359	4,00428E-05
0,161312839213	-0,989036073018	-0,989076573	4,05E-05
0,195908948548	-0,856173183693	-0,856325252	0,000152068
0,323982388596	-0,456163497224	-0,456591379	0,000427882
0,485837911877	-0,035408970385	-0,03550655	9,75793E-05
etc.

These value are not very good... in fact, they are worse than the results i've 
found with my own algorithms, and with Excel (which is not a reference for 
calculating).
The implementation of standard normal distribution law normsdist() seems good 
(i find  results approximating yours with 10E-8 precision), so normsinv should 
be too, because it should uses theses values. 

I personnally uses a  dichotomia finding process, and I reach a good result 
with values for normsdist similar to yours (accuracy to SCILAB is 10E-8)

My conclusion : the algorithm finding normsinv is bad. 

I did not try seeing how your formulas are written .. maybe it's an API that is 
used? In that case, it's not a good one for this function.

PS : i'm not an expert in mathematics, but i think i can help for this (on a 
theorical point of view) cause i had to write these 2 functions ex-nihilo in a 
professional application.
Comment 1 frank 2004-08-11 10:14:18 UTC
Hi Eike,

I think this is your construction site.

Frank
Comment 2 ooo 2004-08-11 14:27:05 UTC
Accepted. Frank already set the PleaseHelp target. So if you're familiar with
this topic and can come up with proper algorithms we would be glad to include
your contribution. The current implementation may be found in file
sc/source/core/tool/interpr3.cxx method ScInterpreter::ScSNormInv() which uses
gaussinv() as 2-3 other functions do as well.
Before submitting code please sign the JCA, see
http://www.openoffice.org/contributing.html

Thanks
Eike
Comment 3 bindusara 2004-08-14 18:21:11 UTC
 I don't know C (I did some a long time ago) but I can help by explaining a
 way of doing it.

 ScSNormInv uses gaussinv has you said, but gaussinv is not a "real"
 function, it's an approximation. The results I found shows that the
 approximation made is bad (for ScSNormInv at least) . I don't for the
 others. Maybe I'll check if I have the time.

 As normsdist() has a good accuracy ,I suggest using it.

 The mathematical being behind this stuf is : if i know x, i can calculate
 y = normsdist (x) (simple... my function is good enough in Ooo) .  But if
 I know "y", how finding "x"? The usual way is to use newton's algorithm.
 It's very efficient, but i choose a simpler one: dichotomia.

 Let's say we want x with  1.10E-8 accuracy.

 the algorithm looks like that : 


 y = NormInv(x)
 {


  /* We have
normsdist(-5,99) = 0,000000001049
normsdist(5,99)	 = 0,999999998951
We set our searching segment.
*/
  min = -5.99
  max = 5.99

  // same beginning
  double x = GetDouble();
  if (x < 0.0 || x > 1.0)
      SetIllegalArgument();
  else if (x == 0.0 || x == 1.0)
      SetNoValue();
/* outside the external borders of our searching segment , we use the old
method. We know the result will be with an accuracy better than 1.10E-8.*/

  else if (x <= 0,000000001049  || x >= 0,999999998951 )
           {
            PushDouble(gaussinv(x));
           }
  else
           {
            while abs(min-max) > 1.10E-8
               {
                middle = (min + max) / 2
                my_y =  normsdist (middle)
                if ( y = my_y ) --> jackpot, we're lucky, we stop the algorithm
                   return middle
                else {
                   if (y < my_y )  max = middle
                   else            min = middle
                   }

                 }

                 /* End of loop --> we know that  y  is in the segment [ y -
min-1.10E-8, y + min-1.10E-8] (with our definition of the normal standard law
normsdist , not the real law). It means that we're sure that
normsdist(NormInv(x)) and x have the same 7 decimals after coma, and probably
more. */

                 return middle

                 }
 }


I hope it will helps .. and that it is accurate. 
If you want me to test this, you can send me the results in a spreadsheet with
the following datas for example
x                 NormInv(x)
0.0000000001
0.000000001
*10
...
up to 0.1
0.0000001
 . + 0.0002677
up to 1
Comment 4 ooo 2004-08-16 11:19:36 UTC
Hi bindusara,

Your algorithm looks viable. Would you mind filling in the JCA mentioned above
(see http://www.openoffice.org/contributing.html ), and state your real name
here in this issue, so I can look it up when it appears in
http://www.openoffice.org/copyright/copyrightapproved.html

Yes, it sounds silly that I request you doing so for 10 lines or so of code, but
that's what we're required to do, I can't help it.

Thanks
Eike
Comment 5 bindusara 2004-08-16 12:24:10 UTC
I won't fax it (I don't have any fax nearby) but I mail it to you.Due to time 
travel between France and USA, It won't reach its destination before 4 ou 5 
days I think. 

My real name : Benoit Altenbourger
Comment 6 ooo 2005-07-22 17:38:47 UTC
Hi Benoit,

Sorry, this issue went out of my sight until I rescanned for issues with
the PleaseHelp target.. I implemented a change according to your
algorithm, using it for all functions where gaussinv was involved, as
all are related and shouldn't produce different results: NORMSINV(),
NORMINV(), LOGINV() and CONFIDENCE(). I'll attach a testcase document
with 4 sections, each having the results of these functions for the
values you requested, and a column NORMSDIST(NORMSINV(x)) that should
produce results similar to x. The sections are:

1. results of old algorithm
2. results of new algorithm with a precision of fabs(max-min)<=1E-8
3. results of new algorithm with a precision of fabs(max-min)<=1E-12
4. the function as it calculates in the version where it is currently
   loaded, may also be used to export to other products' file formats

You'll notice the difference between #2 and #3 easily in rows 15-24 for
the NORMSDIST(NORMSINV(x)) columns. Note that the displayed precision is
rounded by the number format used, you'll see the real value in the edit
line if you position the cell cursor on some value.

To me the values look much better now, and also compare to the SciLab
values you mentioned.

Please let me know if everything is allright.

Thanks
  Eike
Comment 7 ooo 2005-07-22 17:40:02 UTC
Created attachment 28147 [details]
testcase for algorithm change
Comment 8 ooo 2005-07-22 17:44:47 UTC
Created attachment 28148 [details]
the diff of the code change
Comment 9 niklas.nebel 2005-10-11 19:29:09 UTC
With the patch from issue 26836, we have a much improved gaussinv approximation.
In the NORMSDIST(NORMSINV(x)) test, sometimes the new gaussinv seems to be
better, sometimes the iteration from above. We'll have to take a closer look if
this change is still needed at all.
Comment 10 Marcus 2017-05-20 11:11:20 UTC
Reset assigne to the default "issues@openoffice.apache.org".