Issue 69069

Summary: NORMSDIST, POISSON statistical functions are not accurate
Product: Calc Reporter: open_me <rtvvupxyay3oa5b>
Component: programmingAssignee: oc
Status: CLOSED FIXED QA Contact: issues@sc <issues>
Severity: Trivial    
Priority: P3 CC: ambrogio.de.lorenzo, issues, rb.henschel
Version: OOo 2.0.4   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: PATCH Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on: 97091, 100119    
Issue Blocks: 18704    
Attachments:
Description Flags
Results of the Replication
none
Enhanced sheet
none
Avoid overflow and underflow in POISSON
none
Comparison with Excel and Gnumeric; example with large and with very large lambda none

Description open_me 2006-08-29 17:27:15 UTC
OOo Calc produces results that are not accurate. Affected are some of the
statistical functions.

I used http://www.csdassn.org/software_reports/gnumeric.pdf and replicated the
tests for OOo Calc. I did only the first half of the tests; you may want to
check the second half, too...

As it stands, OOo Calc offers functions the user cannot rely on.
Comment 1 open_me 2006-08-29 17:27:57 UTC
Created attachment 38835 [details]
Results of the Replication
Comment 2 9point9 2006-11-01 12:26:57 UTC
Confirmed in OOo 2.0.2 with the spreadsheet provided.
Comment 3 gercokees 2006-11-20 13:28:20 UTC
Hi,
There was a small error in the sheet provided. I corrected this, an added a
column  where i divided the OOo Calc value by the exact value. Where this does
not equal one (1) there is a problem (marked with blue). I will attatch this sheet.
So, there is a problem with:
=POISSON(134;200;1) (should equal 0,000000445617 but gives an error)
=POISSON(200;200;1) (should equal  0,518797 but gives an error)
=POISSON(250;200;1) (should equal 0,999715 but gives an error)
=NORMSDIST(-8,2) (should equal 1,20194E-016 but is 8% lower)
=NORMSDIST(-8,32) (should equal 5,20557E-017 but is 6% higer)
Confirmed this isue...
Comment 4 gercokees 2006-11-20 13:30:30 UTC
Created attachment 40765 [details]
Enhanced sheet
Comment 5 frank 2007-01-12 11:37:29 UTC
Hi Eike,

please have a look at this one. Maybe something can be changed based on the
openformula spec. ?

Frank
Comment 6 ooo 2007-07-25 11:42:00 UTC
Sorry for "dependency spam" when changing from "depends on" to "blocks" issue 18704.
Comment 7 ooo 2008-05-30 16:04:58 UTC
Propably not doable in time frame for 3.0, retargeting to 3.1
Comment 8 Regina Henschel 2008-07-01 17:51:52 UTC
*** Issue 91269 has been marked as a duplicate of this issue. ***
Comment 9 Regina Henschel 2008-07-01 17:52:35 UTC
*** Issue 91269 has been marked as a duplicate of this issue. ***
Comment 10 Regina Henschel 2008-12-10 10:35:11 UTC
The wrong results in NORMSDIST are due to cancellation for small negative
values, where gauss() is near -0.5
The problem can be solved in two ways:
(1) Use NORMSDIST(x)= 0.5*ERFC(-x/SQRT(2)). Unfortunaly ERFC is only an addin
function, see my issue 97091.
(2) Use NORMSDIST(x) 
        = 0.5+0.5*GetLowRegIGamma(0.5,0.5*x*x) for x>=0
        = 0.5*GetUpRegIGamma(0.5,0.5*x*x)      for x<0

I would prefer the solution (1), because GetLowRegIGamma and GetUpRegIGamma use
iterations which might be slow in some cases and erfc uses a short polynomial.
Comment 11 Regina Henschel 2009-03-25 14:14:15 UTC
The improvement of the accuracy of NORMSDIST is included in the patch in issue
100119.
Comment 12 Regina Henschel 2009-04-21 21:21:42 UTC
The problems in POISSON have two reasons:
The implementation calculated the factorial in the formula as separate sub-term,
which will overflow for large x.
It calculated exp(-lambda), which will underflow for large lambda.

The second reason was not only in the cumulative part but in the mass part too.
(The other problem in the mass part had been fixed in issue 26022 already.)

To avoid the overflow in factorial, I change the calculation order.
To avoid the underflow in exp(-lambda), I made a case distinction. For large
lambda a different method is used. That is not as accurate as the iteration in
the case of small lambda, but it still gives 12-13 significant digits.
Comment 13 Regina Henschel 2009-04-21 21:23:11 UTC
Created attachment 61713 [details]
Avoid overflow and underflow in POISSON
Comment 14 Regina Henschel 2009-04-22 10:04:45 UTC
Created attachment 61725 [details]
Comparison with Excel and Gnumeric; example with large and with very large lambda
Comment 15 ooo 2009-04-22 10:32:47 UTC
Thank you! Scheduling for inclusion in CWS odff06.
Comment 16 ooo 2009-04-29 23:40:18 UTC
In cws odff06:

revision 271396
sc/source/core/tool/interpr3.cxx

I'm impressed :-)  Thanks again!

Btw, just changed  bool kum  to  bool bCumulative  and used GetBool() instead of
GetDouble() for the argument and adapted the condition from  (kum == 0.0)  to 
(!bCumulative).
Comment 17 ooo 2009-09-03 21:05:58 UTC
Reassigning to QA for verification.
Comment 18 oc 2009-09-09 15:52:01 UTC
verified in internal build cws_odff06