Apache OpenOffice (AOO) Bugzilla – Issue 75279

improvement for BINOMDIST

Last modified: 2017-05-20 10:30:57 UTC

The function BINOMDIST has a very early intern overflow. For example BINOMDIST(k;1458;0,4;0) gives an error #VALUE! You can avoid this, if you optimize multiplication and division during calculation. I'll attach a spreadsheet with a BASIC macro, where you can see how to do it. Use "Tabelle2" to compare my macro with the build-in function. The macro is included in the spreadsheet. Besides, the macro has a better precision.

Created attachment 43640 [details] spreadsheet including the macro

reassigning features and enhancements to user requirements@openoffice.org which will be the default owner for those tasks (was introduced some time ago)

In the meantime the Beta function has been implemented. It can be used for higher number of trials. For example for probability mass function: PushDouble( exp( x*log(p)+(n-x)*log(q)-log(n+1.0)-GetLogBeta(n-x+1.0,x+1.0) ) );

Created attachment 75585 [details] Screenshot to compare OOo with Gnumeric

The attached document shows some critical values as chart. The document contains a picture with the same chart in Gnumeric, where can see, that it should be a straight line. (The spreadsheet contains a function TTT in which I have tested the mentioned solution with the Beta function. That will give no results for you.)

Nice, please go ahead :-) This is not a feature requirement, reassigning to general spreadsheet owner.

In theory it should work with PushDouble(GetBetaDistPDF(p, x+1.0, n-x+1.0)/(n+1)); in probability mass function case and with PushDouble(GetBetaDist(1.0-p,n-x,x+1)); in cumulative distribution function case. For probability mass function that works well, but for cumulative distribution function there are some zeros, where there should not be zeros, although this solution is already better than the current implementation. I suspect an underflow, likely in lcl_GetBetaHelperContFrac, but I have not investigate it yet.

Created attachment 75895 [details] Patch against DEV300m100, comments in issue and file testCases

Created attachment 75896 [details] commented test cases for B and BINOMDIST

Problems and solutions ====================== For illustrations see the attached document "testCases.ods". 1. Getting zero in GetBetaDist doesn't come from lcl_GetBetaHelperContFrac but because of missing an underflow case. I have added that case. 2. When q^n and p^n both underflow, the value of BINOMDIST or B respectively is calculated by approximation with beta distribution. This gives an accuracy of about 13 decimal digits. This is the really new part in my patch. [Sheet "no value"] 3. When p is near 1 then q=1-p has severe cancellation errors. But nothing can be done, because when the interpreter gets the value p it is already in binary double and the correct decimal value cannot be reconstructed. [Sheet "small n"] 4. The current solution has a line fSum = 1.0-fFactor and than subtracts values in a loop. This leads to errors, when fFactor< machine epsilon. I detect this case and use sum up instead. [Sheet "constant value"] 5. The sum up parts can result in values greater than 1. Such values are impossible for cumulative distribution functions. I force the value to 1 in those cases. [Sheet "illegal value"] 6. The mass function is used inside BINOMDIST and in B. Therefore I separate it into a method. It cannot be a local function because it uses GetBetaDistPDF. 7. Summing up values of a range is used inside B and inside BINOMDIST. Therefore I separate is into the local function lcl_GetBinomDistRange 8. I have changed ULONG (now sal_uLong) to sal_int32. This will work-and has until now-, because higher values of n will not reach the for-loop because p^n and q^n will underflow. 9. The current version calculates with denormalized values of fFactor. This gives wrong results for fFactor near zero. I use the threshold ::std::numeric_limits<double>::min() instead. [Sheet "B range"] 10. Calculations near the thresholds ::std::numeric_limits<double>::min() and fMachEps might not be optimal. But I think, that it is not worth to detect whether for example 1E-308 would be a better threshold than ::std::numeric_limits<double>::min(). 11. Calculation of cases B(n;p;n-k;n) are wrong in case q^n underflow. In this cases the summand p^n is missing. I have rewritten the loops and initializations in lcl_GetBinomDistRange. I use: sum from j=xs to xe {(n choose j) * (q/p)^(n-j) * p^n} =sum from j=xs to xe {(n choose n-j) * (q/p)^(n-j) * p^n} substitute i=n-j =sum from i=n-xe to n-xs { (n choose i) * (q/p)^i * p^n} [Sheet "B range"]

Thanks, grabbing issue.

In cws calc66: changeset e95dfb8df998 http://hg.services.openoffice.org/cws/calc66/changeset/e95dfb8df998 M sc/source/core/inc/interpre.hxx M sc/source/core/tool/interpr3.cxx You can observe the progress and possible integration date of CWS calc66 at http://tools.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fcalc66

Reassigning to QA for verification.

thomas.klarhoefer@oracle.com