Suppose you flip a coin 10 times. What is the probability you’ll get exactly 2 heads? This is an example of a binomial probability problem. In math terms n is the number of trials (10 for the example), k is the number of “successes” (2), and p is the probability of getting a success in a single trial. Because heads and tails are equally likely, p = 0.5 in this example.
From Wikipedia, the equation for calculating Pr(k | n,p) is:
The trickiest part of computing binomial probabilities exactly is calculating the Choose(n,k) function because Choose() can return astronomically large values for even moderate size of n and k. For example, Choose(500, 100) =
One solution is to use a BigInteger class that can handle arbitrarily large integers. I coded up this approach for binomial probabilities in an earlier blog post.
However, there is a traditional alternative that was the main option in the days before computers — use the Normal distribution to approximate the binomial distribution.
The basic idea is easy but there are a ton of interrelated details. 1.) There are ways to calculate the approximate area under the standard Normal (aka Gaussian) curve from -infinity to some value z.
2.) The binomial distribution for k successes in n trials with p the probability of a success in a single trial, is fairly close to the Normal distribution where z = (k – mean) / sd, where mean = n * p and sd = sqrt(n * p * (1-p)).
3.) To account for the way the actual binomial distribution, which is rectangles, goes past the curve of the Normal distribution, you add 1/2 to k. This is called “continuity correction”.
I’ve left out pages and pages and pages of additional details.
Anyway, just for fun I coded up an implementation that approximates the binomial distribution using the Normal distribution, with C#. My approximation gives identical results to the R language approximation.
Very interesting little exercise.