Chi-Squared Probabilities by Simulation

The chi-squared (aka chi-square) distribution is used when you want to determine if a set of observed counts matches a set of expected counts. For example, suppose you have a normal six-sided die and you roll it 6,000 times. You'd expect to get about 1,000 of each outcome.

Suppose the counts for the six outcomes were (900, 1100, 1000, 800, 1200, 1000). You got a lot fewer 4-spots (800) than expected and a lot more 5-spots (1200). If the die is really fair, how likely is it that you'd see results this far off?

You can calculate a chi-squared statistic and then look up the probability of seeing results at least that far from expected by chance alone. Calculating the statistic is easy but calculating the probability is not so easy. The standard approach is to use ACM Algorithm 299 (which in turn uses Algorithm 209).
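To show how easy the statistic part is, here is a minimal sketch in Python (the post doesn't show its code, so the language and the function name chi_squared_stat are my own) that computes it for the counts above:

```python
def chi_squared_stat(observed, expected):
    # chi-squared = sum of (observed - expected)^2 / expected over all cells
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [900, 1100, 1000, 800, 1200, 1000]
expected = [1000] * 6  # 6,000 rolls of a fair six-sided die
print(chi_squared_stat(observed, expected))  # 100.0, with 6 - 1 = 5 degrees of freedom
```

The hard part is turning that 100.0 (with 5 degrees of freedom) into a probability, which is where Algorithm 299 or a simulation comes in.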

Just for fun I thought I'd see if I could compute chi-squared probabilities by using a simulation: my program would roll a die a lot of times and calculate the probability of a given chi-squared value.
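Here is a rough sketch of that idea in Python (again, the names and details are assumptions on my part, not the post's actual code): simulate many batches of 6,000 fair-die rolls, compute the chi-squared statistic for each batch, and estimate the probability as the fraction of simulated statistics at least as large as the observed one.

```python
import random

def chi_squared_stat(observed, expected):
    # chi-squared = sum of (observed - expected)^2 / expected over all cells
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed_stat, num_rolls=6000, num_sides=6, num_trials=10000):
    # Estimate P(chi-squared >= observed_stat) for a fair die by brute force:
    # run num_trials experiments of num_rolls rolls each and count how often
    # the simulated statistic is at least as large as the observed one.
    expected = [num_rolls / num_sides] * num_sides
    num_extreme = 0
    for _ in range(num_trials):
        counts = [0] * num_sides
        for _ in range(num_rolls):
            counts[random.randrange(num_sides)] += 1
        if chi_squared_stat(counts, expected) >= observed_stat:
            num_extreme += 1
    return num_extreme / num_trials

# A milder set of observed counts than the one above (a statistic of 100.0 is
# so extreme that essentially no simulated trial would ever reach it).
stat = chi_squared_stat([980, 1020, 1010, 990, 1005, 995], [1000] * 6)
print(simulated_p_value(stat))  # roughly 0.96; this pure-Python loop is slow
```

The accuracy of the estimate is driven entirely by num_trials, which is exactly the consistency issue described next.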

Well, the simulation approach sort of worked, but to get consistent results I had to run the simulation many thousands of times. Interestingly, because the chi-squared test is itself an approximation, and ACM 299 is an approximation, the simulation approach can actually give results that are more accurate than the traditional technique. Bottom line: the simulation approach is interesting, but not useful except in very weird scenarios.
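As a sanity check (not something the post itself does; the post works against ACM 299), the simulated probability could be compared to a library implementation of the chi-squared survival function, for example SciPy's:

```python
from scipy.stats import chi2

# P(chi-squared >= 1.05) with 5 degrees of freedom, for comparison with
# the simulated estimate above (about 0.96).
print(chi2.sf(1.05, df=5))
```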
