Chi-Square Tests using R

I wrote an article titled “Chi-Square Tests using R” in the March 2016 issue of Visual Studio Magazine.

Chi-square tests are relatively easy to perform, but it’s not so easy to explain when to use them. There are three main problem scenarios where a chi-square test can be used. All three involve situations where you have counts of data.

The first example in the article shows a chi-square test for equal counts. You have a normal die with six sides, and you roll it 60 times. If the die is fair you’d expect to get about 10 of each possible result. But even if the die is fair you probably won’t get exactly 10 of each result. A chi-square test can tell you how likely it is to see counts at least as far from the expected counts as yours if the die really is fair. A small probability suggests the die is not fair.
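To make the die example concrete, here is a minimal sketch in R. The observed counts are made up for illustration and are not taken from the article. When `chisq.test()` is given a vector of counts and no `p` argument, it tests against equal probabilities for every category.

```r
# Hypothetical counts for faces 1..6 from 60 rolls (made-up data)
die_counts <- c(8, 12, 9, 11, 13, 7)

# No p argument: the null hypothesis is equal probability for each face
die_result <- chisq.test(die_counts)

print(die_result$expected)   # expected count per face (10 each here)
print(die_result$statistic)  # chi-square statistic (2.8 for these counts)
print(die_result$parameter)  # degrees of freedom (6 - 1 = 5)
print(die_result$p.value)    # chance of counts this uneven if the die is fair
```

For these made-up counts the p-value is large, so there is no evidence the die is unfair.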

The second example tests whether a roulette wheel is fair. An American-style roulette wheel has 18 red numbers, 18 black numbers, and 2 green numbers (0 and 00). So, if the wheel is fair and you spin it many times, you’d expect to get 18/38 = 47.37% red numbers, 18/38 = 47.37% black numbers, and 2/38 = 5.26% green numbers. A chi-square test can tell you how likely it is to see counts at least as far from those expected proportions as the observed counts if the wheel is fair.
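The roulette case differs from the die case only in that the expected proportions are not equal, so they have to be passed in through the `p` argument of `chisq.test()`. The sketch below assumes 380 spins with invented counts (not data from the article).

```r
# Hypothetical counts from 380 spins (made-up data)
wheel_counts <- c(red = 170, black = 185, green = 25)

# Expected proportions for a fair American wheel: 18/38, 18/38, 2/38
wheel_probs <- c(18, 18, 2) / 38

wheel_result <- chisq.test(wheel_counts, p = wheel_probs)

print(wheel_result$expected)   # 180 red, 180 black, 20 green
print(wheel_result$statistic)  # chi-square statistic
print(wheel_result$p.value)    # chance of counts this far off if the wheel is fair
```

If your expected values are raw counts rather than proportions, `chisq.test()` also accepts `rescale.p = TRUE` to normalize them so they sum to 1.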

The third example in the article is called a test for independence of two factors. Suppose you have 110 males and 90 females. Each person uses some Web site with low, medium, or high usage. A chi-square test can tell you how likely the observed counts would be if the two factors, sex and usage, are independent (meaning males and females use the Web site in roughly the same way). A small probability suggests the factors are not independent (meaning usage differs by sex somehow).
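For the independence test, the counts go into a two-way table and `chisq.test()` is called on the matrix. The sketch below uses the 110 males and 90 females from the example, but the split across low, medium, and high usage is invented for illustration.

```r
# Hypothetical usage counts by sex (made-up data); rows sum to 110 and 90
site_usage <- matrix(c(20, 40, 50,
                       30, 40, 20),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(sex = c("male", "female"),
                                     usage = c("low", "medium", "high")))

# On a matrix, chisq.test() tests independence of rows and columns
usage_result <- chisq.test(site_usage)

print(usage_result$expected)   # counts expected if sex and usage are independent
print(usage_result$statistic)  # chi-square statistic
print(usage_result$parameter)  # degrees of freedom: (2 - 1) * (3 - 1) = 2
print(usage_result$p.value)    # small value suggests usage differs by sex
```

With these made-up counts the p-value is well below 0.05, which would suggest that usage and sex are not independent.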

Once you set the data up for a chi-square test, performing the test is easy. The hard parts are knowing when to use a chi-square test, and how to carefully interpret the results.
