Several of my blog entries here have described mathematical techniques that are really useful in software testing. One of the most useful is the chi-square test for goodness of fit. The test applies to many situations but is not often used in practice. I suspect this is because most of the testers I know do not understand the chi-square test and therefore do not recognize situations where it is useful.

The chi-square test can be used to determine how well a set of actual (observed) results matches a corresponding set of expected results. Here’s an example. Suppose you have some software system that is supposed to randomly spit out a total of 100 of the 5 letters ‘A’ through ‘E’, each equally likely. Therefore, you’d expect to get 20 of each letter, subject to a certain amount of variation. Suppose you run the system and get this actual data:
A = 12, B = 10, C = 20, D = 30, E = 28
To compute the chi-square statistic, take each category, square the difference between the observed count O and the expected count E, divide by the expected count, and add up the results over all k categories:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

In this case the chi-square statistic is

$$\chi^2 = \frac{(12-20)^2}{20} + \frac{(10-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(30-20)^2}{20} + \frac{(28-20)^2}{20} = 3.2 + 5.0 + 0.0 + 5.0 + 3.2 = 16.4$$

The number of degrees of freedom for chi-square is just k-1, the number of categories minus one; in this case df = 5-1 = 4. Now we can look up the 95% critical value for df = 4 in any stats book and find that it is 9.49. Because our calculated chi-square value of 16.4 is greater than the critical value of 9.49, we conclude that the software system is not performing as it should. If the system really were spitting out evenly distributed letters, there would be less than a 5% chance of getting observed counts this far (or farther) from the expected counts.

(Excel can do chi-square too; see the image below. Excel’s CHISQ.TEST function, called CHITEST in older versions, returns the p-value: the probability of seeing a deviation at least this large if the letters really are evenly distributed. It is not the probability that the observed numbers match the expected numbers. For the data above, the p-value works out to about 0.0025.)

There’s a lot more to chi-square, but once you realize the situations in which it can be used, you’ll be surprised at just how useful it is in software testing.
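To make the arithmetic concrete, here is a minimal Python sketch (standard library only) that reproduces the calculation above. The helper names `chi_square_statistic` and `chi_square_p_value` are my own hypothetical functions, not part of any library. The p-value uses the closed-form chi-square tail probability that exists when the degrees of freedom are a whole number, which is always the case for this kind of goodness-of-fit test. The 9.49 critical value is hard-coded from a table, just as in the text, and the seeded `random.Random` generator at the end is only an assumed stand-in for a real system under test.

```python
import math
import random
from collections import Counter


def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with integer df.

    This is the regularized upper incomplete gamma function Q(df/2, stat/2),
    which has a closed form when df is a positive integer.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = stat / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = e^(-y) * sum_{j=0}^{m-1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)), then step up with
    # Q(a + 1, y) = Q(a, y) + y^a * e^(-y) / Gamma(a + 1)
    p = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        a = j + 0.5
        if y > 0:
            p += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return p


# The example from the text: 100 letters, 'A' through 'E'.
observed = [12, 10, 20, 30, 28]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 each if uniform

stat = chi_square_statistic(observed, expected)
df = len(observed) - 1
p = chi_square_p_value(stat, df)
CRITICAL_95_DF4 = 9.49  # 95% critical value for df = 4, from a chi-square table

print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.4f}")
print("not uniform" if stat > CRITICAL_95_DF4 else "consistent with uniform")

# Applying the same check to a generator under test. random.Random(...).choices
# is only a stand-in for the hypothetical system that spits out letters.
letters = "ABCDE"
rng = random.Random(12345)
counts = Counter(rng.choices(letters, k=100))
observed_gen = [counts[ch] for ch in letters]  # Counter returns 0 for missing letters
expected_gen = [100 / len(letters)] * len(letters)
stat_gen = chi_square_statistic(observed_gen, expected_gen)
print(f"generator: chi-square = {stat_gen:.2f}, p = {chi_square_p_value(stat_gen, 4):.4f}")
```

For the example data the first two lines print `chi-square = 16.4, df = 4, p = 0.0025` and `not uniform`, matching the hand calculation. In real code you would more likely call `scipy.stats.chisquare(observed, expected)`, which returns both the statistic and the p-value; the hand-rolled version just shows there is nothing magic inside. Seeding the generator keeps the check deterministic. With a fresh random seed each run, even a correct generator would fail a 95% test about 5% of the time.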