The Kolmogorov-Smirnov (KS) test is a classical statistics technique that can be used to compare a set of observed values with a set of expected values, or compare a set of values with a known distribution.

Suppose you have n = 8 movie ratings where each rating is a number between 1.0 and 5.0 — (1.2, 2.3, 2.4, 2.6, 2.7, 2.9, 3.8, 4.6). It looks like the ratings are low. Is there statistical evidence that the ratings are not evenly (uniformly) distributed?

The key idea of KS is to compare observed with expected, but you compare observed cumulative frequencies with expected cumulative frequencies. I constructed this table:

rating      obs  exp  co  ce  cof   cef
---------------------------------------
1.0 - 1.5    1    1    1   1  .125  .125
1.5 - 2.0    0    1    1   2  .125  .250
2.0 - 2.5    2    1    3   3  .375  .375
2.5 - 3.0    3    1    6   4  .750  .500  <- .250
3.0 - 3.5    0    1    6   5  .750  .625
3.5 - 4.0    1    1    7   6  .875  .750
4.0 - 4.5    0    1    7   7  .875  .875
4.5 - 5.0    1    1    8   8  1.00  1.00

First, because there are 8 observed values, I divided the ratings into 8 ranges. The obs column is the observed frequency (number of ratings) in each rating range. The exp column is the expected number of ratings in each range if the ratings are evenly distributed: one rating per range. The co column is the cumulative count (running total) of observed ratings, because KS works with cumulative frequencies. The ce column is the cumulative expected count in each range.

The cof and cef columns are the cumulative observed frequencies and the cumulative expected frequencies — which is just the previous two columns divided by 8.

Now, for KS, you find the largest absolute difference between the cumulative observed frequency and the cumulative expected frequency. In this example the largest difference is 0.250. Next you look up the so-called critical value of KS for n = 8 in a statistics reference. The critical value for a 5% significance level and n = 8 is 0.4096. Because the calculated KS statistic of 0.250 is less than the critical value, we conclude there isn't enough evidence to say that the ratings aren't evenly distributed. (Very tricky to phrase.)
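The steps above can be sketched in Python with NumPy. This is a minimal sketch; the variable names are mine, and the 0.4096 critical value is taken from a standard KS table for n = 8 at the 5% significance level, as noted above:

```python
import numpy as np

ratings = [1.2, 2.3, 2.4, 2.6, 2.7, 2.9, 3.8, 4.6]
n = len(ratings)

# Count observed ratings in each of the 8 half-point ranges 1.0-1.5, ..., 4.5-5.0
edges = np.linspace(1.0, 5.0, 9)   # bin edges 1.0, 1.5, ..., 5.0
obs, _ = np.histogram(ratings, bins=edges)
exp = np.ones(n)                   # 1 expected rating per range if uniform

cof = np.cumsum(obs) / n           # cumulative observed frequencies
cef = np.cumsum(exp) / n           # cumulative expected frequencies

D = np.max(np.abs(cof - cef))      # largest difference = the KS statistic
crit = 0.4096                      # critical value for n = 8, 5% level
print(D)                           # 0.25
print(D < crit)                    # True: cannot reject uniformity
```

The histogram counts reproduce the obs column of the table, and the D value of 0.250 matches the largest difference marked in the table.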

If the calculated KS statistic had been greater than 0.4096 we could have concluded that there's evidence (at a 5% significance level) that the movie ratings are not evenly distributed.

There are many details to the KS test, but this blog post should give you a start. In particular, KS is often used to infer whether a set of data follows a Normal (bell-shaped curve) distribution. The tricky part here is calculating the expected frequencies.
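As a sketch of such a normality check, SciPy's one-sample kstest function can compare data with a Normal CDF; SciPy handles the expected frequencies internally. The data and parameters below are made up for illustration. One caution: if the mean and standard deviation are estimated from the same data being tested, the plain KS critical values are not strictly valid (the Lilliefors variant of the test addresses this).

```python
import numpy as np
from scipy import stats

# made-up data for illustration: 100 draws from Normal(3.0, 0.5)
data = np.random.default_rng(0).normal(loc=3.0, scale=0.5, size=100)

# one-sample KS test against a Normal CDF with specified mean and std
result = stats.kstest(data, 'norm', args=(3.0, 0.5))
print(result.statistic, result.pvalue)
```

A large p-value means there's not enough evidence to say the data is not Normal with the specified parameters.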

In a variation of the KS test, you compare two sets of values to determine if they come from the same distribution. For the example above, if some values were at the midpoint of each rating range, they'd be: (1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75). Using SciPy, I ran a two-sample KS test and got the same results (the 0.928954777402 is the p-value of the test; because it's much larger than 0.05, again there's not enough evidence to say the ratings aren't uniformly distributed).
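Assuming the two-sample test was SciPy's ks_2samp function, the run looks like this:

```python
from scipy import stats

ratings = [1.2, 2.3, 2.4, 2.6, 2.7, 2.9, 3.8, 4.6]
midpoints = [1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75]

# two-sample KS: largest distance between the two empirical CDFs
result = stats.ks_2samp(ratings, midpoints)
print(result.statistic)   # 0.25
print(result.pvalue)      # large p-value, same conclusion as before
```

The statistic is again 0.25, and the large p-value gives the same conclusion as the one-sample version.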

The Kolmogorov-Smirnov test is similar in some respects to the chi-square goodness of fit test. However, the chi-square test works directly with observed and expected counts, not cumulative frequencies.
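To make the contrast concrete, here is a sketch of the chi-square goodness of fit test on the same counts, using SciPy's chisquare function. Note that with only 1 expected count per range this violates the usual rule of thumb that expected counts should be at least 5, so this is for illustration only:

```python
from scipy import stats

obs = [1, 0, 2, 3, 0, 1, 0, 1]   # observed counts per rating range (table above)
exp = [1] * 8                    # expected counts if uniform

# chi-square works on the raw counts, not cumulative frequencies
result = stats.chisquare(obs, f_exp=exp)
print(result.statistic)   # 8.0 = sum((o - e)^2 / e)
```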

Classical statistics techniques like KS are primitive and almost laughably crude compared to modern machine learning techniques. But classical statistics can still be useful every now and then.

*In honor of National Women’s Month I did an Internet search for terms related to goodness of fit and famous women. I got results pointing to Seattle business woman Lou Graham who ran a highly successful gentleman’s club in the early 1900s. See https://en.wikipedia.org/wiki/Lou_Graham_(Seattle_madame).*