Statistics and Anscombe’s Quartet

A colleague of mine recently reminded me of an example I often used when I was teaching statistics: Anscombe's Quartet. In the image below there are four data sets, each with 11 pairs of (x, y) values. When you graph the four data sets, they look strikingly different.

However, the means and variances of the four data sets are all equal, at least to within rounding (Var Y differs only in the third decimal place):

Red
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Blue
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Green
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Orange
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

And if you compute the least-squares regression line for each set, you get essentially the same line every time: Y = 3.00 + 0.50X (to two decimal places).
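These numbers are easy to check yourself. Below is a minimal Python sketch that uses only the standard library `statistics` module. It assumes the data values Anscombe published in his 1973 paper, labeled I through IV as in that paper. The text doesn't say which color in the image corresponds to which set, so the sketch doesn't try to map them. The helper names `least_squares` and `summarize` are illustrative, not from any library.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def summarize(xs, ys):
    """Means, sample variances (n - 1 denominator) and the fitted line."""
    a, b = least_squares(xs, ys)
    return {"mean_x": mean(xs), "mean_y": mean(ys),
            "var_x": variance(xs), "var_y": variance(ys),
            "intercept": a, "slope": b}


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean x={s['mean_x']:.2f} mean y={s['mean_y']:.2f} "
              f"var x={s['var_x']:.2f} var y={s['var_y']:.3f} "
              f"fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

Note that `statistics.variance` divides by n - 1, which is what gives Var X = 11.00; the population variance (`statistics.pvariance`) would give 10.00 for these x values. Running the script shows Var Y falling between roughly 4.12 and 4.13 and every fitted line matching Y = 3.00 + 0.50X to two decimal places, which is the sense in which the four sets are "identical."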

The moral of the story is that in many cases it’s not enough to calculate summary statistics. Sometimes you should graph your data too.

This entry was posted in Machine Learning.