Statistics and Anscombe’s Quartet

A colleague of mine recently reminded me of an example I often used when I taught statistics: Anscombe’s Quartet. In the image below there are four data sets, each with 11 (x, y) pairs. When you graph them, the four data sets look quite different.

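However, the means and variances of the four data sets are all equal! (Exactly so for X; for Y they agree to two or three decimal places. The variances are sample variances, computed with n − 1 = 10 in the denominator.)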

Red
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Blue
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Green
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

Orange
Mean X = 9.00  Mean Y = 7.50  Var X = 11.00  Var Y = 4.125

And if you compute the least-squares regression line for each set, you get the same line every time (again, to within rounding): Y = 3.00 + 0.50X.
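If you want to check these numbers yourself, here is a minimal Python sketch that uses only the standard library. The data are Anscombe’s published values, labeled I to IV as in his 1973 paper. The post doesn’t say which color in the image matches which set, so the Roman-numeral labels are an assumption of this sketch, as are the helper names `least_squares` and `summarize`. The regression is computed directly from the ordinary least-squares formulas rather than with any particular library.

```python
from statistics import mean, variance

# Anscombe's four data sets (labels I to IV are an assumption; the post
# uses colors). Sets I to III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Hypothetical helper: return (intercept, slope) of the OLS line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def summarize(xs, ys):
    """Hypothetical helper: the summary statistics quoted in the post."""
    intercept, slope = least_squares(xs, ys)
    return {
        "mean_x": mean(xs), "mean_y": mean(ys),
        "var_x": variance(xs), "var_y": variance(ys),  # sample variance (n - 1)
        "intercept": intercept, "slope": slope,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: Mean X = {s['mean_x']:.2f}  Mean Y = {s['mean_y']:.2f}  "
          f"Var X = {s['var_x']:.2f}  Var Y = {s['var_y']:.3f}  "
          f"Y = {s['intercept']:.2f} + {s['slope']:.3f}X")
```

Each line should report means of 9.00 and 7.50, an X variance of 11.00, a Y variance of about 4.12 to 4.13, and a fitted line of roughly Y = 3.00 + 0.500X, even though plotting the same points gives four very different pictures.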

The moral of the story is that summary statistics alone can hide very different structure in your data. It’s often worth graphing the data too.

This entry was posted in Machine Learning.