Pearson r vs. Covariance

When I was teaching statistics, many of my students confused correlation with covariance. In a nutshell, correlation is a general term, so what you really want to know is the difference between Pearson r (the most common form of correlation) and covariance. Both Pearson r and covariance measure how the values of two variables are related. If you standardize the data (subtract the mean from each value and divide by the standard deviation), the Pearson r and the covariance are exactly the same.
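For readers who prefer symbols, the relationship can be written out as below. This is a sketch that assumes the sample (n - 1) convention for both the covariance and the standard deviations s_X and s_Y; the population (n) convention works the same way as long as it is used consistently.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}

% Standardized values: x'_i = (x_i - \bar{x}) / s_X and y'_i = (y_i - \bar{y}) / s_Y
\operatorname{cov}(X', Y')
  = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{s_X} \cdot \frac{(y_i - \bar{y})}{s_Y}
  = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
  = r_{XY}
```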


In the Excel sheet in the image, I have some XY data. The Pearson r is 0.99195, indicating near-perfect correlation. The covariance of the XY data is 11.25, which doesn’t have an immediate interpretation because its magnitude depends on the units and spread of X and Y.

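I create normalized X’Y’ data by taking each value, subtracting the mean, and then dividing by the standard deviation. When I calculate the covariance of the normalized data, the result is 0.99195, which is the Pearson r of the source data. This works as long as the standard deviation and the covariance use the same convention (both population or both sample). Put slightly differently, the Pearson r is a specialized form of covariance: it is the covariance of the standardized data.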
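The same check can be done outside Excel. Here is a minimal Python sketch using NumPy. The x and y values are made up for illustration (they are not the data from the spreadsheet), and the helper names standardize and covariance are my own. I assume the sample (n - 1) convention for both the standard deviation and the covariance; if you mix conventions, the two results differ by a factor of (n - 1) / n.

```python
import numpy as np

# Hypothetical XY data for illustration only (not the spreadsheet values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])

def standardize(v, ddof=1):
    """Subtract the mean, then divide by the standard deviation."""
    return (v - v.mean()) / v.std(ddof=ddof)

def covariance(a, b, ddof=1):
    """Covariance of two 1-D arrays, taken from the 2x2 covariance matrix."""
    return np.cov(a, b, ddof=ddof)[0, 1]

# Pearson r of the raw data
r = np.corrcoef(x, y)[0, 1]

# Covariance of the raw data: depends on the scale of x and y
cov_raw = covariance(x, y)

# Covariance of the standardized data: should match Pearson r
cov_std = covariance(standardize(x), standardize(y))

print("Pearson r:               ", r)
print("Covariance (raw):        ", cov_raw)
print("Covariance (standardized)", cov_std)
```

The first and third printed values should agree up to floating-point rounding, while the raw covariance lands on a different scale entirely.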
