Computing the Covariance of Two Vectors Using C#

A fundamental vector function is the covariance of two vectors. For example, if v1 = [4, 9, 8] and v2 = [6, 8, 1] then covariance(v1, v2) = -0.5. A covariance near 0 means the two vectors have little linear relationship. The sign of the covariance indicates the direction of the relationship. The magnitude is harder to interpret because it depends on the scale of the values: multiplying every element of one vector by 10 multiplies the covariance by 10. For the same reason, there’s no upper limit on the magnitude of a covariance.

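Covariance is similar to, and often confused with, the variance of a single vector and the correlation coefficient between two vectors. All three are different, but they are related. If you compute the covariance of a vector with itself, you get the ordinary statistical variance of the vector. If you divide the covariance by the product of the two vectors’ standard deviations, you get the correlation coefficient, which always lies between -1 and +1.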

To calculate a covariance between two vectors, first you compute the mean of each vector:

v1 = [4, 9, 8]
v2 = [6, 8, 1]

mean1 = (4 + 9 + 8) / 3 = 21 / 3 = 7
mean2 = (6 + 8 + 1) / 3 = 15 / 3 = 5

Next you sum the products of the differences between each element and its corresponding mean:

sum = (4 - 7) * (6 - 5) +
      (9 - 7) * (8 - 5) +
      (8 - 7) * (1 - 5)
    = -3 + 6 + -4
    = -1

The covariance is the sum divided by either n (“biased”) or n-1 (“unbiased”):

covar = -1 / (n-1)
      = -1 / (3-1)
      = -0.5
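
Written as a single formula, the unbiased calculation above looks like the following. The notation is an assumption made for illustration: v_{1,i} is element i of v1 (1-based), and \bar{v}_1 is mean1, and likewise for v2. For the biased version, replace n-1 with n.

```latex
\operatorname{cov}(v_1, v_2) = \frac{1}{n-1} \sum_{i=1}^{n} \left( v_{1,i} - \bar{v}_1 \right) \left( v_{2,i} - \bar{v}_2 \right)
```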

I wrote a covariance function using the C# language:

static double Covariance(double[] v1, double[] v2)
{
  // compute means of v1 and v2
  int n = v1.Length;

  double sum1 = 0.0;
  for (int i = 0; i < n; ++i)
    sum1 += v1[i];
  double mean1 = sum1 / n;

  double sum2 = 0.0;
  for (int i = 0; i < n; ++i)
    sum2 += v2[i];
  double mean2 = sum2 / n;

  // compute covariance
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += (v1[i] - mean1) * (v2[i] - mean2);
  double result = sum / (n-1);  // unbiased: divide by n-1

  return result;
}
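
As a quick check, the function can be called on the two example vectors. This is a minimal sketch that assumes a console program; the class name CovarianceDemo and the Main wrapper are placeholders added only so the snippet runs on its own. The second call passes the same vector twice, which should give the variance of v1.

```csharp
using System;

public static class CovarianceDemo
{
  // Same logic as the Covariance function above (divides by n-1).
  public static double Covariance(double[] v1, double[] v2)
  {
    int n = v1.Length;

    // compute both means in one pass
    double sum1 = 0.0, sum2 = 0.0;
    for (int i = 0; i < n; ++i)
    {
      sum1 += v1[i];
      sum2 += v2[i];
    }
    double mean1 = sum1 / n;
    double mean2 = sum2 / n;

    // sum the products of the differences from the means
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
      sum += (v1[i] - mean1) * (v2[i] - mean2);

    return sum / (n - 1);
  }

  public static void Main()
  {
    double[] v1 = { 4, 9, 8 };
    double[] v2 = { 6, 8, 1 };

    Console.WriteLine(Covariance(v1, v2));  // value is -0.5
    Console.WriteLine(Covariance(v1, v1));  // value is 7, the variance of v1
  }
}
```

The variance result can be verified by hand: the differences from the mean of v1 are -3, 2 and 1, so the sum of squares is 9 + 4 + 1 = 14, and 14 / (3-1) = 7.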

Simple. I hard-coded division by (n-1) for an unbiased version. An alternative is to pass a parameter that selects the biased or unbiased version.
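
One way the parameter idea could look is sketched below. The unbiased flag and the class name CovarianceOptions are assumptions made for illustration, not part of the original function. The default value of true keeps the n-1 behavior.

```csharp
using System;

public static class CovarianceOptions
{
  // Hypothetical variant: "unbiased" is an illustrative parameter.
  // true  -> divide by n-1 (unbiased)
  // false -> divide by n   (biased)
  public static double Covariance(double[] v1, double[] v2,
    bool unbiased = true)
  {
    int n = v1.Length;

    double sum1 = 0.0, sum2 = 0.0;
    for (int i = 0; i < n; ++i)
    {
      sum1 += v1[i];
      sum2 += v2[i];
    }
    double mean1 = sum1 / n;
    double mean2 = sum2 / n;

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
      sum += (v1[i] - mean1) * (v2[i] - mean2);

    double denom = unbiased ? n - 1 : n;
    return sum / denom;
  }

  public static void Main()
  {
    double[] v1 = { 4, 9, 8 };
    double[] v2 = { 6, 8, 1 };

    Console.WriteLine(Covariance(v1, v2));                   // -1 / 2
    Console.WriteLine(Covariance(v1, v2, unbiased: false));  // -1 / 3
  }
}
```

For the example vectors the sum of products is -1, so the unbiased result is -1 / 2 = -0.5 and the biased result is -1 / 3.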

My implementation has no error checking. That’s a pro (and a con) of implementing from scratch: you can keep the code size small.
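
For comparison, here is a rough sketch of what basic error checking might add. The specific checks, the exception messages, and the class name CovarianceChecked are assumptions chosen for illustration. Without them, a shorter v2 causes an index-out-of-range exception, a longer v2 has its extra elements silently ignored, and a single-element vector gives NaN because of the 0 / 0 division.

```csharp
using System;

public static class CovarianceChecked
{
  // Hypothetical variant with input validation added.
  public static double Covariance(double[] v1, double[] v2)
  {
    if (v1 == null) throw new ArgumentNullException(nameof(v1));
    if (v2 == null) throw new ArgumentNullException(nameof(v2));
    if (v1.Length != v2.Length)
      throw new ArgumentException("Vectors must have the same length.");
    if (v1.Length < 2)
      throw new ArgumentException("Need at least two elements to divide by n-1.");

    int n = v1.Length;

    double sum1 = 0.0, sum2 = 0.0;
    for (int i = 0; i < n; ++i)
    {
      sum1 += v1[i];
      sum2 += v2[i];
    }
    double mean1 = sum1 / n;
    double mean2 = sum2 / n;

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
      sum += (v1[i] - mean1) * (v2[i] - mean2);

    return sum / (n - 1);
  }

  public static void Main()
  {
    double[] v1 = { 4, 9, 8 };
    double[] v2 = { 6, 8 };  // deliberately too short

    try
    {
      Console.WriteLine(Covariance(v1, v2));
    }
    catch (ArgumentException ex)
    {
      Console.WriteLine("Error: " + ex.Message);
    }
  }
}
```

The four checks add about as many lines as the covariance calculation itself, which is the size trade-off in question.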



An image can be considered a large vector of pixel values. Here are three nice illustrations by three different artists, with small covariances (meaning similar images in some sense) to my eye. Left: By artist Luc Latulippe. Center: By artist Josh Agle. Right: By artist Mark Swanson.

