The Wasserstein Distance Using C#

The Wasserstein distance has many different variations. In its simplest form, the Wasserstein distance function measures the distance between two discrete probability distributions. For example, if:

 double[] P = new double[]
   { 0.6, 0.1, 0.1, 0.1, 0.1 };
 double[] Q1 = new double[]
   { 0.1, 0.1, 0.6, 0.1, 0.1 };
 double[] Q2 = new double[]
   { 0.1, 0.1, 0.1, 0.1, 0.6 }; 

Wasserstein(P, Q1) = 1.00
Wasserstein(P, Q2) = 2.00

Conceptually, if P is considered to be piles of dirt and Q is considered to be holes, then Wasserstein(P, Q) is the minimum amount of work (amount of dirt times distance moved) needed to transfer all dirt to the holes. Or you can think of Wasserstein as the effort required to transform P into Q.
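
As a worked example of the dirt-and-holes picture, take the P, Q1, and Q2 values above and assume adjacent positions are one unit apart. P has 0.5 too much dirt at index 0, Q1 has a 0.5 shortfall at index 2, and Q2 has a 0.5 shortfall at index 4; every other position already matches. The cheapest plan moves the 0.5 surplus straight to the shortfall:

```latex
\begin{align*}
W(P, Q_1) &= 0.5 \times |0 - 2| = 1.00 \\
W(P, Q_2) &= 0.5 \times |0 - 4| = 2.00
\end{align*}
```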

I use the Python language for most of my machine learning projects, but sometimes I use the C# language. I coded up a quick demo of a highly simplified Wasserstein distance function:

using System;
namespace Wasserstein
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin demo \n");

      double[] P = new double[]
        { 0.6, 0.1, 0.1, 0.1, 0.1 };
      double[] Q1 = new double[]
        { 0.1, 0.1, 0.6, 0.1, 0.1 };
      double[] Q2 = new double[]
        { 0.1, 0.1, 0.1, 0.1, 0.6 };

      double wass_p_q1 = MyWasserstein(P, Q1);
      double wass_p_q2 = MyWasserstein(P, Q2);

      Console.WriteLine("Wasserstein(P, Q1) = " +
        wass_p_q1.ToString("F4"));
      Console.WriteLine("Wasserstein(P, Q2) = " +
        wass_p_q2.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    }  // Main

    static int FirstNonZero(double[] vec)
    {
      // index of first cell with a positive value, or -1 if none
      int dim = vec.Length;
      for (int i = 0; i < dim; ++i)
        if (vec[i] > 0.0)
          return i;
      return -1;
    }

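    static double MoveDirt(double[] dirt, int di,
      double[] holes, int hi)
    {
      // move as much dirt as possible from pile di to hole hi
      double flow = 0.0;
      int dist = 0;
      if (dirt[di] <= holes[hi])  // whole pile fits in the hole
      {
        flow = dirt[di];
        dirt[di] = 0.0;
        holes[hi] -= flow;
      }
      else if (dirt[di] > holes[hi])  // pile overfills the hole
      {
        flow = holes[hi];
        dirt[di] -= flow;
        holes[hi] = 0.0;
      }
      dist = Math.Abs(di - hi);
      return flow * dist;
    }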

    static double MyWasserstein(double[] p, double[] q)
    {
      double[] dirt = (double[])p.Clone();
      double[] holes = (double[])q.Clone();
      double totalWork = 0.0;
      while (true)
      {
        int fromIdx = FirstNonZero(dirt);
        int toIdx = FirstNonZero(holes);
        if (fromIdx == -1 || toIdx == -1)
          break;
        double work = MoveDirt(dirt, fromIdx,
          holes, toIdx);
        totalWork += work;
      }
      return totalWork;
    }
  }  // Program
}  // ns
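
To make the loop in MyWasserstein concrete, here is a hand trace for P and Q1. It assumes exact arithmetic (real double values can leave tiny rounding residue that adds a negligible amount of work) and assumes MoveDirt transfers the smaller of the current pile and the current hole. Each term is flow times distance for one call, and the label under it is the pile index and hole index that were paired:

```latex
\begin{align*}
\text{work} &= \underbrace{0.1 \cdot 0}_{0 \to 0}
  + \underbrace{0.1 \cdot 1}_{0 \to 1}
  + \underbrace{0.4 \cdot 2}_{0 \to 2}
  + \underbrace{0.1 \cdot 1}_{1 \to 2}
  + \underbrace{0.1 \cdot 0}_{2 \to 2}
  + \underbrace{0.1 \cdot 0}_{3 \to 3}
  + \underbrace{0.1 \cdot 0}_{4 \to 4} \\
&= 0 + 0.1 + 0.8 + 0.1 + 0 + 0 + 0 = 1.00
\end{align*}
```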

There are many complex variations of Wasserstein. My C# Wasserstein demo works only with discrete probability distributions where each data item is a single probability value, and it measures the distance between two positions as the difference of their indices.
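
As a sanity check on the demo, a standard result for one-dimensional distributions on equally spaced points is that the Wasserstein distance equals the sum of the absolute differences between the two running (cumulative) sums. The sketch below is not part of the original demo. The class name CdfCheck and the method name CdfWasserstein are hypothetical, and the code assumes both arrays have the same length, each sums to 1, and neighboring positions are one unit apart.

```csharp
using System;

// Hypothetical cross-check, not part of the original demo.
// Assumes p and q have equal length, each sums to 1, and
// adjacent positions are one unit apart.
static class CdfCheck
{
  public static double CdfWasserstein(double[] p, double[] q)
  {
    if (p.Length != q.Length)
      throw new ArgumentException(
        "p and q must have the same length");

    double cumP = 0.0;   // running sum of p
    double cumQ = 0.0;   // running sum of q
    double total = 0.0;
    for (int i = 0; i < p.Length; ++i)
    {
      cumP += p[i];
      cumQ += q[i];
      // net dirt that has to cross from position i to i+1
      total += Math.Abs(cumP - cumQ);
    }
    return total;
  }
}
```

Called with the P, Q1, and Q2 arrays from the demo, CdfCheck.CdfWasserstein should agree with MyWasserstein up to floating point rounding. For P and Q1 the running sums differ by 0.5 at indices 0 and 1 and match everywhere else, giving 0.5 + 0.5 = 1.00.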

Python is the programming language of choice for most machine learning systems. But C# is often the language of choice for business-related systems, so it’s nice to be able to implement ML functions in C# when needed, rather than trying to glue a Python ML system to a C# business system.



Ray Cummings (1887-1957) was one of the early pioneers of science fiction. Decades ago, the conceptual distance between science fiction and reality was much greater than today. For example, in the early 1900s when the first airplanes could barely fly, stories about space travel must have seemed impossible. Today, sending a probe to Mars is almost commonplace.


1 Response to The Wasserstein Distance Using C#

  1. Kevin Davis says:

    Mahalo Dr. McCaffrey!!!
