Support Vector Machine From Scratch Using C#

Quite some time ago I implemented a support vector machine (SVM) classifier from scratch using C#. I decided to revisit that code.

Sheesh. I had forgotten how insanely complex SVM code is. And complexity is the real problem with SVMs.

As I’m starting to write this blog post, I’m getting a headache because even a brief description of SVMs would take pages and pages. But I’ll try, using a concrete example.

An SVM is a binary classifier. It is possible to extend SVMs to multi-class classification using the one-versus-all (OVA) technique, but OVA has many problems; that's another story.

My demo sets up 8 training items:

[0]    0.2900   0.3600   0.5000  |   -1
[1]    0.5000   0.2900   0.1400  |   -1
[2]    0.0000   0.4300   0.8600  |   -1
[3]    0.0700   0.2900   0.5700  |   -1
[4]    0.6400   0.5000   0.3600  |    1
[5]    0.9800   0.5000   0.0000  |    1
[6]    0.4300   0.6400   0.8600  |    1
[7]    0.5700   0.6400   0.7100  |    1

Each item has three predictor values. The two possible classes are encoded as -1 or +1. Next, the demo creates an SVM classifier:

Creating SVM with poly kernel degree = 2, gamma = 4.0, coef = 0.0
Starting training
Training complete in 13 iterations

An SVM requires a kernel function. The two most common kernel functions are the polynomial kernel and the RBF (radial basis function) kernel. Each type of kernel has parameters, and both the kernel type and its parameter values must be determined by trial and error, which is a huge pain. I used a polynomial kernel with parameters degree = 2, gamma = 4.0, and coef = 0.0.
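For example, the polynomial kernel is K(u, v) = (gamma * (u dot v) + coef)^degree. With gamma = 4.0, coef = 0.0, and degree = 2, the kernel applied to training items [0] and [4] gives K(u, v) = (4.0 * (0.29 * 0.64 + 0.36 * 0.50 + 0.50 * 0.36) + 0.0)^2 = (4.0 * 0.5456)^2 = 4.7629 (approximately).

One way to organize the trial and error is a simple grid search over candidate parameter values. Here is a minimal sketch that uses the SupportVectorMachine class from the demo code below. The GridSearch method name and the candidate values are mine, not part of the demo, and for simplicity it scores on training accuracy; in practice you'd score on held-out data.

static void GridSearch(double[][] X, int[] y)
{
  // illustrative grid search -- candidate values are arbitrary
  double bestAcc = -1.0;
  foreach (int d in new int[] { 1, 2, 3 })  // candidate degree values
  {
    foreach (double g in new double[] { 0.5, 1.0, 2.0, 4.0 })  // candidate gammas
    {
      var svm = new SupportVectorMachine("poly", 0);
      svm.degree = d; svm.gamma = g; svm.coef = 0.0;
      svm.complexity = 1.0; svm.tolerance = 0.001;
      svm.Train(X, y, 1000);
      double acc = svm.Accuracy(X, y);  // training accuracy, for simplicity
      if (acc > bestAcc)
      {
        bestAcc = acc;
        Console.WriteLine("new best: degree = " + d + ", gamma = " + g +
          ", accuracy = " + acc.ToString("F4"));
      }
    }
  }
}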

Training produces “support vectors”, weights, and a bias:

Support vectors:
0.2900  0.3600  0.5000
0.5000  0.2900  0.1400
0.6400  0.5000  0.3600

Weights:
-0.260310 -0.382273 0.642583

Bias = -2.541029

The support vectors are a subset of the training items; in this case, items [0], [1], and [4]. There is one weight per support vector, and there is a single bias value.
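Each weight is the alpha (Lagrange multiplier) value of its support vector multiplied by that support vector's class label, and the support vectors are exactly the training items whose alpha values are greater than zero when training finishes. You can see this at the end of the Train() method in the demo code below.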

To make a prediction for an input [x, y, z], you combine the input with each support vector using the kernel function, then apply the weights and the bias. If the output is negative, the prediction is class -1; if the output is positive, the prediction is class +1.
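In symbols, the decision value for an input x is f(x) = (w1 * K(sv1, x)) + (w2 * K(sv2, x)) + ... + (wn * K(svn, x)) + b, where K is the kernel function, the svi are the support vectors, the wi are the weights, and b is the bias. The predicted class label is sign(f(x)). Here are the decision values for the eight training items: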

Predicted decision value for [0] =  -1.000000
Predicted decision value for [1] =  -1.000000
Predicted decision value for [2] =  -1.503370
Predicted decision value for [3] =  -1.877962
Predicted decision value for [4] =   0.998002
Predicted decision value for [5] =   2.006503
Predicted decision value for [6] =   1.648473
Predicted decision value for [7] =   2.216912

Model accuracy on train data = 1.0000

Predicted value for (0.21, 0.36, 0.50) = -1.314
Predicted label for (0.21, 0.36, 0.50) = -1

None of the colleagues I work with use SVM classification anymore. SVMs were popular in the late 1990s, for reasons that aren't entirely clear to me. Neural network classifiers are, in general, easier to use, easier to understand, easier to implement, more flexible, and give better results than SVMs. That said, SVM outputs are a bit easier to interpret than neural network outputs, and I have seen a few very rare problems where SVMs work better than neural binary classification systems.

But SVMs are incredibly fascinating.



In computer systems complexity is almost always a bad thing. In art, complexity can be good or bad. Three examples where the complexity is appealing (to me anyway). Left: By artist Randy Monteith. Center: By artist Daniel Arrhakis. Right: By artist Hans Jochem Bakker.


The complete demo code is presented below.


using System;
using System.Collections.Generic;
// for experimentation only. not tested for production use.

namespace SVM_CSharp
{
  class SVM_Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin SVM from scratch C# demo\n");

      double[][] train_X = new double[8][] {
      new double[] { 0.29, 0.36, 0.50 }, new double[] { 0.50, 0.29, 0.14 },
      new double[] { 0.00, 0.43, 0.86 }, new double[] { 0.07, 0.29, 0.57 },
      new double[] { 0.64, 0.50, 0.36 }, new double[] { 0.98, 0.50, 0.00 },
      new double[] { 0.43, 0.64, 0.86 }, new double[] { 0.57, 0.64, 0.71 } };

      int[] train_y = new int[8] { -1, -1, -1, -1, 1, 1, 1, 1 };

      Console.WriteLine("Training data:");
      for (int i = 0; i "lt" train_X.Length; ++i)
      {
        Console.Write("[" + i + "] ");
        for (int j = 0; j < train_X[i].Length; ++j)
          Console.Write(train_X[i][j].ToString("F4").PadLeft(9));
        Console.WriteLine("  |  " + train_y[i].ToString().PadLeft(3));
      }

      Console.WriteLine("\nCreating SVM with poly kernel degree = 2, " +
        "gamma = 4.0, coef = 0.0");
      var svm = new SupportVectorMachine("poly", 0); // poly kernel, seed
      svm.gamma = 4.0;  // poly
      svm.coef = 0.0;   // poly
      svm.degree = 2;   // poly

      //Console.WriteLine("\nCreating SVM with RBF kernel gamma = 1.0");
      //var svm = new SupportVectorMachine("rbf", 0);  // rbf kernel 
      //svm.gamma = 1.0;   // rbf

      svm.complexity = 1.0;  // training parameters
      svm.tolerance = 0.001;
      int maxIter = 1000;

      Console.WriteLine("Starting training");
      int iter = svm.Train(train_X, train_y, maxIter);
      Console.WriteLine("Training complete in " + iter + " iterations\n");

      Console.WriteLine("Support vectors:");
      foreach (double[] vec in svm.supportVectors)
      {
        for (int i = 0; i "lt" vec.Length; ++i)
          Console.Write(vec[i].ToString("F4") + "  ");
        Console.WriteLine("");
      }

      Console.WriteLine("\nWeights: ");
      for (int i = 0; i "lt" svm.weights.Length; ++i)
        Console.Write(svm.weights[i].ToString("F6") + " ");
      Console.WriteLine("");

      Console.WriteLine("\nBias = " + svm.bias.ToString("F6") + "\n");

      for (int i = 0; i "lt" train_X.Length; ++i)
      {
        double pred = svm.ComputeDecision(train_X[i]);
        Console.Write("Predicted decision value for [" + i + "] = ");
        Console.WriteLine(pred.ToString("F6").PadLeft(10));
      }

      double acc = svm.Accuracy(train_X, train_y);
      Console.WriteLine("\nModel accuracy on test data = " +
        acc.ToString("F4"));

      // double[] unknown = new double[] { 3, 5, 7 };
      double[] unknown = new double[] { 0.21, 0.36, 0.50 };
      //for (int i = 0; i < 3; ++i)
      //  unknown[i] /= 14.0;

      double predDecVal = svm.ComputeDecision(unknown);
      Console.WriteLine("\nPredicted value for (0.21, 0.36, 0.50) = " +
        predDecVal.ToString("F3"));
      int predLabel = Math.Sign(predDecVal);
      Console.WriteLine("\nPredicted label for (0.21, 0.36, 0.50) = " +
        predLabel);

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main
  } // Program

  // ======================================================================

  public class SupportVectorMachine
  {
    public Random rnd;

    public double complexity = 1.0;
    public double tolerance = 1.0e-3;  // error tolerance
    public double epsilon = 1.0e-3;

    public List<double[]> supportVectors;  // at least 2 of them
    public double[] weights;  // one weight per support vector
    public double[] alpha;    // one alpha per training item
    public double bias;

    public double[] errors;  // cache. one per training item

    public string kernelType = "poly";  // poly is default, rbf alternative
    public double gamma = 1.0;
    public double coef = 0.0;
    public int degree = 2;

    public SupportVectorMachine(string kernelType, int seed)
    {
      //if (kernelType != "poly")
      //  throw new Exception("This SVM uses hard-coded polynomial kernel");
      this.kernelType = kernelType;
      this.rnd = new Random(seed);
      this.supportVectors = new List<double[]>();
      // this.weights allocated after know how many support vecs there are
    } // ctor

    // --------------------------------------------------------------------

    public double PolyKernel(double[] v1, double[] v2)
    {
      double sum = 0.0;
      for (int i = 0; i "lt" v1.Length; ++i)
        sum += v1[i] * v2[i];
      double z = this.gamma * sum + this.coef;
      return Math.Pow(z, this.degree);
    }

    // --------------------------------------------------------------------

    public double RbfKernel(double[] v1, double[] v2)
    {
      double sum = 0.0;
      for (int i = 0; i "lt" v1.Length; ++i)
        sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
      return Math.Exp(-this.gamma * sum);
    }

    // --------------------------------------------------------------------
    public double ComputeDecision(double[] input)
    {
      double sum = 0;
      for (int i = 0; i "lt" this.supportVectors.Count; ++i)
        if (this.kernelType == "poly")
          sum += this.weights[i] *  
            this.PolyKernel(this.supportVectors[i], input);
        else if (this.kernelType == "rbf")
          sum += this.weights[i] * 
            this.RbfKernel(this.supportVectors[i], input);
      sum += this.bias;
      return sum;
    }

    // --------------------------------------------------------------------

    public int Train(double[][] X_matrix, int[] y_vector, int maxIter)
    {
      int N = X_matrix.Length;
      this.alpha = new double[N];
      this.errors = new double[N];
      int numChanged = 0;
      bool examineAll = true;
      int iter = 0;

      while (iter "lt" maxIter && numChanged "gt" 0 || examineAll == true)
      {
        ++iter;
        numChanged = 0;
        if (examineAll == true)
        {
          // all training examples
          for (int i = 0; i "lt" N; ++i)
            numChanged += ExamineExample(i, X_matrix, y_vector);
        }
        else
        {
          // examples where alpha is not 0 and not C
          for (int i = 0; i "lt" N; ++i)
            if (this.alpha[i] != 0 && this.alpha[i] != this.complexity)
              numChanged += ExamineExample(i, X_matrix, y_vector);
        }

        if (examineAll == true)
          examineAll = false;
        else if (numChanged == 0)
          examineAll = true;
      }

      List<int> indices = new List<int>();  // indices of support vectors
      for (int i = 0; i < N; ++i)
      {
        // Only store vectors with Lagrange multipliers > 0
        if (this.alpha[i] > 0) indices.Add(i);
      }

      int num_supp_vectors = indices.Count;
      this.weights = new double[num_supp_vectors];
      for (int i = 0; i "lt" num_supp_vectors; ++i)
      {
        int j = indices[i];
        this.supportVectors.Add(X_matrix[j]);
        this.weights[i] = this.alpha[j] * y_vector[j];
      }
      this.bias = -1 * this.bias;
      return iter;
    } // Train

    // --------------------------------------------------------------------

    public double Accuracy(double[][] X_matrix, int[] y_vector)
    {
      // Compute classification accuracy
      int numCorrect = 0; int numWrong = 0;
      for (int i = 0; i "lt" X_matrix.Length; ++i)
      {
        double signComputed = Math.Sign(ComputeDecision(X_matrix[i]));
        if (signComputed == Math.Sign(y_vector[i]))
          ++numCorrect;
        else
          ++numWrong;
      }

      return (1.0 * numCorrect) / (numCorrect + numWrong);
    }

    // --------------------------------------------------------------------

    private bool TakeStep(int i1, int i2,
      double[][] X_matrix, int[] y_vector)
    {
      // "Sequential Minimal Optimization: A Fast Algorithm for
      // Training Support Vector Machines", J. Platt, 1998.
      if (i1 == i2) return false;

      double C = this.complexity;
      double eps = this.epsilon;

      double[] x1 = X_matrix[i1];  // "point" at index i1
      double alph1 = this.alpha[i1];    // Lagrange multiplier for i1
      double y1 = y_vector[i1];    // label

      double E1;
      if (alph1 "gt" 0 && alph1 "lt" C)
        E1 = this.errors[i1];
      else
        E1 = ComputeAll(x1, X_matrix, y_vector) - y1;

      double[] x2 = X_matrix[i2];  // index i2
      double alph2 = this.alpha[i2];
      double y2 = y_vector[i2];

      // SVM output on point [i2] - y2 (check in error cache)
      double E2;
      if (alph2 "gt" 0 && alph2 "lt" C)
        E2 = this.errors[i2];
      else
        E2 = ComputeAll(x2, X_matrix, y_vector) - y2;

      double s = y1 * y2;

      // Compute L and H via equations (13) and (14)
      double L; double H;
      if (y1 != y2)
      {
        L = Math.Max(0, alph2 - alph1);  // 13a
        H = Math.Min(C, C + alph2 - alph1);  // 13b
      }
      else
      {
        L = Math.Max(0, alph2 + alph1 - C);  // 14a
        H = Math.Min(C, alph2 + alph1);  // 14b
      }

      if (L == H) return false;

      double eta = 0.0;
      double k11 = 0.0;
      double k12 = 0.0;
      double k22 = 0.0;
      if (this.kernelType == "poly")
      {
        k11 = this.PolyKernel(x1, x1);  // conveniences
        k12 = this.PolyKernel(x1, x2);
        k22 = this.PolyKernel(x2, x2);
        eta = k11 + k22 - 2 * k12;  // 15
      }
      else if (this.kernelType == "rbf")
      {
        k11 = this.RbfKernel(x1, x1);  // conveniences
        k12 = this.RbfKernel(x1, x2);
        k22 = this.RbfKernel(x2, x2);
        eta = k11 + k22 - 2 * k12;  // 15
      }

      double a1; double a2;
      if (eta "gt" 0)
      {
        a2 = alph2 - y2 * (E2 - E1) / eta;  // 16

        if (a2 "gte" H) a2 = H;  // 17a
        else if (a2 "lte" L) a2 = L;  // 17b
      }
      else  // "Under unusual circumstances, eta will not be positive"
      {
        double f1 =
          y1 * (E1 + this.bias) - alph1 * k11 - s * alph2 * k12;  // 19a
        double f2 =
          y2 * (E2 + this.bias) - alph2 * k22 - s * alph1 * k12;  // 19b
        double L1 = alph1 + s * (alph2 - L);  // 19c
        double H1 = alph1 + s * (alph2 - H);  // 19d
        double Lobj = (L1 * f1) + (L * f2) + (0.5 * L1 * L1 * k11) +
          (0.5 * L * L * k22) + (s * L * L1 * k12);  // 19e
        double Hobj = (H1 * f1) + (H * f2) + (0.5 * H1 * H1 * k11) +
          (0.5 * H * H * k22) + (s * H * H1 * k12);  // 19f

        if (Lobj "lt" Hobj - eps) a2 = L;
        else if (Lobj "gt" Hobj + eps) a2 = H;
        else a2 = alph2;
      }

      if (Math.Abs(a2 - alph2) "lt" eps * (a2 + alph2 + eps))
        return false;

      a1 = alph1 + s * (alph2 - a2);  // 18

      // Update threshold (bias). See section 2.3
      double b1 = E1 + y1 * (a1 - alph1) * k11 +
        y2 * (a2 - alph2) * k12 + this.bias;
      double b2 = E2 + y1 * (a1 - alph1) * k12 +
        y2 * (a2 - alph2) * k22 + this.bias;
      double newb;
      if (0 "lt" a1 && C "gt" a1)
        newb = b1;
      else if (0 "lt" a2 && C "gt" a2)
        newb = b2;
      else
        newb = (b1 + b2) / 2;

      double deltab = newb - this.bias;
      this.bias = newb;

      // Update error cache using new Lagrange multipliers
      double delta1 = y1 * (a1 - alph1);
      double delta2 = y2 * (a2 - alph2);

      for (int i = 0; i "lt" X_matrix.Length; ++i)
      {
        if (0 "lt" this.alpha[i] && this.alpha[i] "lt" C)
        {
          if (this.kernelType == "poly")
            this.errors[i] += delta1 * this.PolyKernel(x1, X_matrix[i]) +
              delta2 * this.PolyKernel(x2, X_matrix[i]) - deltab;
          else if (this.kernelType == "rbf")
            this.errors[i] += delta1 * this.RbfKernel(x1, X_matrix[i]) +
              delta2 * this.RbfKernel(x2, X_matrix[i]) - deltab;
        }
      }

      this.errors[i1] = 0.0;
      this.errors[i2] = 0.0;
      this.alpha[i1] = a1;  // Store a1 in the alpha array
      this.alpha[i2] = a2;  // Store a2 in the alpha array

      return true;
    } // TakeStep

    // --------------------------------------------------------------------

    private int ExamineExample(int i2, double[][] X_matrix, int[] y_vector)
    {
      // "Sequential Minimal Optimization: A Fast Algorithm for
      // Training Support Vector Machines", Platt, 1998.
      double C = this.complexity;
      double tol = this.tolerance;

      double[] x2 = X_matrix[i2]; // "point" at i2
      double y2 = y_vector[i2];   // class label for p2
      double alph2 = this.alpha[i2];   // Lagrange multiplier for i2

      // SVM output on point[i2] - y2. (check in error cache)
      double E2;
      if (alph2 "gt" 0 && alph2 "lt" C)
        E2 = this.errors[i2];
      else
        E2 = ComputeAll(x2, X_matrix, y_vector) - y2;

      double r2 = y2 * E2;

      if ((r2 "lt" -tol && alph2 "lt" C) || (r2 "gt" tol && alph2 "gt" 0))
      {
        // See section 2.2
        int i1 = -1; double maxErr = 0;
        for (int i = 0; i "lt" X_matrix.Length; ++i)
        {
          if (this.alpha[i] "gt" 0 && this.alpha[i] "lt" C)
          {
            double E1 = this.errors[i];
            double delErr = System.Math.Abs(E2 - E1);

            if (delErr "gt" maxErr)
            {
              maxErr = delErr;
              i1 = i;
            }
          }
        }

        if (i1 "gte" 0 && TakeStep(i1, i2, X_matrix, y_vector)) return 1;

        int rndi = this.rnd.Next(X_matrix.Length);
        for (i1 = rndi; i1 "lt" X_matrix.Length; ++i1)
        {
          if (this.alpha[i1] "gt" 0 && this.alpha[i1] "lt" C)
            if (TakeStep(i1, i2, X_matrix, y_vector)) return 1;
        }
        for (i1 = 0; i1 "lt" rndi; ++i1)
        {
          if (this.alpha[i1] "gt" 0 && this.alpha[i1] "lt" C)
            if (TakeStep(i1, i2, X_matrix, y_vector)) return 1;
        }

        // "Both the iteration through the non-bound examples and the
        // iteration through the entire training set are started at
        // random locations"
        rndi = this.rnd.Next(X_matrix.Length);
        for (i1 = rndi; i1 "lt" X_matrix.Length; ++i1)
        {
          if (TakeStep(i1, i2, X_matrix, y_vector)) return 1;
        }
        for (i1 = 0; i1 "lt" rndi; ++i1)
        {
          if (TakeStep(i1, i2, X_matrix, y_vector)) return 1;
        }
      } // if ((r2 "lt" -tol && alph2 "lt" C) || (r2 "gt" tol && alph2 "gt" 0))

      // "In extremely degenerate circumstances, none of the examples
      // will make an adequate second example. When this happens, the
      // first example is skipped and SMO continues with another chosen
      // first example."
      return 0;
    } // ExamineExample

    // --------------------------------------------------------------------

    private double ComputeAll(double[] vector,
      double[][] X_matrix, int[] y_vector)
    {
      // output using all training data, even if alpha[] is zero
      double sum = -this.bias;  // quirk of SMO paper
      for (int i = 0; i "lt" X_matrix.Length; ++i)
      {
        if (this.alpha[i] "gt" 0)
        {
          if (this.kernelType == "poly")
            sum += this.alpha[i] * y_vector[i] * 
              this.PolyKernel(X_matrix[i], vector);
          else if (this.kernelType == "rbf")
            sum += this.alpha[i] * y_vector[i] * 
              this.RbfKernel(X_matrix[i], vector);
        }
      }
      return sum;
    }

  } // class SupportVectorMachine
} // ns
  // ========================================================================