The t-SNE Data Visualization Technique From Scratch Using C#

The t-SNE (“t-distributed Stochastic Neighbor Embedding”) technique is a method for visualizing high-dimensional data. The basic form of the technique is very specific: it converts data with three or more columns into data with just two columns, which can then be visualized as an XY graph. No more, no less.

For example, the UCI digits dataset contains 5,620 (3,823 train, 1,797 test) crude 8-by-8 images of handwritten ‘0’ through ‘9’ digits. Each digit is stored as a row of the 64 pixel values. You can’t graph data that has 64 dimensions but if you reduce the data to 2 dimensions, the reduced data can be visualized on an XY graph.

In most scenarios, the easiest way to apply t-SNE is to use an existing library such the Python scikit-learn sklearn.manifold.TSNE module. But to really understand how t-SNE works, and its strengths and weaknesses, you can implement t-SNE from scratch.

There are quite a few from-scratch Python versions of t-SNE floating around the Internet (including one I wrote in a blog post a couple of days ago), but I decided to implement t-SNE using C#. My C# implementation is a direct refactoring of a Python implementation by researcher L. van der Maaten, who co-authored the original t-SNE paper.



Example UCI digits. If you squint, you can see the digits better.


For my demo, I used a tiny 15-item subset of the UCI Digits dataset. For simplicity I extracted the first five ‘0’ items, the first five ‘1’ items, and the first five ‘2’ items. The source data looks like:

 
[ 0]  0,0,5,13,9,1,0,0,0,0,13 . . 0
[ 1]  0,1,9,15,11,0,0,0,0,11 . . 0
. . .
[14]  0,0,0,3,15,10,1,0,0,0,0 . . 2

Each line has 65 values. The first 64 values on each line are pixel values between 0 (white) and 16 (black). The last value on each line is the digit represented, 0, 1, or 2. The t-SNE technique does not need class labels but the demo uses labeled data so that the resulting condensed data can be evaluated. Unlike many machine learning techniques, t-SNE does not require data normalization. The leading index values, [0] through [14] are not part of the data, and are displayed for convenience.

The key C# statements in the demo are:

string fn =
  "..\\..\\..\\Data\\optdigits_test_012_15.txt";
int[] cols = new int[64];
for (int i = 0; i "lt" 64; ++i) cols[i] = i;
double[][] X = TSNE.MatLoad(fn, cols, ',', "#");

double[][] reduced = TSNE.Reduce(X, maxIter: 600,
  perplexity: 5);

Console.WriteLine("Result: ");
TSNE.MatShow(reduced, 4, 12, true);

The file name indicates that the file contains a 15-item subset of the UCI Digits test dataset, containing only ‘0’, ‘1’, ‘2’ items. The program-defined MatLoad() function loads comma-delimited columns 0 to 63 inclusive (the pixel values), where lines beginning with “#” are interpreted as comments.



The reduced data visualized as an XY graph


The Reduce() function accepts a maxIter argument of 600 iterations, and a perplexity argument of 5. The maxIter and perplexity values must be determined by trial and error. The prerplexity value specifies how many data item neighbors to use. Because the dataset only has 15 items, a small perplexity value is used.

The resulting reduced 15-item data is:

[ 0]   -337.1624    -72.7990
[ 1]   -270.6862    -41.9135
[ 2]   -220.7945    -87.5525
[ 3]   -349.0787   -139.6775
[ 4]   -277.9941   -120.1643
[ 5]    237.8607     21.8573
[ 6]    194.9783   -112.4145
[ 7]    264.3068    -66.9215
[ 8]    253.1594   -134.5962
[ 9]    202.8873    -40.9125
[10]    -31.8724    114.8531
[11]    176.6870    225.8076
[12]    125.7436    223.0445
[13]     16.3423    151.2865
[14]     15.6229     80.1027

When the reduced data is graphed using the first column as x and the second column as y, the reduced data groups into three distinct clusters, as expected. If you didn’t know beforehand how the data was ordered, you’d learn that the first three items are similar, the second three are similar, and the last three or similar.


The Rocky and Bullwinkle TV Show ran from 1959 to 1964. Even though the animation detail was deliberately reduced for easy visualization, the humor was quite sophisticated and often hilarious. The simple art enhanced, rather than detracted from, the show’s quality. Rocky was a good-natured flying squirrel, Bullwinkle was a dim-witted moose, Boris Badenov was a height-challenged villain whose plans always failed (much to the annoyance of his boss, Fearless Leader), and Natasha Fatale was Boris’ assistant. The R and B show featured other sub-shows including Peabody’s Improbable History where dog scientist Mr. Peabody and his protege Sherman, would travel back in time via the Wayback machine.


I have posted the C# source code on GitHub at https://github.com/jdmccaffrey/tsne-csharp. I’ve also pasted the code below.

Demo code. Long. Data below. Replace “lt” (less-than), “gt”, “lte”, “gte”, “and” with Boolean operator symbols (my blog editor consistently chokes on symbols).

using System;
using System.IO;

// Paper: "Visualizing Data using t-SNE" (2008), 
//   L. van der Maaten and G. Hinton.
// Python code: https://lvdmaaten.github.io/tsne/

namespace TSNE
{
  internal class TSNEProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin t-SNE demo ");
      Console.WriteLine("\nLoading 15-item subset" +
        " of UCI digits 64-dim data ");

      string fn =
      "..\\..\\..\\Data\\optdigits_test_012_15.txt";
      int[] cols = new int[64];
      for (int i = 0; i "lt" 64; ++i) cols[i] = i;
      double[][] X = TSNE.MatLoad(fn, cols, ',', "#");
      Console.WriteLine("\nX (first 12 columns): ");
      TSNE.MatShow(X, 12, 1, 6, true);

      // labels not used by t-SNE but used for graph
      Console.WriteLine("\ny labels: ");
      int[] y = TSNE.VecLoad(fn, 64, "#");  // col [64]
      TSNE.VecShow(y, 3);

      Console.WriteLine("\nApplying TSNE reduction ");
      Console.WriteLine("maxIter = 600, perplexity = 5 ");
      double[][] reduced = TSNE.Reduce(X, maxIter: 600,
        perplexity: 5);
      Console.WriteLine("Done ");

      Console.WriteLine("\nResult: ");
      TSNE.MatShow(reduced, 4, 12, true);

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    }
  } // Program

  // ========================================================

  public class TSNE
  {
    // wrapper class for static Reduce() and its helpers

    public static double[][] Reduce(double[][] X,
      int maxIter, int perplexity)
    {
      // perplexity is number of nearest neighbors
      // calls ComputeP(), which calls ComputePH()
      // number of result dimension fixed to 2   
      int n = X.Length;
      double initialMomentum = 0.5;
      double finalMomentum = 0.8;
      double eta = 500.0;
      double minGain = 0.01;

      // initialize result Y
      Gaussian g = new Gaussian(mean: 0.0, sd: 1.0,
        seed: 1);
      double[][] Y = MatCreate(n, 2);
      for (int i = 0; i "lt" n; ++i)
        for (int j = 0; j "lt" 2; ++j)
          Y[i][j] = g.NextGaussian();

      double[][] dY = MatCreate(n, 2);
      double[][] iY = MatCreate(n, 2);
      double[][] Gains = MatOnes(n, 2);

      double[][] P = ComputeP(X, perplexity);
      P = MatAdd(P, MatTranspose(P));
      double sumP = MatSum(P);
      for (int i = 0; i "lt" n; ++i)
      {
        for (int j = 0; j "lt" n; ++j)
        {
          P[i][j] = (P[i][j] * 4.0) / sumP;
          if (P[i][j] "lt" 1.0e-12)
            P[i][j] = 1.0e-12;
        }
      }

      for (int iter = 0; iter "lt" maxIter; ++iter)
      {
        double[] rowSums = new double[n]; // sum squared
        for (int i = 0; i "lt" n; ++i)
        {
          double rowSum = 0.0;
          for (int j = 0; j "lt" 2; ++j)
          {
            rowSum += Y[i][j] * Y[i][j];
          }
          rowSums[i] = rowSum;
        }

        double[][] Num = MatProduct(Y, MatTranspose(Y));
        for (int i = 0; i "lt" Num.Length; ++i)
          for (int j = 0; j "lt" Num[0].Length; ++j)
            Num[i][j] *= -2.0;

        // num = 1 / (1 + np.add(np.add(num,sum_Y).T,sum_Y))
        // num[range(n), range(n)] = 0.

        double[][] tmp0 = MatVecAdd(Num, rowSums);
        double[][] tmp1 = MatTranspose(tmp0);
        double[][] tmp2 = MatVecAdd(tmp1, rowSums);
        double[][] tmp3 = MatAddScalar(tmp2, 1.0);
        double[][] tmp4 = MatInvertElements(tmp3);
        Num = MatCopy(tmp4);
        MatZeroOutDiag(Num);

        double sumNum = MatSum(Num);
        double[][] Q = MatCopy(Num);
        for (int i = 0; i "lt" Num.Length; ++i)
          for (int j = 0; j "lt" Num[0].Length; ++j)
          {
            Q[i][j] = Q[i][j] / sumNum;
            if (Q[i][j] "lt" 1.0e-12)
              Q[i][j] = 1.0e-12;
          }

        double[][] PminusQ = MatDiff(P, Q);

        for (int i = 0; i "lt" n; ++i)  // compute dY
        {
          double[] tmpA = MatExtractRow(Y, i);
          double[][] tmpB = VecMinusMat(tmpA, Y); // tricky!
          double[] tmpC = MatExtractColumn(PminusQ, i);
          double[] tmpD = MatExtractColumn(Num, i);
          double[] tmpK = VecMult(tmpC, tmpD);
          double[][] tmpE = VecTile(tmpK, 2);
          double[][] tmpF = MatTranspose(tmpE);
          double[][] tmpG = MatMultiply(tmpF, tmpB);
          double[] tmpZ = MatColumnSums(tmpG);
          MatReplaceRow(dY, i, tmpZ);
        } // i

        double momentum;
        if (iter "lt" 20)
          momentum = initialMomentum;
        else
          momentum = finalMomentum;

        // compute Gains
        for (int i = 0; i "lt" Gains.Length; ++i)
        {
          for (int j = 0; j "lt" Gains[0].Length; ++j)
          {
            if ((dY[i][j] "gt" 0.0 "and" 
              iY[i][j] "lte" 0.0) ||
              (dY[i][j] "lte" 0.0 "and"
               iY[i][j] "gt" 0.0))
              Gains[i][j] = Gains[i][j] + 0.2;
            else if ((dY[i][j] "gt" 0.0 "and"
              iY[i][j] "gt" 0.0) ||
              (dY[i][j] "lte" 0.0 "and" 
              iY[i][j] "lte" 0.0))
              Gains[i][j] = Gains[i][j] * 0.8;
          }
        }

        for (int i = 0; i "lt" Gains.Length; ++i)
          for (int j = 0; j "lt" Gains[0].Length; ++j)
            if (Gains[i][j] "lt" minGain)
              Gains[i][j] = minGain;

        // use dY to compute iY to update result Y
        for (int i = 0; i "lt" iY.Length; ++i)
        {
          for (int j = 0; j "lt" iY[0].Length; ++j)
          {
            iY[i][j] = (momentum * iY[i][j]) -
              (eta * Gains[i][j] * dY[i][j]);
            Y[i][j] += iY[i][j];
          }
        }

        double[] meansY = MatColumnMeans(Y);
        double[][] meansTile = VecTile(meansY, n);
        Y = MatDiff(Y, meansTile);  // updated result

        if ((iter + 1) % 100 == 0)
        {
          // cost aka error
          double C = MatSum(MatMultLogDivide(P, P, Q));
          Console.WriteLine("iter = " + (iter + 1).
            ToString().PadLeft(6) +
            "  |  error = " + C.ToString("F4"));
        }

        if (iter == 100) // undo the initial expansion of P
        {
          for (int i = 0; i "lt" P.Length; ++i)
            for (int j = 0; j "lt" P[0].Length; ++j)
              P[i][j] /= 4.0;
        }
      } // iter

      return Y;
    } // Reduce()

    // ------------------------------------------------------

    private static double[][] ComputeP(double[][] X,
      int perplexity)
    {
      // helper for Reduce()
      double tol = 1.0e-5;
      int n = X.Length; int d = X[0].Length;
      double[][] D = MatSquaredDistances(X);
      double[][] P = MatCreate(n, n);
      double[] beta = new double[n];
      for (int i = 0; i "lt" n; ++i)
        beta[i] = 1.0;
      double logU = Math.Log(perplexity);
      for (int i = 0; i "lt" n; ++i)
      {
        double betaMin = -1.0e15;
        double betaMax = 1.0e15;
        // ith row of D[][], without 0.0 at [i][i]
        double[] Di = new double[n - 1];
        int k = 0; // points into Di
        for (int j = 0; j "lt" n; ++j) // j points D[][]
        {
          if (j == i) continue;
          Di[k] = D[i][j];
          ++k;
        }

        double h;
        double[] currP = ComputePH(Di, beta[i], out h);
        double hDiff = h - logU;
        int ct = 0;
        while (Math.Abs(hDiff) "gt" tol "and" ct "lt" 50)
        {
          if (hDiff "gt" 0.0)
          {
            betaMin = beta[i];
            if (betaMax == 1.0e15 || betaMax == -1.0e15)
              beta[i] *= 2.0;
            else
              beta[i] = (beta[i] + betaMax) / 2.0;
          }
          else
          {
            betaMax = beta[i];
            if (betaMin == 1.0e15 || betaMin == -1.0e15)
              beta[i] /= 2.0;
            else
              beta[i] = (beta[i] + betaMin) / 2.0;
          }
          // recompute
          currP = ComputePH(Di, beta[i], out h);
          hDiff = h - logU;
          ++ct;
        } // while

        k = 0; // points into currP vector
        for (int j = 0; j "lt" n; ++j)
        {
          if (i == j) P[i][j] = 0.0;
          else P[i][j] = currP[k++];
        }
      } // for i

      return P;
    } // ComputeP

    // ------------------------------------------------------

    private static double[] ComputePH(double[] di,
      double beta, out double h)
    {
      // helper for ComputeP()
      // return p vector explicitly, h val as out param
      int n = di.Length; // (n-1) relative to D
      double[] p = new double[n];
      double sumP = 0.0;
      for (int j = 0; j "lt" n; ++j)
      {
        p[j] = Math.Exp(-1 * di[j] * beta);
        sumP += p[j];
      }
      if (sumP == 0.0) sumP = 1.0e-12; // avoid div by 0

      double sumDP = 0.0;
      for (int j = 0; j "lt" n; ++j)
        sumDP += di[j] * p[j];

      double hh = Math.Log(sumP) + (beta * sumDP / sumP);
      for (int j = 0; j "lt" n; ++j)
        p[j] /= sumP;

      h = hh;
      return p; // a new row for P[][]
    }

    // ------------------------------------------------------
    //
    // 32 secondary helper functions, one helper class
    //
    // MatSquaredDistances, MatCreate, MatOnes,
    // MatLoad, MatSave, VecLoad,
    // MatShow, MatShow, MatShow, (3 overloads)
    // VecShow, VecShow, MatCopy, MatAdd, MatTranspose,
    // MatProduct, MatDiff, MatExtractRow,
    // MatExtractColumn, VecMinusMat, VecMult, VecTile,
    // MatMultiply, MatMultLogDivide, MatRowSums,
    // MatColumnSums, MatColumnMeans, MatReplaceRow,
    // MatVecAdd, MatAddScalar, MatInvertElements,
    // MatSum, MatZeroOutDiag
    //
    // Gaussian class: NextGaussian()
    //
    // ------------------------------------------------------

    private static double[][] 
      MatSquaredDistances(double[][] X)
    {
      // squared Euclidean distances, all rows in X
      int n = X.Length;
      int dim = X[0].Length;
      double[][] result = MatCreate(n, n);
      for (int i = 0; i "lt" n; ++i)
      {
        for (int j = i; j "lt" n; ++j)
        {
          double dist = 0.0;
          for (int k = 0; k "lt" dim; ++k)
          {
            dist += (X[i][k] - X[j][k]) *
              (X[i][k] - X[j][k]);
          }
          result[i][j] = dist;
          result[j][i] = dist;
        }
      }
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatCreate(int rows, int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i "lt" rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatOnes(int rows, int cols)
    {
      double[][] result = MatCreate(rows, cols);
      for (int i = 0; i "lt" rows; ++i)
        for (int j = 0; j "lt" cols; ++j)
          result[i][j] = 1.0;
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      // count number of non-comment lines
      int nRows = 0;
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++nRows;
      sr.Close(); ifs.Close();

      // make result matrix
      int nCols = usecols.Length;
      double[][] result = new double[nRows][];
      for (int r = 0; r "lt" nRows; ++r)
        result[r] = new double[nCols];

      line = "";
      string[] tokens = null;
      ifs = new FileStream(fn, FileMode.Open);
      sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j "lt" nCols; ++j)
        {
          int k = usecols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    // ------------------------------------------------------

    public static void MatSave(double[][] data, 
      string fn, char sep, int dec)
    {
      int nRows = data.Length;
      int nCols = data[0].Length;
      FileStream ofs = new FileStream(fn,
        FileMode.Create);
      StreamWriter sw = new StreamWriter(ofs);
      for (int i = 0; i "lt" nRows; ++i)
      {
        string line = "";
        for (int j = 0; j "lt" nCols; ++j)
        {
          line += data[i][j].ToString("F" + dec);
          if (j "lt" nCols - 1)
            line += sep;
        }
        sw.WriteLine(line); // includes NL
      }
      sw.Close();
      ofs.Close();
    }

    // ------------------------------------------------------

    public static int[] VecLoad(string fn, int usecol,
      string comment)
    {
      char dummySep = ',';
      double[][] tmp = MatLoad(fn, new int[] { usecol },
        dummySep, comment);
      int n = tmp.Length;
      int[] result = new int[n];
      for (int i = 0; i "lt" n; ++i)
        result[i] = (int)tmp[i][0];
      return result;
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] M, int dec,
      int wid, bool showIndices)
    {
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i "lt" M.Length; ++i)
      {
        if (showIndices == true)
        {
          int pad = M.Length.ToString().Length;
          Console.Write("[" + i.ToString().
            PadLeft(pad) + "]");
        }
        for (int j = 0; j "lt" M[0].Length; ++j)
        {
          double v = M[i][j];
          if (Math.Abs(v) "lt" small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] M, int nCols,
      int dec, int wid, bool showIndices)
    {
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i "lt" M.Length; ++i)
      {
        if (showIndices == true)
        {
          int pad = M.Length.ToString().Length;
          Console.Write("[" + i.ToString().
            PadLeft(pad) + "]");
        }
        for (int j = 0; j "lt" nCols; ++j)
        {
          double v = M[i][j];
          if (Math.Abs(v) "lt" small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine(" . . . ");
      }
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] M, int dec,
      int wid, int nRows)
    {
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i "lt" nRows; ++i)
      {
        for (int j = 0; j "lt" M[0].Length; ++j)
        {
          double v = M[i][j];
          if (Math.Abs(v) "lt" small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
      if (nRows "lt" M.Length)
        Console.WriteLine(". . . ");
    }

    // ------------------------------------------------------

    public static void VecShow(int[] vec, int wid)
    {
      int n = vec.Length;
      for (int i = 0; i "lt" n; ++i)
        Console.Write(vec[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void VecShow(double[] vec, int decimals,
      int wid)
    {
      int n = vec.Length;
      for (int i = 0; i "lt" n; ++i)
        Console.Write(vec[i].ToString("F" + decimals).
          PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    private static double[][] MatCopy(double[][] m)
    {
      int nRows = m.Length; int nCols = m[0].Length;
      double[][] result = MatCreate(nRows, nCols);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[i][j] = m[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatAdd(double[][] mA,
      double[][] mB)
    {
      int rows = mA.Length;
      int cols = mA[0].Length;
      double[][] result = MatCreate(rows, cols);
      for (int i = 0; i "lt" rows; ++i)
        for (int j = 0; j "lt" cols; ++j)
          result[i][j] = mA[i][j] + mB[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatTranspose(double[][] m)
    {
      int nr = m.Length;
      int nc = m[0].Length;
      double[][] result = MatCreate(nc, nr);  // note
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[j][i] = m[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatProduct(double[][] matA,
      double[][] matB)
    {
      int aRows = matA.Length;
      int aCols = matA[0].Length;
      int bRows = matB.Length;
      int bCols = matB[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = MatCreate(aRows, bCols);

      for (int i = 0; i "lt" aRows; ++i) // each row of A
        for (int j = 0; j "lt" bCols; ++j) // each col of B
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += matA[i][k] * matB[k][j];

      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatDiff(double[][] A,
      double[][] B)
    {
      int nRows = A.Length; int nCols = A[0].Length;
      double[][] result = MatCreate(nRows, nCols);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[i][j] = A[i][j] - B[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[] MatExtractRow(double[][] M, 
      int row)
    {
      int cols = M[0].Length;
      double[] result = new double[cols];
      for (int j = 0; j "lt" cols; ++j)
        result[j] = M[row][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[] MatExtractColumn(double[][] M,
      int col)
    {
      int rows = M.Length;
      double[] result = new double[rows];
      for (int i = 0; i "lt" rows; ++i)
        result[i] = M[i][col];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] VecMinusMat(double[] v,
      double[][] M)
    {
      // add v row vector to each row of -M
      // B = A - Y  # 10x2; -Y + A vector to each row
      double[][] result = MatCopy(M);
      for (int i = 0; i "lt" M.Length; ++i)
        for (int j = 0; j "lt" M[0].Length; ++j)
          result[i][j] = -M[i][j] + v[j];
      return result;
    }

    // ------------------------------------------------------

    private static double[] VecMult(double[] v1,
      double[] v2)
    {
      // element-wise multiplication
      int n = v1.Length;
      double[] result = new double[n];
      for (int i = 0; i "lt" n; ++i)
        result[i] = v1[i] * v2[i];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] VecTile(double[] vec, int n)
    {
      // n row-copies of vec
      int nCols = vec.Length;
      double[][] result = MatCreate(n, nCols);
      for (int i = 0; i "lt" n; ++i)
      {
        for (int j = 0; j "lt" nCols; ++j)
        {
          result[i][j] = vec[j];
        }
      }
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatMultiply(double[][] A,
      double[][] B)
    {
      // element-wise multiplication
      int nRows = A.Length; int nCols = A[0].Length;
      double[][] result = MatCreate(nRows, nCols);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[i][j] = A[i][j] * B[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatMultLogDivide(double[][] P,
      double[][] A, double[][] B)
    {
      // element-wise P * log(A / B)
      int nRows = A.Length; int nCols = A[0].Length;
      double[][] result = MatCreate(nRows, nCols);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
        {
          if (B[i][j] == 0.0)
            result[i][j] = P[i][j] * Math.Log(1.0e10);
          else
            result[i][j] = P[i][j] *
              Math.Log(A[i][j] / B[i][j]);
        }
      return result;
    }

    // ------------------------------------------------------

    private static double[] MatRowSums(double[][] M)
    {
      int nRows = M.Length; int nCols = M[0].Length;
      double[] result = new double[nRows];
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; i "lt" nCols; ++j)
          result[i] += M[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[] MatColumnSums(double[][] M)
    {
      int nRows = M.Length; int nCols = M[0].Length;
      double[] result = new double[nCols];
      for (int j = 0; j "lt" nCols; ++j)
        for (int i = 0; i "lt" nRows; ++i)
          result[j] += M[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[] MatColumnMeans(double[][] M)
    {
      int nRows = M.Length; int nCols = M[0].Length;
      double[] result = new double[nCols];
      for (int j = 0; j "lt" nCols; ++j)
        for (int i = 0; i "lt" nRows; ++i)
          result[j] += M[i][j];
      for (int j = 0; j "lt" nCols; ++j)
        result[j] /= nRows;
      return result;
    }

    // ------------------------------------------------------

    private static void MatReplaceRow(double[][] M, int i,
      double[] vec)
    {
      int nCols = M[0].Length;
      for (int j = 0; j "lt" nCols; ++j)
        M[i][j] = vec[j];
      return;
    }

    // ------------------------------------------------------

    private static double[][] MatVecAdd(double[][] M,
      double[] vec)
    {
      // add row vector vec to each row of M
      int nr = M.Length; int nc = M[0].Length;
      double[][] result = MatCreate(nr, nc);
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[i][j] = M[i][j] + vec[j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatAddScalar(double[][] M,
      double val)
    {
      int nr = M.Length; int nc = M[0].Length;
      double[][] result = MatCreate(nr, nc);
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[i][j] = M[i][j] + val;
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatInvertElements(double[][] M)
    {
      int nr = M.Length; int nc = M[0].Length;
      double[][] result = MatCreate(nr, nc);
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[i][j] = 1.0 / M[i][j]; // M[i][j] not 0
      return result;
    }

    // ------------------------------------------------------

    private static double MatSum(double[][] M)
    {
      int nr = M.Length; int nc = M[0].Length;
      double result = 0.0;
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result += M[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static void MatZeroOutDiag(double[][] M)
    {
      int nr = M.Length; //int nc = M[0].Length;
      double result = 0.0;
      for (int i = 0; i "lt" nr; ++i)
        M[i][i] = 0.0;
      return;
    }

    // ------------------------------------------------------

    // nested Gaussian to init TSNE.Reduce() result
    class Gaussian
    {
      private Random rnd;
      private double mean;
      private double sd;

      public Gaussian(double mean, double sd, int seed)
      {
        this.rnd = new Random(seed);
        this.mean = mean;
        this.sd = sd;
      }

      public double NextGaussian()
      {
        double u1 = this.rnd.NextDouble();
        double u2 = this.rnd.NextDouble();
        double left = Math.Cos(2.0 * Math.PI * u1);
        double right = Math.Sqrt(-2.0 * Math.Log(u2));
        double z = left * right;
        return this.mean + (z * this.sd);
      }
    } // Gaussian

    // ------------------------------------------------------

  } // class TSNE

  // ========================================================

} // ns

Source data with backslash-newline inserted every 8 values.

# optdigits_test_012_15.txt
# UCI digits, just 5 each of '0', '1', '2'
# 64 pixel values in [0,16] then digit [0,1,2]
#
# remove the \-newline characters so that
# each line has 64 pixel values followed by
# the class label (0, 1, 2)
#
0,0,5,13,9,1,0,0,\
0,0,13,15,10,15,5,0,\
0,3,15,2,0,11,8,0,\
0,4,12,0,0,8,8,0,\
0,5,8,0,0,9,8,0,\
0,4,11,0,1,12,7,0,\
0,2,14,5,10,12,0,0,\
0,0,6,13,10,0,0,0, 0
#
0,0,1,9,15,11,0,0,\
0,0,11,16,8,14,6,0,\
0,2,16,10,0,9,9,0,\
0,1,16,4,0,8,8,0,\
0,4,16,4,0,8,8,0,\
0,1,16,5,1,11,3,0,\
0,0,12,12,10,10,0,0,\
0,0,1,10,13,3,0,0, 0
#
0,0,3,13,11,7,0,0,\
0,0,11,16,16,16,2,0,\
0,4,16,9,1,14,2,0,\
0,4,16,0,0,16,2,0,\
0,0,16,1,0,12,8,0,\
0,0,15,9,0,13,6,0,\
0,0,9,14,9,14,1,0,\
0,0,2,12,13,4,0,0, 0
#
0,0,10,14,11,3,0,0,\
0,4,16,13,6,14,1,0,\
0,4,16,2,0,11,7,0,\
0,8,16,0,0,10,5,0,\
0,8,16,0,0,14,4,0,\
0,8,16,0,1,16,1,0,\
0,4,16,1,11,15,0,0,\
0,0,11,16,12,3,0,0, 0
#
0,0,6,14,10,2,0,0,\
0,0,15,15,13,15,3,0,\
0,2,16,10,0,13,9,0,\
0,1,16,5,0,12,5,0,\
0,0,16,3,0,13,6,0,\
0,1,15,5,6,13,1,0,\
0,0,16,11,14,10,0,0,\
0,0,7,16,11,1,0,0, 0
#
#
0,0,0,12,13,5,0,0,\
0,0,0,11,16,9,0,0,\
0,0,3,15,16,6,0,0,\
0,7,15,16,16,2,0,0,\
0,0,1,16,16,3,0,0,\
0,0,1,16,16,6,0,0,\
0,0,1,16,16,6,0,0,\
0,0,0,11,16,10,0,0, 1
#
0,0,0,0,14,13,1,0,\
0,0,0,5,16,16,2,0,\
0,0,0,14,16,12,0,0,\
0,1,10,16,16,12,0,0,\
0,3,12,14,16,9,0,0,\
0,0,0,5,16,15,0,0,\
0,0,0,4,16,14,0,0,\
0,0,0,1,13,16,1,0, 1
#
0,0,0,2,16,16,2,0,\
0,0,0,4,16,16,2,0,\
0,1,4,12,16,12,0,0,\
0,7,16,16,16,12,0,0,\
0,0,3,10,16,14,0,0,\
0,0,0,8,16,12,0,0,\
0,0,0,6,16,16,2,0,\
0,0,0,2,12,15,4,0, 1
#
0,0,0,0,12,5,0,0,\
0,0,0,2,16,12,0,0,\
0,0,1,12,16,11,0,0,\
0,2,12,16,16,10,0,0,\
0,6,11,5,15,6,0,0,\
0,0,0,1,16,9,0,0,\
0,0,0,2,16,11,0,0,\
0,0,0,3,16,8,0,0, 1
#
0,0,0,1,11,9,0,0,\
0,0,0,7,16,13,0,0,\
0,0,4,14,16,9,0,0,\
0,10,16,11,16,8,0,0,\
0,0,0,3,16,6,0,0,\
0,0,0,3,16,8,0,0,\
0,0,0,5,16,10,0,0,\
0,0,0,2,14,6,0,0, 1
#
#
0,0,0,4,15,12,0,0,\
0,0,3,16,15,14,0,0,\
0,0,8,13,8,16,0,0,\
0,0,1,6,15,11,0,0,\
0,1,8,13,15,1,0,0,\
0,9,16,16,5,0,0,0,\
0,3,13,16,16,11,5,0,\
0,0,0,3,11,16,9,0, 2
#
0,0,5,12,1,0,0,0,\
0,0,15,14,7,0,0,0,\
0,0,13,1,12,0,0,0,\
0,2,10,0,14,0,0,0,\
0,0,2,0,16,1,0,0,\
0,0,0,6,15,0,0,0,\
0,0,9,16,15,9,8,2,\
0,0,3,11,8,13,12,4, 2
#
0,0,8,16,5,0,0,0,\
0,1,13,11,16,0,0,0,\
0,0,10,0,13,3,0,0,\
0,0,3,1,16,1,0,0,\
0,0,0,9,12,0,0,0,\
0,0,3,15,5,0,0,0,\
0,0,14,15,8,8,3,0,\
0,0,7,12,12,12,13,1, 2
#
0,0,0,5,14,12,2,0,\
0,0,7,15,8,14,4,0,\
0,0,6,2,3,13,1,0,\
0,0,0,1,13,4,0,0,\
0,0,1,11,9,0,0,0,\
0,8,16,13,0,0,0,0,\
0,5,14,16,11,2,0,0,\
0,0,0,6,12,13,3,0, 2
#
0,0,0,3,15,10,1,0,\
0,0,0,11,10,16,4,0,\
0,0,0,12,1,15,6,0,\
0,0,0,3,4,15,4,0,\
0,0,0,6,15,6,0,0,\
0,4,15,16,9,0,0,0,\
0,0,13,16,15,9,3,0,\
0,0,0,4,9,14,7,0, 2
This entry was posted in Machine Learning. Bookmark the permalink.

Leave a comment