The t-SNE (“t-distributed Stochastic Neighbor Embedding”) technique is a method for visualizing high-dimensional data. The basic form of the technique is very specific: it converts data with three or more columns into data with just two columns, which can then be visualized as an XY graph. No more, no less.
For example, the UCI digits dataset contains 5,620 (3,823 train, 1,797 test) crude 8-by-8 images of handwritten ‘0’ through ‘9’ digits. Each digit is stored as a row of the 64 pixel values. You can’t graph data that has 64 dimensions but if you reduce the data to 2 dimensions, the reduced data can be visualized on an XY graph.
In most scenarios, the easiest way to apply t-SNE is to use an existing library such the Python scikit-learn sklearn.manifold.TSNE module. But to really understand how t-SNE works, and its strengths and weaknesses, you can implement t-SNE from scratch.
There are quite a few from-scratch Python versions of t-SNE floating around the Internet (including one I wrote in a blog post a couple of days ago), but I decided to implement t-SNE using C#. My C# implementation is a direct refactoring of a Python implementation by researcher L. van der Maaten, who co-authored the original t-SNE paper.
Example UCI digits. If you squint, you can see the digits better.
For my demo, I used a tiny 15-item subset of the UCI Digits dataset. For simplicity I extracted the first five ‘0’ items, the first five ‘1’ items, and the first five ‘2’ items. The source data looks like:
[ 0] 0,0,5,13,9,1,0,0,0,0,13 . . 0 [ 1] 0,1,9,15,11,0,0,0,0,11 . . 0 . . . [14] 0,0,0,3,15,10,1,0,0,0,0 . . 2
Each line has 65 values. The first 64 values on each line are pixel values between 0 (white) and 16 (black). The last value on each line is the digit represented, 0, 1, or 2. The t-SNE technique does not need class labels but the demo uses labeled data so that the resulting condensed data can be evaluated. Unlike many machine learning techniques, t-SNE does not require data normalization. The leading index values, [0] through [14] are not part of the data, and are displayed for convenience.
The key C# statements in the demo are:
string fn = "..\\..\\..\\Data\\optdigits_test_012_15.txt"; int[] cols = new int[64]; for (int i = 0; i "lt" 64; ++i) cols[i] = i; double[][] X = TSNE.MatLoad(fn, cols, ',', "#"); double[][] reduced = TSNE.Reduce(X, maxIter: 600, perplexity: 5); Console.WriteLine("Result: "); TSNE.MatShow(reduced, 4, 12, true);
The file name indicates that the file contains a 15-item subset of the UCI Digits test dataset, containing only ‘0’, ‘1’, ‘2’ items. The program-defined MatLoad() function loads comma-delimited columns 0 to 63 inclusive (the pixel values), where lines beginning with “#” are interpreted as comments.
The reduced data visualized as an XY graph
The Reduce() function accepts a maxIter argument of 600 iterations, and a perplexity argument of 5. The maxIter and perplexity values must be determined by trial and error. The prerplexity value specifies how many data item neighbors to use. Because the dataset only has 15 items, a small perplexity value is used.
The resulting reduced 15-item data is:
[ 0] -337.1624 -72.7990 [ 1] -270.6862 -41.9135 [ 2] -220.7945 -87.5525 [ 3] -349.0787 -139.6775 [ 4] -277.9941 -120.1643 [ 5] 237.8607 21.8573 [ 6] 194.9783 -112.4145 [ 7] 264.3068 -66.9215 [ 8] 253.1594 -134.5962 [ 9] 202.8873 -40.9125 [10] -31.8724 114.8531 [11] 176.6870 225.8076 [12] 125.7436 223.0445 [13] 16.3423 151.2865 [14] 15.6229 80.1027
When the reduced data is graphed using the first column as x and the second column as y, the reduced data groups into three distinct clusters, as expected. If you didn’t know beforehand how the data was ordered, you’d learn that the first three items are similar, the second three are similar, and the last three or similar.
The Rocky and Bullwinkle TV Show ran from 1959 to 1964. Even though the animation detail was deliberately reduced for easy visualization, the humor was quite sophisticated and often hilarious. The simple art enhanced, rather than detracted from, the show’s quality. Rocky was a good-natured flying squirrel, Bullwinkle was a dim-witted moose, Boris Badenov was a height-challenged villain whose plans always failed (much to the annoyance of his boss, Fearless Leader), and Natasha Fatale was Boris’ assistant. The R and B show featured other sub-shows including Peabody’s Improbable History where dog scientist Mr. Peabody and his protege Sherman, would travel back in time via the Wayback machine.
I have posted the C# source code on GitHub at https://github.com/jdmccaffrey/tsne-csharp. I’ve also pasted the code below.
Demo code. Long. Data below. Replace “lt” (less-than), “gt”, “lte”, “gte”, “and” with Boolean operator symbols (my blog editor consistently chokes on symbols).
using System; using System.IO; // Paper: "Visualizing Data using t-SNE" (2008), // L. van der Maaten and G. Hinton. // Python code: https://lvdmaaten.github.io/tsne/ namespace TSNE { internal class TSNEProgram { static void Main(string[] args) { Console.WriteLine("\nBegin t-SNE demo "); Console.WriteLine("\nLoading 15-item subset" + " of UCI digits 64-dim data "); string fn = "..\\..\\..\\Data\\optdigits_test_012_15.txt"; int[] cols = new int[64]; for (int i = 0; i "lt" 64; ++i) cols[i] = i; double[][] X = TSNE.MatLoad(fn, cols, ',', "#"); Console.WriteLine("\nX (first 12 columns): "); TSNE.MatShow(X, 12, 1, 6, true); // labels not used by t-SNE but used for graph Console.WriteLine("\ny labels: "); int[] y = TSNE.VecLoad(fn, 64, "#"); // col [64] TSNE.VecShow(y, 3); Console.WriteLine("\nApplying TSNE reduction "); Console.WriteLine("maxIter = 600, perplexity = 5 "); double[][] reduced = TSNE.Reduce(X, maxIter: 600, perplexity: 5); Console.WriteLine("Done "); Console.WriteLine("\nResult: "); TSNE.MatShow(reduced, 4, 12, true); Console.WriteLine("\nEnd demo "); Console.ReadLine(); } } // Program // ======================================================== public class TSNE { // wrapper class for static Reduce() and its helpers public static double[][] Reduce(double[][] X, int maxIter, int perplexity) { // perplexity is number of nearest neighbors // calls ComputeP(), which calls ComputePH() // number of result dimension fixed to 2 int n = X.Length; double initialMomentum = 0.5; double finalMomentum = 0.8; double eta = 500.0; double minGain = 0.01; // initialize result Y Gaussian g = new Gaussian(mean: 0.0, sd: 1.0, seed: 1); double[][] Y = MatCreate(n, 2); for (int i = 0; i "lt" n; ++i) for (int j = 0; j "lt" 2; ++j) Y[i][j] = g.NextGaussian(); double[][] dY = MatCreate(n, 2); double[][] iY = MatCreate(n, 2); double[][] Gains = MatOnes(n, 2); double[][] P = ComputeP(X, perplexity); P = MatAdd(P, MatTranspose(P)); double sumP = MatSum(P); for (int i = 0; i "lt" n; ++i) { for (int j = 0; j "lt" n; ++j) { P[i][j] = (P[i][j] * 4.0) / sumP; if (P[i][j] "lt" 1.0e-12) P[i][j] = 1.0e-12; } } for (int iter = 0; iter "lt" maxIter; ++iter) { double[] rowSums = new double[n]; // sum squared for (int i = 0; i "lt" n; ++i) { double rowSum = 0.0; for (int j = 0; j "lt" 2; ++j) { rowSum += Y[i][j] * Y[i][j]; } rowSums[i] = rowSum; } double[][] Num = MatProduct(Y, MatTranspose(Y)); for (int i = 0; i "lt" Num.Length; ++i) for (int j = 0; j "lt" Num[0].Length; ++j) Num[i][j] *= -2.0; // num = 1 / (1 + np.add(np.add(num,sum_Y).T,sum_Y)) // num[range(n), range(n)] = 0. double[][] tmp0 = MatVecAdd(Num, rowSums); double[][] tmp1 = MatTranspose(tmp0); double[][] tmp2 = MatVecAdd(tmp1, rowSums); double[][] tmp3 = MatAddScalar(tmp2, 1.0); double[][] tmp4 = MatInvertElements(tmp3); Num = MatCopy(tmp4); MatZeroOutDiag(Num); double sumNum = MatSum(Num); double[][] Q = MatCopy(Num); for (int i = 0; i "lt" Num.Length; ++i) for (int j = 0; j "lt" Num[0].Length; ++j) { Q[i][j] = Q[i][j] / sumNum; if (Q[i][j] "lt" 1.0e-12) Q[i][j] = 1.0e-12; } double[][] PminusQ = MatDiff(P, Q); for (int i = 0; i "lt" n; ++i) // compute dY { double[] tmpA = MatExtractRow(Y, i); double[][] tmpB = VecMinusMat(tmpA, Y); // tricky! double[] tmpC = MatExtractColumn(PminusQ, i); double[] tmpD = MatExtractColumn(Num, i); double[] tmpK = VecMult(tmpC, tmpD); double[][] tmpE = VecTile(tmpK, 2); double[][] tmpF = MatTranspose(tmpE); double[][] tmpG = MatMultiply(tmpF, tmpB); double[] tmpZ = MatColumnSums(tmpG); MatReplaceRow(dY, i, tmpZ); } // i double momentum; if (iter "lt" 20) momentum = initialMomentum; else momentum = finalMomentum; // compute Gains for (int i = 0; i "lt" Gains.Length; ++i) { for (int j = 0; j "lt" Gains[0].Length; ++j) { if ((dY[i][j] "gt" 0.0 "and" iY[i][j] "lte" 0.0) || (dY[i][j] "lte" 0.0 "and" iY[i][j] "gt" 0.0)) Gains[i][j] = Gains[i][j] + 0.2; else if ((dY[i][j] "gt" 0.0 "and" iY[i][j] "gt" 0.0) || (dY[i][j] "lte" 0.0 "and" iY[i][j] "lte" 0.0)) Gains[i][j] = Gains[i][j] * 0.8; } } for (int i = 0; i "lt" Gains.Length; ++i) for (int j = 0; j "lt" Gains[0].Length; ++j) if (Gains[i][j] "lt" minGain) Gains[i][j] = minGain; // use dY to compute iY to update result Y for (int i = 0; i "lt" iY.Length; ++i) { for (int j = 0; j "lt" iY[0].Length; ++j) { iY[i][j] = (momentum * iY[i][j]) - (eta * Gains[i][j] * dY[i][j]); Y[i][j] += iY[i][j]; } } double[] meansY = MatColumnMeans(Y); double[][] meansTile = VecTile(meansY, n); Y = MatDiff(Y, meansTile); // updated result if ((iter + 1) % 100 == 0) { // cost aka error double C = MatSum(MatMultLogDivide(P, P, Q)); Console.WriteLine("iter = " + (iter + 1). ToString().PadLeft(6) + " | error = " + C.ToString("F4")); } if (iter == 100) // undo the initial expansion of P { for (int i = 0; i "lt" P.Length; ++i) for (int j = 0; j "lt" P[0].Length; ++j) P[i][j] /= 4.0; } } // iter return Y; } // Reduce() // ------------------------------------------------------ private static double[][] ComputeP(double[][] X, int perplexity) { // helper for Reduce() double tol = 1.0e-5; int n = X.Length; int d = X[0].Length; double[][] D = MatSquaredDistances(X); double[][] P = MatCreate(n, n); double[] beta = new double[n]; for (int i = 0; i "lt" n; ++i) beta[i] = 1.0; double logU = Math.Log(perplexity); for (int i = 0; i "lt" n; ++i) { double betaMin = -1.0e15; double betaMax = 1.0e15; // ith row of D[][], without 0.0 at [i][i] double[] Di = new double[n - 1]; int k = 0; // points into Di for (int j = 0; j "lt" n; ++j) // j points D[][] { if (j == i) continue; Di[k] = D[i][j]; ++k; } double h; double[] currP = ComputePH(Di, beta[i], out h); double hDiff = h - logU; int ct = 0; while (Math.Abs(hDiff) "gt" tol "and" ct "lt" 50) { if (hDiff "gt" 0.0) { betaMin = beta[i]; if (betaMax == 1.0e15 || betaMax == -1.0e15) beta[i] *= 2.0; else beta[i] = (beta[i] + betaMax) / 2.0; } else { betaMax = beta[i]; if (betaMin == 1.0e15 || betaMin == -1.0e15) beta[i] /= 2.0; else beta[i] = (beta[i] + betaMin) / 2.0; } // recompute currP = ComputePH(Di, beta[i], out h); hDiff = h - logU; ++ct; } // while k = 0; // points into currP vector for (int j = 0; j "lt" n; ++j) { if (i == j) P[i][j] = 0.0; else P[i][j] = currP[k++]; } } // for i return P; } // ComputeP // ------------------------------------------------------ private static double[] ComputePH(double[] di, double beta, out double h) { // helper for ComputeP() // return p vector explicitly, h val as out param int n = di.Length; // (n-1) relative to D double[] p = new double[n]; double sumP = 0.0; for (int j = 0; j "lt" n; ++j) { p[j] = Math.Exp(-1 * di[j] * beta); sumP += p[j]; } if (sumP == 0.0) sumP = 1.0e-12; // avoid div by 0 double sumDP = 0.0; for (int j = 0; j "lt" n; ++j) sumDP += di[j] * p[j]; double hh = Math.Log(sumP) + (beta * sumDP / sumP); for (int j = 0; j "lt" n; ++j) p[j] /= sumP; h = hh; return p; // a new row for P[][] } // ------------------------------------------------------ // // 32 secondary helper functions, one helper class // // MatSquaredDistances, MatCreate, MatOnes, // MatLoad, MatSave, VecLoad, // MatShow, MatShow, MatShow, (3 overloads) // VecShow, VecShow, MatCopy, MatAdd, MatTranspose, // MatProduct, MatDiff, MatExtractRow, // MatExtractColumn, VecMinusMat, VecMult, VecTile, // MatMultiply, MatMultLogDivide, MatRowSums, // MatColumnSums, MatColumnMeans, MatReplaceRow, // MatVecAdd, MatAddScalar, MatInvertElements, // MatSum, MatZeroOutDiag // // Gaussian class: NextGaussian() // // ------------------------------------------------------ private static double[][] MatSquaredDistances(double[][] X) { // squared Euclidean distances, all rows in X int n = X.Length; int dim = X[0].Length; double[][] result = MatCreate(n, n); for (int i = 0; i "lt" n; ++i) { for (int j = i; j "lt" n; ++j) { double dist = 0.0; for (int k = 0; k "lt" dim; ++k) { dist += (X[i][k] - X[j][k]) * (X[i][k] - X[j][k]); } result[i][j] = dist; result[j][i] = dist; } } return result; } // ------------------------------------------------------ private static double[][] MatCreate(int rows, int cols) { double[][] result = new double[rows][]; for (int i = 0; i "lt" rows; ++i) result[i] = new double[cols]; return result; } // ------------------------------------------------------ private static double[][] MatOnes(int rows, int cols) { double[][] result = MatCreate(rows, cols); for (int i = 0; i "lt" rows; ++i) for (int j = 0; j "lt" cols; ++j) result[i][j] = 1.0; return result; } // ------------------------------------------------------ public static double[][] MatLoad(string fn, int[] usecols, char sep, string comment) { // count number of non-comment lines int nRows = 0; string line = ""; FileStream ifs = new FileStream(fn, FileMode.Open); StreamReader sr = new StreamReader(ifs); while ((line = sr.ReadLine()) != null) if (line.StartsWith(comment) == false) ++nRows; sr.Close(); ifs.Close(); // make result matrix int nCols = usecols.Length; double[][] result = new double[nRows][]; for (int r = 0; r "lt" nRows; ++r) result[r] = new double[nCols]; line = ""; string[] tokens = null; ifs = new FileStream(fn, FileMode.Open); sr = new StreamReader(ifs); int i = 0; while ((line = sr.ReadLine()) != null) { if (line.StartsWith(comment) == true) continue; tokens = line.Split(sep); for (int j = 0; j "lt" nCols; ++j) { int k = usecols[j]; // into tokens result[i][j] = double.Parse(tokens[k]); } ++i; } sr.Close(); ifs.Close(); return result; } // ------------------------------------------------------ public static void MatSave(double[][] data, string fn, char sep, int dec) { int nRows = data.Length; int nCols = data[0].Length; FileStream ofs = new FileStream(fn, FileMode.Create); StreamWriter sw = new StreamWriter(ofs); for (int i = 0; i "lt" nRows; ++i) { string line = ""; for (int j = 0; j "lt" nCols; ++j) { line += data[i][j].ToString("F" + dec); if (j "lt" nCols - 1) line += sep; } sw.WriteLine(line); // includes NL } sw.Close(); ofs.Close(); } // ------------------------------------------------------ public static int[] VecLoad(string fn, int usecol, string comment) { char dummySep = ','; double[][] tmp = MatLoad(fn, new int[] { usecol }, dummySep, comment); int n = tmp.Length; int[] result = new int[n]; for (int i = 0; i "lt" n; ++i) result[i] = (int)tmp[i][0]; return result; } // ------------------------------------------------------ public static void MatShow(double[][] M, int dec, int wid, bool showIndices) { double small = 1.0 / Math.Pow(10, dec); for (int i = 0; i "lt" M.Length; ++i) { if (showIndices == true) { int pad = M.Length.ToString().Length; Console.Write("[" + i.ToString(). PadLeft(pad) + "]"); } for (int j = 0; j "lt" M[0].Length; ++j) { double v = M[i][j]; if (Math.Abs(v) "lt" small) v = 0.0; Console.Write(v.ToString("F" + dec). PadLeft(wid)); } Console.WriteLine(""); } } // ------------------------------------------------------ public static void MatShow(double[][] M, int nCols, int dec, int wid, bool showIndices) { double small = 1.0 / Math.Pow(10, dec); for (int i = 0; i "lt" M.Length; ++i) { if (showIndices == true) { int pad = M.Length.ToString().Length; Console.Write("[" + i.ToString(). PadLeft(pad) + "]"); } for (int j = 0; j "lt" nCols; ++j) { double v = M[i][j]; if (Math.Abs(v) "lt" small) v = 0.0; Console.Write(v.ToString("F" + dec). PadLeft(wid)); } Console.WriteLine(" . . . "); } } // ------------------------------------------------------ public static void MatShow(double[][] M, int dec, int wid, int nRows) { double small = 1.0 / Math.Pow(10, dec); for (int i = 0; i "lt" nRows; ++i) { for (int j = 0; j "lt" M[0].Length; ++j) { double v = M[i][j]; if (Math.Abs(v) "lt" small) v = 0.0; Console.Write(v.ToString("F" + dec). PadLeft(wid)); } Console.WriteLine(""); } if (nRows "lt" M.Length) Console.WriteLine(". . . "); } // ------------------------------------------------------ public static void VecShow(int[] vec, int wid) { int n = vec.Length; for (int i = 0; i "lt" n; ++i) Console.Write(vec[i].ToString().PadLeft(wid)); Console.WriteLine(""); } // ------------------------------------------------------ public static void VecShow(double[] vec, int decimals, int wid) { int n = vec.Length; for (int i = 0; i "lt" n; ++i) Console.Write(vec[i].ToString("F" + decimals). PadLeft(wid)); Console.WriteLine(""); } // ------------------------------------------------------ private static double[][] MatCopy(double[][] m) { int nRows = m.Length; int nCols = m[0].Length; double[][] result = MatCreate(nRows, nCols); for (int i = 0; i "lt" nRows; ++i) for (int j = 0; j "lt" nCols; ++j) result[i][j] = m[i][j]; return result; } // ------------------------------------------------------ private static double[][] MatAdd(double[][] mA, double[][] mB) { int rows = mA.Length; int cols = mA[0].Length; double[][] result = MatCreate(rows, cols); for (int i = 0; i "lt" rows; ++i) for (int j = 0; j "lt" cols; ++j) result[i][j] = mA[i][j] + mB[i][j]; return result; } // ------------------------------------------------------ private static double[][] MatTranspose(double[][] m) { int nr = m.Length; int nc = m[0].Length; double[][] result = MatCreate(nc, nr); // note for (int i = 0; i "lt" nr; ++i) for (int j = 0; j "lt" nc; ++j) result[j][i] = m[i][j]; return result; } // ------------------------------------------------------ private static double[][] MatProduct(double[][] matA, double[][] matB) { int aRows = matA.Length; int aCols = matA[0].Length; int bRows = matB.Length; int bCols = matB[0].Length; if (aCols != bRows) throw new Exception("Non-conformable matrices"); double[][] result = MatCreate(aRows, bCols); for (int i = 0; i "lt" aRows; ++i) // each row of A for (int j = 0; j "lt" bCols; ++j) // each col of B for (int k = 0; k "lt" aCols; ++k) result[i][j] += matA[i][k] * matB[k][j]; return result; } // ------------------------------------------------------ private static double[][] MatDiff(double[][] A, double[][] B) { int nRows = A.Length; int nCols = A[0].Length; double[][] result = MatCreate(nRows, nCols); for (int i = 0; i "lt" nRows; ++i) for (int j = 0; j "lt" nCols; ++j) result[i][j] = A[i][j] - B[i][j]; return result; } // ------------------------------------------------------ private static double[] MatExtractRow(double[][] M, int row) { int cols = M[0].Length; double[] result = new double[cols]; for (int j = 0; j "lt" cols; ++j) result[j] = M[row][j]; return result; } // ------------------------------------------------------ private static double[] MatExtractColumn(double[][] M, int col) { int rows = M.Length; double[] result = new double[rows]; for (int i = 0; i "lt" rows; ++i) result[i] = M[i][col]; return result; } // ------------------------------------------------------ private static double[][] VecMinusMat(double[] v, double[][] M) { // add v row vector to each row of -M // B = A - Y # 10x2; -Y + A vector to each row double[][] result = MatCopy(M); for (int i = 0; i "lt" M.Length; ++i) for (int j = 0; j "lt" M[0].Length; ++j) result[i][j] = -M[i][j] + v[j]; return result; } // ------------------------------------------------------ private static double[] VecMult(double[] v1, double[] v2) { // element-wise multiplication int n = v1.Length; double[] result = new double[n]; for (int i = 0; i "lt" n; ++i) result[i] = v1[i] * v2[i]; return result; } // ------------------------------------------------------ private static double[][] VecTile(double[] vec, int n) { // n row-copies of vec int nCols = vec.Length; double[][] result = MatCreate(n, nCols); for (int i = 0; i "lt" n; ++i) { for (int j = 0; j "lt" nCols; ++j) { result[i][j] = vec[j]; } } return result; } // ------------------------------------------------------ private static double[][] MatMultiply(double[][] A, double[][] B) { // element-wise multiplication int nRows = A.Length; int nCols = A[0].Length; double[][] result = MatCreate(nRows, nCols); for (int i = 0; i "lt" nRows; ++i) for (int j = 0; j "lt" nCols; ++j) result[i][j] = A[i][j] * B[i][j]; return result; } // ------------------------------------------------------ private static double[][] MatMultLogDivide(double[][] P, double[][] A, double[][] B) { // element-wise P * log(A / B) int nRows = A.Length; int nCols = A[0].Length; double[][] result = MatCreate(nRows, nCols); for (int i = 0; i "lt" nRows; ++i) for (int j = 0; j "lt" nCols; ++j) { if (B[i][j] == 0.0) result[i][j] = P[i][j] * Math.Log(1.0e10); else result[i][j] = P[i][j] * Math.Log(A[i][j] / B[i][j]); } return result; } // ------------------------------------------------------ private static double[] MatRowSums(double[][] M) { int nRows = M.Length; int nCols = M[0].Length; double[] result = new double[nRows]; for (int i = 0; i "lt" nRows; ++i) for (int j = 0; i "lt" nCols; ++j) result[i] += M[i][j]; return result; } // ------------------------------------------------------ private static double[] MatColumnSums(double[][] M) { int nRows = M.Length; int nCols = M[0].Length; double[] result = new double[nCols]; for (int j = 0; j "lt" nCols; ++j) for (int i = 0; i "lt" nRows; ++i) result[j] += M[i][j]; return result; } // ------------------------------------------------------ private static double[] MatColumnMeans(double[][] M) { int nRows = M.Length; int nCols = M[0].Length; double[] result = new double[nCols]; for (int j = 0; j "lt" nCols; ++j) for (int i = 0; i "lt" nRows; ++i) result[j] += M[i][j]; for (int j = 0; j "lt" nCols; ++j) result[j] /= nRows; return result; } // ------------------------------------------------------ private static void MatReplaceRow(double[][] M, int i, double[] vec) { int nCols = M[0].Length; for (int j = 0; j "lt" nCols; ++j) M[i][j] = vec[j]; return; } // ------------------------------------------------------ private static double[][] MatVecAdd(double[][] M, double[] vec) { // add row vector vec to each row of M int nr = M.Length; int nc = M[0].Length; double[][] result = MatCreate(nr, nc); for (int i = 0; i "lt" nr; ++i) for (int j = 0; j "lt" nc; ++j) result[i][j] = M[i][j] + vec[j]; return result; } // ------------------------------------------------------ private static double[][] MatAddScalar(double[][] M, double val) { int nr = M.Length; int nc = M[0].Length; double[][] result = MatCreate(nr, nc); for (int i = 0; i "lt" nr; ++i) for (int j = 0; j "lt" nc; ++j) result[i][j] = M[i][j] + val; return result; } // ------------------------------------------------------ private static double[][] MatInvertElements(double[][] M) { int nr = M.Length; int nc = M[0].Length; double[][] result = MatCreate(nr, nc); for (int i = 0; i "lt" nr; ++i) for (int j = 0; j "lt" nc; ++j) result[i][j] = 1.0 / M[i][j]; // M[i][j] not 0 return result; } // ------------------------------------------------------ private static double MatSum(double[][] M) { int nr = M.Length; int nc = M[0].Length; double result = 0.0; for (int i = 0; i "lt" nr; ++i) for (int j = 0; j "lt" nc; ++j) result += M[i][j]; return result; } // ------------------------------------------------------ private static void MatZeroOutDiag(double[][] M) { int nr = M.Length; //int nc = M[0].Length; double result = 0.0; for (int i = 0; i "lt" nr; ++i) M[i][i] = 0.0; return; } // ------------------------------------------------------ // nested Gaussian to init TSNE.Reduce() result class Gaussian { private Random rnd; private double mean; private double sd; public Gaussian(double mean, double sd, int seed) { this.rnd = new Random(seed); this.mean = mean; this.sd = sd; } public double NextGaussian() { double u1 = this.rnd.NextDouble(); double u2 = this.rnd.NextDouble(); double left = Math.Cos(2.0 * Math.PI * u1); double right = Math.Sqrt(-2.0 * Math.Log(u2)); double z = left * right; return this.mean + (z * this.sd); } } // Gaussian // ------------------------------------------------------ } // class TSNE // ======================================================== } // ns
Source data with backslash-newline inserted every 8 values.
# optdigits_test_012_15.txt # UCI digits, just 5 each of '0', '1', '2' # 64 pixel values in [0,16] then digit [0,1,2] # # remove the \-newline characters so that # each line has 64 pixel values followed by # the class label (0, 1, 2) # 0,0,5,13,9,1,0,0,\ 0,0,13,15,10,15,5,0,\ 0,3,15,2,0,11,8,0,\ 0,4,12,0,0,8,8,0,\ 0,5,8,0,0,9,8,0,\ 0,4,11,0,1,12,7,0,\ 0,2,14,5,10,12,0,0,\ 0,0,6,13,10,0,0,0, 0 # 0,0,1,9,15,11,0,0,\ 0,0,11,16,8,14,6,0,\ 0,2,16,10,0,9,9,0,\ 0,1,16,4,0,8,8,0,\ 0,4,16,4,0,8,8,0,\ 0,1,16,5,1,11,3,0,\ 0,0,12,12,10,10,0,0,\ 0,0,1,10,13,3,0,0, 0 # 0,0,3,13,11,7,0,0,\ 0,0,11,16,16,16,2,0,\ 0,4,16,9,1,14,2,0,\ 0,4,16,0,0,16,2,0,\ 0,0,16,1,0,12,8,0,\ 0,0,15,9,0,13,6,0,\ 0,0,9,14,9,14,1,0,\ 0,0,2,12,13,4,0,0, 0 # 0,0,10,14,11,3,0,0,\ 0,4,16,13,6,14,1,0,\ 0,4,16,2,0,11,7,0,\ 0,8,16,0,0,10,5,0,\ 0,8,16,0,0,14,4,0,\ 0,8,16,0,1,16,1,0,\ 0,4,16,1,11,15,0,0,\ 0,0,11,16,12,3,0,0, 0 # 0,0,6,14,10,2,0,0,\ 0,0,15,15,13,15,3,0,\ 0,2,16,10,0,13,9,0,\ 0,1,16,5,0,12,5,0,\ 0,0,16,3,0,13,6,0,\ 0,1,15,5,6,13,1,0,\ 0,0,16,11,14,10,0,0,\ 0,0,7,16,11,1,0,0, 0 # # 0,0,0,12,13,5,0,0,\ 0,0,0,11,16,9,0,0,\ 0,0,3,15,16,6,0,0,\ 0,7,15,16,16,2,0,0,\ 0,0,1,16,16,3,0,0,\ 0,0,1,16,16,6,0,0,\ 0,0,1,16,16,6,0,0,\ 0,0,0,11,16,10,0,0, 1 # 0,0,0,0,14,13,1,0,\ 0,0,0,5,16,16,2,0,\ 0,0,0,14,16,12,0,0,\ 0,1,10,16,16,12,0,0,\ 0,3,12,14,16,9,0,0,\ 0,0,0,5,16,15,0,0,\ 0,0,0,4,16,14,0,0,\ 0,0,0,1,13,16,1,0, 1 # 0,0,0,2,16,16,2,0,\ 0,0,0,4,16,16,2,0,\ 0,1,4,12,16,12,0,0,\ 0,7,16,16,16,12,0,0,\ 0,0,3,10,16,14,0,0,\ 0,0,0,8,16,12,0,0,\ 0,0,0,6,16,16,2,0,\ 0,0,0,2,12,15,4,0, 1 # 0,0,0,0,12,5,0,0,\ 0,0,0,2,16,12,0,0,\ 0,0,1,12,16,11,0,0,\ 0,2,12,16,16,10,0,0,\ 0,6,11,5,15,6,0,0,\ 0,0,0,1,16,9,0,0,\ 0,0,0,2,16,11,0,0,\ 0,0,0,3,16,8,0,0, 1 # 0,0,0,1,11,9,0,0,\ 0,0,0,7,16,13,0,0,\ 0,0,4,14,16,9,0,0,\ 0,10,16,11,16,8,0,0,\ 0,0,0,3,16,6,0,0,\ 0,0,0,3,16,8,0,0,\ 0,0,0,5,16,10,0,0,\ 0,0,0,2,14,6,0,0, 1 # # 0,0,0,4,15,12,0,0,\ 0,0,3,16,15,14,0,0,\ 0,0,8,13,8,16,0,0,\ 0,0,1,6,15,11,0,0,\ 0,1,8,13,15,1,0,0,\ 0,9,16,16,5,0,0,0,\ 0,3,13,16,16,11,5,0,\ 0,0,0,3,11,16,9,0, 2 # 0,0,5,12,1,0,0,0,\ 0,0,15,14,7,0,0,0,\ 0,0,13,1,12,0,0,0,\ 0,2,10,0,14,0,0,0,\ 0,0,2,0,16,1,0,0,\ 0,0,0,6,15,0,0,0,\ 0,0,9,16,15,9,8,2,\ 0,0,3,11,8,13,12,4, 2 # 0,0,8,16,5,0,0,0,\ 0,1,13,11,16,0,0,0,\ 0,0,10,0,13,3,0,0,\ 0,0,3,1,16,1,0,0,\ 0,0,0,9,12,0,0,0,\ 0,0,3,15,5,0,0,0,\ 0,0,14,15,8,8,3,0,\ 0,0,7,12,12,12,13,1, 2 # 0,0,0,5,14,12,2,0,\ 0,0,7,15,8,14,4,0,\ 0,0,6,2,3,13,1,0,\ 0,0,0,1,13,4,0,0,\ 0,0,1,11,9,0,0,0,\ 0,8,16,13,0,0,0,0,\ 0,5,14,16,11,2,0,0,\ 0,0,0,6,12,13,3,0, 2 # 0,0,0,3,15,10,1,0,\ 0,0,0,11,10,16,4,0,\ 0,0,0,12,1,15,6,0,\ 0,0,0,3,4,15,4,0,\ 0,0,0,6,15,6,0,0,\ 0,4,15,16,9,0,0,0,\ 0,0,13,16,15,9,3,0,\ 0,0,0,4,9,14,7,0, 2
You must be logged in to post a comment.