The Fisher-Yates Shuffle in JavaScript

Over the past few days I’ve been exploring the feasibility of creating a very lightweight set of code examples that build neural networks using the JavaScript language. There are several important utility functions needed to implement a neural network from scratch. One of them generates an array of indices, such as (0, 1, 2, 3), and then randomly permutes them, for example to (2, 1, 3, 0).

The standard shuffle algorithm is called the Fisher-Yates shuffle (or less frequently, Knuth shuffle). I coded up a demo. The demo relies on a program-defined class that can generate reproducible pseudo-random numbers.

The calling code looks like:

let v = vecRange(8);  // creates [0, 1, 2, .. 7]
shuffle(v, 1);        // 1 is a seed value for randomness
// values in v now in different order
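
For reference, here is a minimal sketch of what vecRange() and shuffle() can look like. It assumes the seedable Erratic class described in the “A Seedable Sort-Of Random Class in JavaScript” post below; the actual demo code may differ a bit.

function vecRange(n) {
  let result = [];
  for (let i = 0; i < n; ++i) {
    result[i] = i;
  }
  return result;
}

function shuffle(v, seed) {
  let rnd = new Erratic(seed);
  // Fisher-Yates: swap each item with a randomly selected
  // item at the current position or later
  for (let i = 0; i < v.length - 1; ++i) {
    let r = rnd.nextInt(i, v.length);  // i <= r < v.length
    let tmp = v[r]; v[r] = v[i]; v[i] = tmp;
  }
}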

The whole idea here is that when training a neural network using back-propagation and stochastic gradient descent, it’s critically important that you visit the training items in a different, random order on each pass (“epoch”).

OK, one more piece of the neural networks with JavaScript puzzle has fallen into place.



One of the very first jigsaw puzzles made was a map of Europe (1766). I did a Disney villainesses puzzle over Christmas – it was surprisingly difficult. Clever face makeup for Halloween. I like jigsaw puzzles that feature historical topics, such as this one of the famous Confederate general Robert E. Lee from the American Civil War.

Posted in Machine Learning

Calling Functions From Another File in Node.js

I was looking at running JavaScript in the Node.js environment. I wanted to put a couple of utility functions in one JavaScript file and then call the functions from another JavaScript file. I’ve done this before but I didn’t remember the exact syntax so I searched the Internet for information to remind me of the technique. How difficult could it be?

I was very surprised at the number of weak examples I found on the Internet. Too much chit-chat, and not enough example code.

To cut to the chase, here’s my preferred way to do it. First, I created a file named utilities.js like so:

// utilities.js
// make a rows x cols matrix with every cell set to val
function matrixMake(rows, cols, val) {
  let result = [];  // avoid new Array()
  for (let i = 0; i < rows; ++i) {
    result[i] = [];
    for (let j = 0; j < cols; ++j) {
      result[i][j] = val;
    }
  }
  return result;
}

// display matrix m using dec decimal places per value
function matrixPrint(m, dec) {
  let rows = m.length; let cols = m[0].length;
  for (let i = 0; i < rows; ++i) {
    for (let j = 0; j < cols; ++j) {
      process.stdout.write(m[i][j].toFixed(dec));
      process.stdout.write("  ");
    }
    console.log("");
  }
}

// --------------------------------------------------

module.exports = {
  matrixMake,
  matrixPrint
};

There are two normal JavaScript function definitions followed by a Node.js module.exports block at the bottom.

I named the file that calls the external functions test_utilities.js and coded it as:

// test_utilities.js

console.log("Node.js exporting demo \n");
let U = require("./utilities.js");

let m = U.matrixMake(3, 4, 0.0);
U.matrixPrint(m, 2);
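
For reference, running the demo from a shell with the command node test_utilities.js displays:

Node.js exporting demo

0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00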

Easy peasy. Use the require() function to load the external functions. The only gotcha was that even though I was running on Windows, I had to use the Linux path syntax with a forward slash instead of the Windows backslash.

The moral of the story is that if you want to explain a programming model to a software engineer, just show him some example code.



“Model Airplane News” magazine has been published continuously since 1929. From left: March 1932, July 1945, January 1958, August 1965, June 1975, October 1982, October 2018.

Posted in Miscellaneous

The Levenshtein Distance Between Two Strings

I was reviewing a technical book recently and the authors mentioned the Levenshtein distance, which is a metric I hadn’t thought about in a long time.

The Levenshtein distance between two strings is the minimum number of operations necessary to convert one string into the other, where an operation is an insertion, deletion, or substitution of a single character. For example, if:

A = cat
B = cots

then the Levenshtein distance is 2. Starting with “cat”, you must change the ‘a’ to ‘o’, and then add an ‘s’. There is a very cool visual algorithm to compute the Levenshtein distance. I’ll walk you through it. First construct this matrix:

       c   a   t 
   0   1   2   3 
c  1
o  2
t  3
s  4

The shorter string goes on top, the longer string goes vertically, and each letter gets a 1-based index value.

Now, working column by column from top to bottom, for each cell in the matrix first compute a temp value: 0 if the corresponding characters match, 1 if they do not. Then assign the cell the minimum of these three values:

1. the value in the cell above plus 1.
2. the value in the cell to the left plus 1.
3. the value in the cell to the upper left plus the temp value.

So the first cell in the example above gets a temp value of 0 (because the ‘c’ characters match) and a final value of 0 (because condition #3 gives the minimum). The first column ends up like this:

       c   a   t 
   0   1   2   3 
c  1   0
o  2   1
t  3   2
s  4   3

The first cell in the second column gets a temp value of 1 because ‘a’ does not equal ‘c’. The cell’s final value is 1 because condition #2 gives the minimum. The second column ends up like this:

       c   a   t 
   0   1   2   3 
c  1   0   1
o  2   1   1
t  3   2   2
s  4   3   3

In the same way, the last column becomes:

       c   a   t 
   0   1   2   3 
c  1   0   1   2
o  2   1   1   2
t  3   2   2   1
s  4   3   3   2

Now, the Levenshtein distance is the value in the lower right corner, 2. Remarkable. This is an interesting algorithm and one which can be implemented relatively easily.
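
To make that concrete, here is a rough JavaScript implementation of the matrix technique (my own sketch, not code from the book I was reviewing):

function levenshtein(a, b) {
  let rows = b.length + 1; let cols = a.length + 1;
  let dist = [];  // dist[i][j] = distance between first j chars of a and first i chars of b
  for (let i = 0; i < rows; ++i) {
    dist[i] = [];
    for (let j = 0; j < cols; ++j) {
      dist[i][j] = 0;
    }
  }
  for (let j = 0; j < cols; ++j) { dist[0][j] = j; }  // top index row
  for (let i = 0; i < rows; ++i) { dist[i][0] = i; }  // left index column

  for (let j = 1; j < cols; ++j) {      // work column by column
    for (let i = 1; i < rows; ++i) {
      let temp = (a[j-1] === b[i-1]) ? 0 : 1;  // 0 if the characters match
      dist[i][j] = Math.min(dist[i-1][j] + 1,        // cell above plus 1
                            dist[i][j-1] + 1,        // cell to the left plus 1
                            dist[i-1][j-1] + temp);  // upper left plus temp
    }
  }
  return dist[rows-1][cols-1];  // lower right corner
}

console.log(levenshtein("cat", "cots"));  // displays 2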



Five paintings that depict distance in some way. The center one is the famous “Christina’s World” (1948) by Andrew Wyeth.

Posted in Miscellaneous

A Seedable Sort-Of Random Class in JavaScript

I’ve been thinking about the possibility of implementing a neural network mini-library in raw JavaScript. Neural networks use pseudo-random numbers in a minimum of two ways: initializing weight and bias values, and selecting a subset of training data items during training.

One of the weird things about JavaScript is that the built-in Math.random function doesn’t have a way to set the initial seed value. This means that there’s no way to get reproducible results. This is not acceptable when working with neural networks.

There are some external JavaScript libraries that have seedable RNGs, but I didn’t want any external dependencies. I’m very familiar with RNGs and, trust me, implementing a good RNG is incredibly difficult. But neural networks don’t need a good (cryptographically secure) RNG, so I set out to create a very lightweight way to generate seedable sort-of random values.

After an hour or so, I had a working JavaScript class I named Erratic (to indicate that it isn’t close to being Random). Even though there aren’t many lines of code, there is a lot going on.

class Erratic {
  constructor(seed) {
    if (seed <= 0) {
      seed = 1;
    }
    this.seed = seed;
  }

  next() {
    let x = Math.sin(this.seed) * 1000;  // scale so the fractional part jumps around
    let result = x - Math.floor(x);      // fractional part, in [0.0, 1.0)
    this.seed = result;                  // the result becomes the next seed
    return result;
  }

  nextInt(lo, hi) {
    let x = this.next();
    return Math.trunc((hi - lo) * x + lo);  // integer in [lo, hi-1]
  }
}
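
A quick usage example: with a fixed seed, the sequence of generated values is the same on every program run, which is the whole point.

let rnd = new Erratic(1);
console.log(rnd.next());         // same value every run with seed = 1
console.log(rnd.next());
console.log(rnd.nextInt(0, 8));  // integer between 0 and 7 inclusive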

There’s no big moral to this story, but I’m one step closer to creating a neural network library in raw JavaScript.



Random = capricious = whimsical. The famous Campari annual calendars often feature actresses and celebrities in whimsical poses. From left: images from years 2002, 2006, 2009, 2005, 2010, 2012.

Posted in Machine Learning

Self-Organizing Maps Using Python

I wrote an article titled “Self-Organizing Maps Using Python” in the January 2019 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2019/01/01/self-organizing-maps-python.aspx.

A self-organizing map (SOM) is surprisingly difficult to describe. Briefly, the most common form of a SOM is an n x n grid/map where each square/node in the grid holds zero or more data items. Data items in a square are similar to each other (so you can think of the squares as clusters), and items in adjacent squares are more similar to each other than to items in squares that aren’t adjacent.

Suppose you have 150 data items. Each data item has four values, for example, (5.1, 3.5, 1.4, 0.8). And suppose you create a 30 x 30 SOM. You apply an algorithm that assigns a vector of size four to each of the 900 SOM squares/nodes. When finished, each of the 150 data items is implicitly associated with one of the 900 squares/nodes. Or equivalently, each square/node holds zero or more data items.

After a SOM has been created, it can then be used in several ways. Because each square/node has a coordinate, for example the upper left square/node is (0, 0), each of the 150 data items is also associated with a coordinate, and so the data items can be mapped in two dimensions. This is called dimensionality reduction for visualization.


A SOM used for dimensionality reduction for visualization.
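
The article’s demo code is in Python, but the key mapping step, finding the grid node whose weight vector is closest to a data item, is easy to sketch in JavaScript. This is just an illustration of the idea, not the article’s code:

// som[i][j] is a node's weight vector; x is a data item (an array of numbers)
function euclidDist(a, b) {
  let sum = 0.0;
  for (let k = 0; k < a.length; ++k) {
    sum += (a[k] - b[k]) * (a[k] - b[k]);
  }
  return Math.sqrt(sum);
}

// return the [row, col] of the map node closest to data item x
function closestNode(som, x) {
  let bestDist = Number.MAX_VALUE;
  let bestRow = 0; let bestCol = 0;
  for (let i = 0; i < som.length; ++i) {
    for (let j = 0; j < som[0].length; ++j) {
      let d = euclidDist(som[i][j], x);
      if (d < bestDist) {
        bestDist = d; bestRow = i; bestCol = j;
      }
    }
  }
  return [bestRow, bestCol];
}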

Another way to use a SOM is to construct what’s called a U-Matrix. For the 150-items and 30×30 SOM, a U-Matrix would also be 30×30. Each cell of the U-Matrix has a value where a small value indicates the cell is close to its neighbor cells and a large value indicates the cell is far from its neighbors. If you plot the U-Matrix in grayscale, dark regions are similar, and light regions indicate boundaries between classes. Therefore you gain insights into the data, including an indication of how many natural clusters there are.


A U-Matrix created from a SOM. Light regions are boundaries so, if you squint, it looks like there are two or three natural groupings of the data.
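
The U-Matrix computation can be sketched the same way. A common variant gives each cell the average distance from its node’s weight vector to the weight vectors of its immediate neighbors (this reuses the euclidDist() helper from the sketch above; the article may compute it somewhat differently):

function uMatrix(som) {
  let rows = som.length; let cols = som[0].length;
  let result = [];
  for (let i = 0; i < rows; ++i) {
    result[i] = [];
    for (let j = 0; j < cols; ++j) {
      let sum = 0.0; let ct = 0;
      if (i > 0)        { sum += euclidDist(som[i][j], som[i-1][j]); ++ct; }
      if (i < rows - 1) { sum += euclidDist(som[i][j], som[i+1][j]); ++ct; }
      if (j > 0)        { sum += euclidDist(som[i][j], som[i][j-1]); ++ct; }
      if (j < cols - 1) { sum += euclidDist(som[i][j], som[i][j+1]); ++ct; }
      result[i][j] = sum / ct;  // small value = node is similar to its neighbors
    }
  }
  return result;
}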

Whew! Very complicated! But very cool, geek-wise.



Four paintings by artist Jeannette Guichard Bunel. I’m not a big fan of pop art styles in general but I think these examples are geek-wise cool. I like the bright colors.

Posted in Machine Learning

The 2019 TDWI Conference is Coming Soon

I will be presenting a keynote talk at the 2019 TDWI Conference. The event runs February 10-15 and will be in Las Vegas. See https://tdwi.org/events/conferences/las-vegas/Home.aspx. My talk is titled “The Present and Future of Machine Learning and Artificial Intelligence”.


The TDWI Web site home page.

TDWI stands for “transforming data with intelligence”. Until a couple of years ago, TDWI emphasized data and what I think of as classical data science (things like SQL, big data, R, and data visualization). But over the past couple of years, the event has moved towards topics related to machine learning.

Examples of some of the all-day courses at the 2019 TDWI event include “Introduction to AI and Deep Learning” (wow! that’s a lot to cover in one day!), and “Hands-On: Data Science in R”.

Most conferences have an educational component to some extent. The TDWI event leans heavily towards straightforward education in the sense that there are many multi-hour sessions that are like workshops, and even the shorter talks resemble college lectures. And this is a good thing. Too many conferences have sessions that are just fluff, or thinly disguised product marketing.

I’m often asked to present a keynote talk but I usually decline. When I attend or speak at a conference, in many cases I am unimpressed with the keynote talks. A good keynote is very difficult to deliver because the audience is usually very heterogeneous. A talk that’s too generic isn’t interesting, and a talk that’s too specific might not resonate with a large part of the audience.

I’m best when I speak about a very specific technical topic. So that’s what I’m thinking I’ll do at the 2019 TDWI keynote. I’m toying with several ideas but I’ll probably talk about recent work in anomaly detection with deep neural autoencoders and LSTM recurrent neural networks. And I’m thinking about forward-looking topics including homomorphic encryption, symbolic reasoning, and quantum computing. These are areas where I know some leading experts who work at my large tech company, so I can get good state-of-the-art information.


I made my keynote talk title quite generic so I could have lots of leeway about what topics I end up talking about.

I get far more requests to speak than I can accommodate. One of the reasons I agreed to speak at TDWI is that I like the people who run the event. Sure, they’re in business to make money, but TDWI has been around for many years, and I’ve gotten a good vibe from the people I’ve dealt with there.

The TDWI organizers told me that if you wish to go to the 2019 TDWI event in Las Vegas, when you register, use the promo code “Speaker100” and you’ll get $100 off your registration. If you attend, track me down and say “hello”.



A group of art students created abstract animation set to a live performance, conducted by Leonard Bernstein, of Darius Milhaud’s “La Création du monde”, Op. 81. Here are five screenshots from http://www.youtube.com/watch?v=pYzH4349yCc.

Posted in Conferences

I Give a Talk About Neural Binary Classification Using PyTorch

I gave a talk about creating a binary classification model using the PyTorch neural network library. Most neural network beginners start by learning multiclass classification on the Iris Dataset, where the goal is to predict which of three species (setosa, versicolor, virginica) an iris flower is, based on four predictor values (petal length and width, and sepal length and width).

Most people, including me, are mildly surprised when they discover that doing binary prediction, where the goal is to predict something that can be just one of two possible values, has some significant differences from multiclass classification.

I explained neural binary classification by using the standard Banknote Dataset. The goal is to predict whether a banknote (think euro or dollar bill) is authentic (0) or counterfeit/forgery (1) based on four numeric values of a scanned image of the banknote.

Cutting to the chase, to do binary classification you:

1. encode the value to predict as 0 or 1 in the training and test data.
2. specify a single output node in the neural network.
3. use logistic sigmoid for the output layer/node activation.
4. use binary cross entropy (or mean squared error) for the training loss/error function.
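
The math behind steps 3 and 4 is simple enough to illustrate in a few lines of JavaScript (just the ideas, not PyTorch code):

// logistic sigmoid squashes a raw output value to (0, 1)
function sigmoid(z) {
  return 1.0 / (1.0 + Math.exp(-z));
}

// binary cross entropy loss for a predicted probability p and a 0/1 target y
function binaryCrossEntropy(p, y) {
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}

let p = sigmoid(0.85);                  // suppose the raw output node value is 0.85
console.log(binaryCrossEntropy(p, 1));  // small loss when the target is 1
console.log(binaryCrossEntropy(p, 0));  // larger loss when the target is 0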

Understanding neural networks has many conceptual layers. At a high level, you just need to know how to write some code that works. At an intermediate level, you need to understand the effect of your design alternatives, such as using tanh vs. using ReLU for hidden layer activation. At lower levels, you can fully understand exactly what the neural network library code is doing for you.

Anyway, I’ll boast and claim I really, really understand neural binary classification because I have created such systems from scratch, without using any neural library. But as always, whenever I present a lecture, even on something I know inside out, I learn something new when attendees ask me a question and I have to think deeply to articulate the answer.



Binary star system GG Tauri-A. See http://www.sci-news.com/astronomy/science-ezekiels-wheel-in-wheel-binary-system-gg-tauri-a-02241.html.

Posted in Machine Learning, PyTorch