Neural Network Momentum using Python

I wrote an article titled “Neural Network Momentum using Python” in the August 2017 issue of Visual Studio Magazine.

Momentum is a technique intended to speed up neural network training. Training a neural network is the process of determining the values of the weights and biases that essentially define the behavior of the network. The most common training algorithm is called back-propagation. Back-propagation is an iterative process which can take a very long time for complex neural networks.

The basic update for one weight is w = w + (-1 * lr * grad(w)). Put a bit differently:

delta = -1 * lr * grad(w)
w = w + delta

In words, the new weight value is the old value plus -1 times a small learning rate constant times the current gradient value of the weight. The learning rate is a small constant, perhaps 0.01, and is determined by trial and error. The gradient is the Calculus derivative (just a number like -2.34) where the sign tells you whether the weight needs to increase or decrease, and the magnitude influences how much the weight changes in one update.

Adding momentum is very easy:

delta = -1 * lr * grad(w)
w = w + delta + (mf * prev(delta))

In each weight update you add an additional term which is a momentum factor constant (typically something like 0.50) times the value of the delta from the previous update iteration.
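The update above can be sketched in a few lines of Python. This is a minimal one-weight illustration; the weight, gradient, and previous-delta values here are made up, and the lr and mf constants simply reuse the example values mentioned in the text:

```python
import numpy as np

lr = 0.01  # learning rate (typically found by trial and error)
mf = 0.50  # momentum factor

w = 3.0             # current weight value (illustrative)
grad_w = -2.34      # current gradient of error w.r.t. w (illustrative)
prev_delta = 0.005  # delta from the previous update iteration

delta = -1 * lr * grad_w           # basic update amount
w = w + delta + (mf * prev_delta)  # momentum adds a fraction of prev delta
prev_delta = delta                 # save for the next iteration

print(w)  # 3.0 + 0.0234 + 0.0025 = 3.0259
```

Because the gradient is negative, the weight increases; the momentum term nudges it a bit further in the direction of the previous update.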

In my article I go through the details of neural network momentum and give a complete demo program, written from scratch in Python.


Posted in Machine Learning | Leave a comment

Replicator Neural Networks

A standard neural network classifier builds a model that predicts output values from input values. For example, the famous Iris Data has 150 items. Each item has four predictor variables (sepal length, sepal width, petal length, petal width) followed by one of three species to predict: setosa encoded as (1,0,0), versicolor encoded as (0,1,0), and virginica encoded as (0,0,1). The first item in the set is:

5.1, 3.5, 1.4, 0.2, 1, 0, 0

You train the neural classifier to find the defining weight constants so that given an input set of four values, the model correctly predicts the species.

A replicator neural network builds a model that predicts its own inputs. This sounds strange at first, but I’ll explain the point shortly. For the Iris Data, you’d take the data for one of the three species (say, setosa) and remove the encoded species labels. The idea is to feed the replicator NN the four inputs and have the model spit back the same four values. For example, conceptually, the first line of a training data file would be:

5.1, 3.5, 1.4, 0.2, 5.1, 3.5, 1.4, 0.2

The first four values act as inputs and the next four values act as the targets. Of course, even though you could explicitly duplicate the values in the data file, there’s no need to do so because you can duplicate them programmatically.
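The programmatic duplication amounts to a one-liner. A minimal sketch, using the first two setosa items quoted above:

```python
import numpy as np

# predictor rows for two setosa items (values from the Iris Data)
data_x = np.array([[5.1, 3.5, 1.4, 0.2],
                   [4.9, 3.0, 1.4, 0.2]], dtype=np.float32)

# for a replicator NN the targets are just the inputs themselves,
# so there is no need to store duplicated columns in the file
data_y = data_x

print(data_y[0])  # same four values as the first input row
```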


So, what’s the point? A replicator neural network can be used for anomaly detection. For example, if the data is some sort of network packet data, then you have tons of “normal” data. You create a replicator NN. Now when new data comes in, you pass it to the replicator. If the replicator NN doesn’t predict the packet data closely enough (defining what that means is the hard part), then the incoming packet might be malicious.
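The anomaly check itself is simple once a replicator is trained; as the text says, the hard part is choosing the closeness threshold. Here is a conceptual sketch where a hypothetical predict() function stands in for a trained replicator NN, and the threshold value is purely illustrative:

```python
import numpy as np

def predict(x):
  # placeholder for a trained replicator NN; here it just returns
  # the input plus a little noise, mimicking near-perfect replication
  return x + np.random.uniform(-0.01, 0.01, size=x.shape)

def is_anomaly(x, threshold=0.10):
  recon = predict(x)                        # replicated version of x
  err = np.sqrt(np.mean((x - recon) ** 2))  # root mean squared error
  return err > threshold                    # poor replication = suspect

item = np.array([5.1, 3.5, 1.4, 0.2])
print(is_anomaly(item))  # False -- replication error is tiny
```

For real packet data you would train the replicator on the “normal” traffic and then flag incoming items whose reconstruction error exceeds the chosen threshold.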

I coded up a short demo using raw Python. Good fun!

The moral of the story is that getting and using training data that is labeled (called supervised training) — and so has known correct output values — is time-consuming and difficult. Replicator NNs are an example of a machine learning technique that doesn’t need labeled data (unsupervised training).

Posted in Machine Learning | 1 Comment

Deal and Reveal Blackjack Again

Many of the technical conferences I speak at are in Las Vegas. Vegas is a great town for conferences because, well, the town is basically designed to accommodate thousands of people. Hotel rates in Vegas are very reasonable, air travel is easy and relatively inexpensive, and there’s lots to do if you enjoy observing people and mathematics like I do.

When I’m at an event in Vegas, I usually try to get away for an hour or two and cruise through the casino gambling areas. It’s not uncommon for me to see a new game — Vegas is relentlessly trying to find new ways to separate visitors from their money. There are many companies that design new games and then showcase the games at one of the two big casino conferences (the Global Gaming Expo, and the Table Games Conference).

Of the dozens and dozens of new games invented each year, only about two or three ever make it into a casino for a trial run of a few months so that the Nevada Gaming Commission and the casinos are satisfied that the new game makes money (casinos) but not too much money (Gaming Commission).


While I was in Vegas for a conference recently, I walked through the Palazzo casino (connected to the Venetian, where my conference was held) and I noticed a table game I hadn’t seen in several months. It’s a variation of Blackjack called “Deal & Reveal”. Briefly, the game is much like regular Blackjack. Recall that you (the player) bet (say $25) and get two cards. The dealer gets two cards, one face down and one face up, so you know one of her cards. In Deal & Reveal, if the dealer’s up card is a 2, 3, 4, 5, or 6, then before you decide to hit or stand, she turns over her down card so you can see both cards! If the dealer’s up card is 7, 8, 9, 10, J, Q, or K, then she doesn’t do anything. I’ll explain what happens when the dealer’s up card is an Ace in a moment. It would seem like this would give the player a big advantage, but surprisingly, seeing both of the dealer’s cards helps you a lot less than you’d expect.

An interesting detail is that when the dealer’s up card is an Ace, the dealer immediately checks to see if her down card is any ten, meaning she has Blackjack. Normally you’d lose (omitting the detail of Insurance) but in Deal & Reveal, if the dealer’s down card is any ten, she discards it and you get a second chance. This is psychologically very powerful, but again, mathematically it doesn’t help you as much as you’d think.

I’ve left out several important details. You can look the game up on the Internet or click on the image of the Rule Card I picked up to enlarge it so you can read it.


The moral here is for me only: My love of combinatorial math, probability, and computer science was ignited in part by my love of games such as poker and chess when I was young. Las Vegas is an intriguing place for me because of the math and the psychology. I do have some minor qualms about the ethics of gambling, but I think I’m over-sensitive to those kinds of issues. I have more fun analyzing the games than actually playing them. Usually.

Posted in Conferences, Miscellaneous | Leave a comment

Time Series Regression using a Raw Python Neural Network

I’ve been looking at time series regression recently. Just for fun I coded up an example using a raw Python (with the NumPy library for numerical functions) neural network. For my example I used a standard benchmark data set that has the total number of airline passengers for the 144 months from January 1949 through December 1960.


I used a rolling window approach, with a window size of 4. This means that I used each consecutive four months to predict the next month. So the first data item is (1.12, 1.18, 1.32, 1.29, 1.21). I normalized the raw data by dividing each passenger count by 100,000. So the first item means that in months 1-4 there were 112,000, 118,000, 132,000, and 129,000 passengers. Those values are used to predict the passenger count for month 5, which is 121,000. The second item is (1.18, 1.32, 1.29, 1.21, 1.35): the counts for months 2-5 are used to predict the count for month 6.
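Building the rolling-window items can be sketched in a few lines. This uses the first six normalized counts quoted above; the loop is the general pattern, not the article's exact demo code:

```python
import numpy as np

# raw monthly passenger counts for the first six months
raw = np.array([112000, 118000, 132000, 129000, 121000, 135000])
series = raw / 100000.0  # normalize: [1.12, 1.18, 1.32, 1.29, 1.21, 1.35]

window = 4
items = []
for i in range(len(series) - window):
  # each item is 4 consecutive inputs followed by the 1 target value
  items.append(series[i : i + window + 1])

print(items[0])  # [1.12 1.18 1.32 1.29 1.21]
print(items[1])  # [1.18 1.32 1.29 1.21 1.35]
```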

After I created my prediction model, I used it to print out the actual and predicted passenger counts. I dropped that data into Excel and made a graph. The model worked pretty well. Time series regression can be extremely complicated, but this was an interesting little exercise.

Posted in Machine Learning | Leave a comment

My Four Most Common Python NumPy Array Initializations

I use several different programming languages. Whenever I switch between languages, there’s always an adjustment time in my head. For some reason, whenever I switch from C# to Python with NumPy, it always takes me about an hour to start thinking fully in Python. In particular, it always takes me time to recall Python/NumPy array initializations.

One of the causes of this is that C# has basically two ways to instantiate an array:

double[] arr1 = new double[4];
double[] arr2 = new double[] { 1.0, 5.0, 2.0 };

But Python NumPy has many ways to instantiate an array. The ones I use most often are np.zeros(), np.array(), np.full(), and a return from np.random.choice(). For example:

arr1 = np.zeros(shape=5, dtype=np.float32)  # 5 0.0 cells
arr2 = np.array([17,2,5,0,5,12], dtype=np.int64)
arr3 = np.array(range(0,5), dtype=np.int64)  # [0,1,2,3,4]
arr4 = np.full(shape=3, fill_value=0.01, dtype=np.float64)
arr5 = np.random.choice(7, 2)  # 2 random ints in [0,6]
mat1 = np.zeros(shape=(2,3), dtype=np.float32)  # 2x3 matrix

There are several implications. One is that I prefer programming languages that have sparse feature sets — I prefer to know everything about a small language. For example, the np.zeros() function is redundant in a sense because you can get the same effect using np.full() with fill_value=0.0.
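A quick check of that redundancy claim:

```python
import numpy as np

# np.zeros() and np.full() with a 0.0 fill value give identical results
a = np.zeros(shape=4, dtype=np.float64)
b = np.full(shape=4, fill_value=0.0, dtype=np.float64)

print(np.array_equal(a, b))  # True
```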

Another implication is the high cost of context switching when programming. It costs time and effort to switch languages (for example, working with C# on Monday, Wednesday, Friday, and with Python on Tuesday, Thursday). Or doing programming from 9:00 AM to 11:00 AM, then switching over to email tasks, then switching back to programming.

Posted in Machine Learning | Leave a comment

I do an Interview about Machine Learning on Microsoft’s Channel 9

Channel 9 is a Microsoft community video Web site. There are all kinds of interesting videos on Channel 9, but most of the videos are aimed at software developers.

I was recently asked to do a short (6-minute) interview on Channel 9. The topic was machine learning and the upcoming DevIntersection conference where I’ll be speaking about the Microsoft CNTK code library.

The interview host was Richard Campbell. I’ve known Richard for a long time because we’ve both spoken at Microsoft conferences for quite a few years. Richard is a very bright guy, and as much as anyone I know, he has a really good understanding of the big picture of software development and technology. And he’s very articulate too — a relatively rare characteristic for deep technical experts.

Anyway, we chatted and I explained the differences between data science, machine learning, deep learning, and artificial intelligence. The video interview recording session took place at the Microsoft Production Studios in Building 25. The studios there are quite impressive and very professional.

Anyway, if you go to the DevIntersection Conference, October 31 through November 2, 2017, please seek me out before or after my CNTK talk and say “hello”.

Posted in Conferences, Machine Learning | Leave a comment

More About Iterating Through a CNTK Minibatch Input Data Object

Recently, I wanted to iterate through a built-in CNTK library input data structure, a MinibatchData object, or just minibatch for short. With the help of a work colleague (Nikos), I finally figured out how to do it, but the technique is ugly. In a previous post I described how to walk through a text file in CNTK data format, which, although a bit tricky in the details, is simple in principle.

The alternative I was looking at is to iterate through a CNTK mini-batch object. Although the code (below) is short, it’s very tricky, and not at all obvious. My demo program creates a special CNTK reader. The reader has a next_minibatch() function which returns a minibatch which is actually a complex Python Dictionary collection.

To get at the data in the minibatch Dictionary, you have to get the keys, create a list from the keys (because weirdly, the keys aren’t enumerable), then get the data using the keys-list and the asarray() function. But unfortunately, the order of the keys in the Dictionary can vary from run to run, so the technique isn’t practical unless you sort the list holding the Dictionary, which is way more trouble than it’s worth. In short, to walk through CNTK input values, you’re best off using np.loadtxt() to iterate through the source data file rather than iterating through the minibatch collection that holds the data in memory.

I really like CNTK a lot. But this is a tiny bit crazy. It shouldn’t be that hard to walk through a critically important data structure. I bet that the CNTK team will be adding an easy-access function at some point in the near future. To be fair, CNTK was only released about 9 weeks ago, so a little roughness is expected. And in my opinion, CNTK is much, much easier to use than its direct competitor, the TensorFlow library.

# use Nikos' solution to fetch contents of minibatch

import numpy as np
import cntk as C

def create_reader(path, is_training, input_dim, output_dim):
  return C.io.MinibatchSource(C.io.CTFDeserializer(path,
    C.io.StreamDefs(
      features = C.io.StreamDef(field='predictors',
        shape=input_dim, is_sparse=False),
      labels = C.io.StreamDef(field='passengers',
        shape=output_dim, is_sparse=False)
    )), randomize = is_training,
    max_sweeps = C.io.INFINITELY_REPEAT if is_training else 1)

the_file = "tsr_sample_cntk.txt"

input_dim = 4
output_dim = 1
input_Var = C.ops.input(input_dim, np.float32)
label_Var = C.ops.input(output_dim, np.float32)

rdr = create_reader(the_file, False, input_dim, output_dim)

my_input_map = {
  input_Var : rdr.streams.features,
  label_Var : rdr.streams.labels
}

print("\nFeatures and Labels: ")
for i in range(0,6):  # each data item
  mb = rdr.next_minibatch(1, input_map = my_input_map)
  keys = list(mb.keys())
  print(mb[keys[0]].asarray())  # no order guarantee !!

print("\nEnd experiment \n")

Posted in CNTK, Machine Learning | Leave a comment