Understanding the LSTM Input-Output Process

An LSTM cell (“long short-term memory”) is a software component that can be used to create a neural system that makes predictions on sequences of input values. I’ve been looking very closely at LSTMs using the CNTK code library.

My goal was to completely understand what the following CNTK code does:

# lstm_peek.py
# explore CNTK LSTM I/O

import numpy as np
import cntk as C

input_dim = 2   # context pattern window
output_dim = 1

X = C.ops.sequence.input_variable(input_dim)

model = None
with C.layers.default_options():
  model = C.layers.Recurrence(C.layers.LSTM(shape=4))(X)
  model = C.sequence.last(model)
  model = C.layers.Dense(output_dim)(model) 

inpt_array = np.array([[1.0, 2.0],
                       [3.0, 4.0],
                       [5.0, 6.0]], dtype=np.float32)

result = model.eval({X:inpt_array})
print(result)

Briefly, one sequence of three items, each with two values, is fed to an LSTM cell that has a state-memory of size 4. The (unseen) internal output is three states of size 4 each. The last of these size-4 states is fetched and then fed to a neural layer that condenses it to a single value, which is then displayed.
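To see roughly what the Recurrence wrapper is doing internally, here’s a minimal NumPy sketch of the standard LSTM gate equations. The weight arrays W, U, b are made-up random values (like CNTK’s random initialization), so only the shapes match the demo, not the numbers:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    # one standard LSTM time step: x = input item,
    # h = hidden state, c = cell state
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    f = sigmoid(W[0] @ x + U[0] @ h + b[0])  # forget gate
    i = sigmoid(W[1] @ x + U[1] @ h + b[1])  # input gate
    o = sigmoid(W[2] @ x + U[2] @ h + b[2])  # output gate
    g = np.tanh(W[3] @ x + U[3] @ h + b[3])  # candidate cell values
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

state_dim = 4   # matches LSTM(shape=4) in the demo
input_dim = 2
rng = np.random.default_rng(0)  # made-up weights
W = rng.standard_normal((4, state_dim, input_dim))
U = rng.standard_normal((4, state_dim, state_dim))
b = rng.standard_normal((4, state_dim))

seq = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
h = np.zeros(state_dim)
c = np.zeros(state_dim)
for x in seq:    # three time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)   # (4,) -- the last state, which Dense(1) then condenses
```

Three states of size 4 are produced, one per time step; the loop keeps only the last one, just as sequence.last() does in the CNTK code.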

Whew! The code is short but very deep and it took me several hours to completely understand what was happening.

Unfortunately, I can’t show you the exact calculations because the weights and biases in the LSTM and the last neural layer are initialized to random values.

Moral of the story: When using a code library like CNTK, a lot of black-box components are used. But if you spend some time you can understand their behavior.

Posted in CNTK, Machine Learning | Leave a comment

Machine Learning and Cell Phone Stores

Every now and then I do some semi-random searches on the Internet. My search term is usually generated by a news headline. The idea is to find unexpected areas for machine learning. For example, I recently read a story about researchers who were able to accurately predict how a town would vote in an election, based on street-map images. My reaction was, “Cool idea! Why didn’t I think of that?”

So, I’ll do a random Internet search occasionally, hoping to get some sort of inspiration or insight. Naturally, the vast majority of these Internet searches turn up absolutely nothing of interest.

A few days ago, a local news headline was something about a store employee being shot during a cell phone store robbery. I was puzzled and mildly disturbed: why would anyone rob a cell phone store in the first place, and then why would they shoot someone? It made no sense to me at all.

So, I searched the Web for “cell phone store robbery”. I expected to get close to zero hits. But there were hundreds of thousands of image results. I was very surprised. Unfortunately though, examining the images didn’t yield any interesting insights — they were just images of criminals and robberies, nothing more.

The moral of the story: There’s no algorithm for generating inspiration. And I’m glad I don’t work at a cell phone store.

“Night of Inspiration” – Leonid Afremov

Posted in Machine Learning | Leave a comment

Matrix Functions in C# for Machine Learning

The C# language is the most common language among enterprise developers who use the Microsoft technology stack. The Python language is the most common language among researchers and engineers who write machine learning code. I intend to refactor some Python ML code to C#. Python has all kinds of built-in functions that are useful for ML, but in C# you have to write several helper functions yourself.

For example, it’s useful to have routines to create a matrix from an array, print a matrix, multiply two matrices, and so on.

Here’s a way to create a matrix from a C# array:

static float[][] ArrayToMatrix(float[] arr, int rows, int cols)
{
  float[][] result = new float[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new float[cols];

  int k = 0;  // index into the source array
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      result[i][j] = arr[k++];

  return result;
}
Note that in machine learning code, the 32-bit type float is more common than the 64-bit type double. Here’s a way to print a matrix:

static void Print(float[][] matrix, int dec)
{
  for (int i = 0; i < matrix.Length; ++i) {
    for (int j = 0; j < matrix[0].Length; ++j) {
      Console.Write(matrix[i][j].ToString("F" + dec) + " ");
    }
    Console.WriteLine("");
  }
}

And matrix multiplication:

static float[][] MatrixProduct(float[][] a, float[][] b)
{
  int aRows = a.Length; int aCols = a[0].Length;
  int bRows = b.Length; int bCols = b[0].Length;
  if (aCols != bRows)
    throw new Exception("Non-conformable matrices");

  float[][] result = new float[aRows][];
  for (int i = 0; i < aRows; ++i)
    result[i] = new float[bCols];

  for (int i = 0; i < aRows; ++i) // each row of A
    for (int j = 0; j < bCols; ++j) // each col of B
      for (int k = 0; k < aCols; ++k) // could use k < bRows
        result[i][j] += a[i][k] * b[k][j];

  return result;
}

And here’s a little demo:

static void Main(string[] args)
{
  float[][] A = ArrayToMatrix(new float[] { 1, 2, 3,
                                            4, 5, 6 }, 
                                            2, 3);  // 2x3
  float[][] B = ArrayToMatrix(new float[] { 1, 2, 3, 4,
                                            5, 6, 7, 8,
                                            9, 10, 11, 12},
                                            3, 4);   // 3x4
  float[][] AB = MatrixProduct(A, B);  // 2x4
  Print(AB, 4);
} // Main
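Since I can’t easily show the C# demo output here, a quick cross-check of the same product in Python with NumPy (purely a sanity check, not part of the C# code):

```python
import numpy as np

A = np.array([[1.0, 2, 3],
              [4, 5, 6]])            # 2x3
B = np.array([[1.0, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])      # 3x4
AB = A @ B                           # 2x4
print(AB)  # rows: [38, 44, 50, 56] and [83, 98, 113, 128]
```

The C# MatrixProduct() function should produce exactly these eight values.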

In addition to these, you need many more functions depending on exactly what kind of ML code you’re writing. For example, you’ll likely need matrix addition, element-wise multiplication, element-wise addition, and so on.

Chester – Watergate Street Looking East – Louise Rayner, approx. 1875. Victorian urban matrix.

Posted in Machine Learning | Leave a comment

Iterating Through a CNTK-Format Data File

CNTK is Microsoft’s open source library for deep neural networks. A key component in CNTK code is a mini-batch object. A mini-batch object holds training data (input values and known correct output values), and a series of mini-batch objects is sent to a CNTK training function.

I decided to see if I could iterate through a data file using CNTK functions. I didn’t have a concrete reason why this might be useful, but I have a hunch that CNTK could be used for numeric processing in general, in addition to creating deep neural networks.

Anyway, after some experimentation, I succeeded. I created a small dummy text file in CNTK format:

|id 001 |data 11
|id 002 |data 12
|id 003 |data 13
|id 004 |data 14
|id 005 |data 15
|id 006 |data 16
|id 007 |data 17
|id 008 |data 18
|id 009 |data 19

Then I wrote a demo program that uses CNTK stream functions to read four items at a time into a mini-batch, and then walks through each of the four items in the mini-batch.
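I won’t try to reproduce the exact CNTK reader calls from memory, but the iteration my demo performs is equivalent to this plain-Python sketch (the parsing helper is my own, not part of CNTK):

```python
# the nine lines of the dummy CNTK-format file above
lines = ["|id %03d |data %d" % (i, 10 + i) for i in range(1, 10)]

def parse_line(line):
    # "|id 001 |data 11" -> {'id': 1.0, 'data': 11.0}
    d = {}
    for field in line.split("|"):
        field = field.strip()
        if field != "":
            name, val = field.split()
            d[name] = float(val)
    return d

items = [parse_line(ln) for ln in lines]

# walk through the items four at a time, like next_minibatch(4)
for start in range(0, len(items), 4):
    batch = items[start:start+4]
    print([it["data"] for it in batch])
```

The output is three batches: four items, four items, and then the one leftover item.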

The good news is that I can now iterate through a CNTK file using CNTK stream functions. The bad news (for now at least) is that data in a mini-batch isn’t particularly useful if it isn’t going to be sent to a training function. In my demo, I cast each item to an array using the asarray() function. But I could have just read the data directly, without using CNTK at all, with the numpy loadtxt() function.

Hmmmm. I’m not entirely convinced that I fully understand the underlying mechanism here so I’ll keep probing. I still think there might be some clever, out-of-the-box ways to use the CNTK library.

Posted in CNTK, Machine Learning | 2 Comments

Encoding Data for Machine Learning using Excel

In many machine learning situations, the most time-consuming and annoying part of the process is getting data ready. A common task is to encode categorical data. For example, suppose you have a Color variable that can be one of three colors (red, white, blue), and the raw data encodes those three colors as 0, 1, or 2. If you are working with a neural network, you’ll want to use 1-of-(N-1) encoding so that 0 = (1, 0), 1 = (0, 1), and 2 = (-1, -1).

If there’s a lot of data (say more than 200 items) then the best approach is usually to write a utility program to do all the work. But for quick and dirty jobs, Excel is a great option.

Suppose raw data is:

age	color	foo
23	0	0
34	1	1
48	2	0
50	1	1
42	2	1
36	0	0

The first thing I’d do is take care of the purely numeric columns, like Age. For example, I’d use min-max normalization, where each value v in the column becomes (v - min) / (max - min). Easy.
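For the Age column above, min-max normalization works out like this (min = 23, max = 50):

```python
ages = [23, 34, 48, 50, 42, 36]   # the Age column from the raw data
lo, hi = min(ages), max(ages)     # 23 and 50
norm = [round((v - lo) / (hi - lo), 4) for v in ages]
print(norm)  # [0.0, 0.4074, 0.9259, 1.0, 0.7037, 0.4815]
```

Every normalized value lands between 0.0 and 1.0, which is what a neural network wants.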

Now for the Color column, first I’d replace the 0, 1, 2 with A, B, C (to get rid of 0s and 1s). Next, I’d replace:

A with 1 0
B with 0 1
C with -1 -1

Then in Excel, I’d use the Text to Columns feature to expand the single cells to multiple columns. Neat! This technique also works for 1-of-N encoding where A would be replaced by 1 0 0 and so on.

For the Foo column, which is binary, I’d replace 0 with -1 and leave the 1 values alone.
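In code rather than Excel, the replacements amount to a simple lookup. Here’s a sketch for one row of the table (the min and max for Age come from the column above):

```python
color_enc = {0: [1, 0], 1: [0, 1], 2: [-1, -1]}  # 1-of-(N-1) encoding
foo_enc = {0: -1, 1: 1}                          # binary: 0 becomes -1

age, color, foo = 34, 1, 1   # second row of the raw data
age_norm = round((age - 23) / (50 - 23), 4)  # min-max over the Age column
encoded = [age_norm] + color_enc[color] + [foo_enc[foo]]
print(encoded)  # [0.4074, 0, 1, 1]
```

The single Color value expands to two columns, exactly like the Text to Columns trick.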

Moral of the story: Excel is pretty awesome.

Artist Tatsuo Horiuchi creates paintings using only Excel. Crazy Awesome.

Posted in Machine Learning | Leave a comment

Logistic Regression using Python

I wrote an article in the January 2018 issue of Visual Studio Magazine titled “Logistic Regression using Python.” See https://visualstudiomagazine.com/articles/2018/01/04/logistic-regression.aspx.

The goal of a binary classification problem is to predict a class label, which can take one of two possible values, based on the values of two or more predictor variables (sometimes called features in machine learning terminology). For example, you might want to predict the sex (male = 0, female = 1) of a person based on their age, annual income, and height.

There are several quite different techniques you can use to tackle a binary classification problem. Logistic Regression is one of the simplest. The article shows how to implement logistic regression to solve a binary classification problem, using a program coded in Python, the current programming language of choice for machine learning.
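The computation at the heart of logistic regression is tiny. Here’s a sketch with made-up weight values for the sex-prediction example (three normalized predictors: age, income, height); a real system would learn the weights from training data:

```python
import math

def predict(x, weights, bias):
    # logistic regression: p = sigmoid(w dot x + b)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = 1.0 / (1.0 + math.exp(-z))
    return (1 if p >= 0.5 else 0), p

w = [0.3, -0.2, 0.1]   # made-up trained weights
b = -0.5               # made-up bias
label, p = predict([0.42, 0.55, 0.60], w, b)
print(label, round(p, 4))  # p < 0.5, so the predicted class is 0 (male)
```

If the computed probability is 0.5 or greater, the prediction is class 1, otherwise class 0.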

There are many existing systems you can use to perform binary classification with logistic regression. However, an external library may be difficult (for technical reasons) or impossible (for legal reasons) to integrate into an existing system. Coding logistic regression from scratch gives you full control over your code and allows you to fully understand how your prediction system works.

The strength of logistic regression for binary classification is simplicity. The major disadvantage of logistic regression is that it only works well for data that is mostly linearly separable. That is, data where you can conceptually separate the two classes to predict with a straight line.

The major alternative to logistic regression for binary classification is to use a neural network. Neural networks can deal with complex data. Another alternative is to use a support vector machine (SVM). SVMs have fallen out of favor, at least among all my research and engineering colleagues. SVMs are more complex than neural networks and rarely perform as well as neural networks in practice.

Posted in Machine Learning | Leave a comment

NFL 2017 Week 19 (Division Playoffs) Predictions – Zoltar Says Flip a Coin

Zoltar is my NFL football machine learning prediction system. It’s a hybrid system that uses a custom reinforcement learning algorithm plus a neural network. Here are Zoltar’s predictions for week #19 (the Division Playoff games) of the 2017 NFL season:

Zoltar:     falcons  by    0  dog =      eagles    Vegas:     falcons  by  2.5
Zoltar:    patriots  by   10  dog =      titans    Vegas:    patriots  by 13.5
Zoltar:    steelers  by   10  dog =     jaguars    Vegas:    steelers  by  7.5
Zoltar:     vikings  by    6  dog =      saints    Vegas:     vikings  by    4

Zoltar theoretically suggests betting when the Vegas line differs from Zoltar’s prediction by more than 4.0 points (for playoff games; the threshold is 3.0 points for regular-season games). There are four games, and Zoltar agrees with the Vegas point spread to within 4 points on all of them. So, Zoltar has no hypothetical suggestions.
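The suggestion logic is simple enough to sketch; the numbers below come straight from the prediction table above:

```python
# (team, Zoltar margin, Vegas margin) for the four playoff games
games = [("falcons", 0.0, 2.5), ("patriots", 10.0, 13.5),
         ("steelers", 10.0, 7.5), ("vikings", 6.0, 4.0)]
threshold = 4.0   # playoff threshold; 3.0 in the regular season
suggestions = [team for (team, z, v) in games if abs(z - v) > threshold]
print(suggestions)  # [] -- no hypothetical suggestions this week
```

The largest disagreement is 3.5 points (Patriots vs. Titans), which falls under the 4.0-point playoff threshold.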

But if forced to give advice, Zoltar would say:

1. Take the underdog Eagles against the Falcons.
2. Take the underdog Titans against the Patriots.
3. Take the favored Steelers against the Jaguars.
4. Take the favored Vikings against the Saints.


Zoltar went 2-0 against the Vegas point spread last week, correctly liking the Vegas underdog Falcons (who won outright against the Rams) and the underdog Bills, who lost by only 7 points and thus prevented the Jaguars from covering the 8-point spread.

For the 2017 regular season, against the Vegas point spread, Zoltar finished a pretty good 51-32 (61% accuracy). If you must bet $110 to win $100 (typical in Vegas) then you must theoretically predict with 53% or better accuracy to make money, but realistically you must predict at 60% or better accuracy.
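The break-even figure is easy to verify: risking $110 to win $100 means you must win 110 / (110 + 100) of your bets just to stay even.

```python
breakeven = 110 / (110 + 100)
print(round(breakeven, 4))   # 0.5238 -- about 53 percent
zoltar = 51 / (51 + 32)      # the 51-32 regular-season record
print(round(zoltar, 4))      # 0.6145 -- about 61 percent
```

At 61% against the spread, Zoltar is comfortably above both the theoretical and the realistic break-even rates.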

I tracked how well Zoltar, Bing Predicts, and the Vegas line do when just predicting which team will win. For the 2017 regular season, just predicting the winning team, Zoltar finished 178-78 (70% accuracy), Bing finished 168-88 (66% accuracy), and Vegas finished 162-86 (65% accuracy). The best humans are typically about 67% accurate predicting winners, so Zoltar is slightly better than the best human experts. Bing did OK too, beating Vegas by a percentage point.

Note: Some of my numbers could be off a bit because of some weirdness with games played outside the U.S. (London and Mexico) earlier this season.

My Zoltar system is named after the arcade fortune telling machine.

Posted in Zoltar | Leave a comment