Processing the Contents of a CNTK Mini-Batch

CNTK v2 is a relatively new (mid-2017) deep learning code library. One of the advantages of using CNTK, rather than coding from scratch, is that CNTK has data reader functions. If you can fit all your data into memory, then readers aren’t a big deal. But for large datasets that won’t fit into memory, coding readers from scratch can take days or even longer.

The key idea is that CNTK has a function called next_minibatch that reads a chunk of data from a file for processing, and returns that chunk as a mini-batch object. Then there are all kinds of built-in CNTK functions that can process the mini-batch.

But what if you need to do some custom processing of a mini-batch? For example, in many regression problems, you want to compute a custom accuracy metric. I spent quite a bit of time exploring and finally came up with a technique to iterate through the contents of a CNTK mini-batch.

As often happens, once I knew the tricks the idea seemed simple. But it wasn't so simple when I was working things out. Here's the code for my demo:

# process_minibatch.py
# fetch contents of a CNTK minibatch

import numpy as np
import cntk as C

def create_reader(path, is_training, input_dim, output_dim):
  features_strm = C.io.StreamDef(field='predictors', \
    shape=input_dim, is_sparse=False)
  labels_strm = C.io.StreamDef(field='passengers', \
    shape=output_dim, is_sparse=False)
  both_strms = C.io.StreamDefs(features_nm = features_strm, \
    labels_nm = labels_strm)
  deserial = C.io.CTFDeserializer(path, both_strms)
  sweeps = C.io.INFINITELY_REPEAT if is_training else 1
  mb_source = C.io.MinibatchSource(deserial, \
    randomize = is_training, max_sweeps = sweeps)
  return mb_source

the_file = "tsr_sample_cntk.txt"  # 6 items

input_dim = 4
output_dim = 1
feature_Var = C.ops.input(input_dim, np.float32)
label_Var = C.ops.input(output_dim, np.float32)

rdr = create_reader(the_file, False, input_dim, output_dim)

my_input_map = {
  feature_Var : rdr.streams.features_nm,
  label_Var : rdr.streams.labels_nm
}

np.set_printoptions(precision=2)
print("\nProcessing three mini-batches of size 2 items each: \n")
for i in range(0,3):  # each of the three mini-batches
  mb = rdr.next_minibatch(2, input_map = my_input_map) # get 2
  for j in range(0,2):  # process each of the two items
    x = mb[feature_Var].asarray()[j]  # this is the magic !
    y = mb[label_Var].asarray()[j]
    print(" x = " + str(x[0]))
    print(" y = " + str(y[0,0]))
    print("")
  
  print("=====")

print("\nEnd experiment \n")

There is an absolute ton going on in this demo code, but the majority is boilerplate — for a given data file I can make a few changes and be good to go.
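
For example, the custom accuracy metric for regression mentioned earlier can be built on exactly this technique. Here's a minimal sketch, not part of the original demo: the trained model object, the helper name mb_accuracy, and the pct_close tolerance are all my own assumptions.

# sketch: custom accuracy over one mini-batch for a regression model
# "model" is an already-trained CNTK Function; "pct_close" is how close a
# prediction must be (as a fraction of the target) to count as correct
def mb_accuracy(mb, x_var, y_var, model, pct_close):
  num_correct = 0
  num_wrong = 0
  for i in range(mb[y_var].num_samples):   # each item in the mini-batch
    x = mb[x_var].asarray()[i]             # predictors, shape (1, input_dim)
    y = mb[y_var].asarray()[i][0,0]        # actual target value
    pred = model.eval(x)[0,0]              # predicted value
    if np.abs(pred - y) < np.abs(pct_close * y):
      num_correct += 1
    else:
      num_wrong += 1
  return (num_correct * 100.0) / (num_correct + num_wrong)

# example usage: acc = mb_accuracy(mb, feature_Var, label_Var, model, 0.15)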

The moral: Technical problems almost always have a solution. The question is how much time and effort you can afford to put in. On the other hand, there are many problems in “life” that simply cannot be solved. I prefer my technical world.


“Persistence of Memory II” – Craig Royal

Posted in CNTK, Machine Learning

Time Series Regression using CNTK LSTM

Over the past few weeks I’ve been spending some time looking at LSTM networks using CNTK. LSTM (long short-term memory) networks are useful when predicting sequences, such as the next word in a sentence when you know the first few words. Regular neural networks can’t easily deal with sequences because NNs have no memory – each input is independent. LSTMs have a form of memory so they can deal with sequences.

The CNTK v2 library provides sophisticated deep neural network modules, including LSTMs. Rather than try to code a CNTK LSTM demo on a word sequencing problem, I figured it'd be easier to work with plain numeric data.

My target problem was to create a predictive model of the trigonometric sine function. Obviously this isn’t useful, but it’s a good, simple problem — I wanted to focus on understanding LSTMs without getting distracted by details of a realistic problem.

Somewhat to my surprise, I discovered that the CNTK documentation had an example of predicting the sine function using an LSTM. Easy!

Well, not so easy. The documentation example was quite difficult to understand. So I set out to deconstruct the documentation example one chunk of code at a time, figure out what each chunk did, and then reconstruct the example from scratch, removing all the peripheral code that dealt with data generation, plotting, and so on.

It took a bit of time, but I eventually got a model up and running. In the image, you can see the LSTM model predicts the sine function fairly well, as you’d expect for an easy problem.
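
For reference, here's a minimal sketch of the kind of model definition the documentation example boils down to. The LSTM cell size below is a placeholder of my own, not the documentation's exact value:

# sketch of a CNTK LSTM regression model for sine-value sequences
# (the dimensions are placeholder assumptions)
import cntk as C

lstm_dim = 5                                      # size of the LSTM cell
x = C.sequence.input_variable(1)                  # a sequence of sine values
with C.layers.default_options(initial_state=0.1):
  m = C.layers.Recurrence(C.layers.LSTM(lstm_dim))(x)
  m = C.sequence.last(m)                          # keep only the final LSTM output
  model = C.layers.Dense(1)(m)                    # predict the next sine value

y = C.input_variable(1, dynamic_axes=model.dynamic_axes)  # the actual next value
loss = C.squared_error(model, y)                  # train by minimizing squared error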

My knowledge of the CNTK library is slowly but surely building up. At some point I should probably put together a long document, or a short e-book, that walks through CNTK installation, logistic regression, neural networks, LSTM networks, and convolutional neural networks.

Posted in CNTK, Machine Learning

The 2017 DevIntersection Conference is Coming

The DevIntersection Conference is one of my three favorite events for software developers who use Microsoft technologies. The 2017 event will be held from Monday, October 30 through Friday, November 3, in Las Vegas. See http://www.devintersection.com.

I will be speaking at DevIntersection, but I’ll also be attending as many presentations as I can. All of the speakers at DevIntersection are very, very good — as opposed to some company-specific conferences where it’s possible to get a real dud of a speaker. I’m especially looking forward to hearing Microsoft VPs Scott Guthrie (Azure) and Steve Guggenheimer (AI Business).


DevIntersection 2015

My talk will be about machine learning using the new CNTK v2 library. I remember the first DevIntersection in 2012, where I spoke about machine learning — and only a handful of people showed up to my talk. But interest in machine learning has grown tremendously, and over the last couple of years my talks on ML have been very well attended.

This illustrates one of the benefits of attending a conference like DevIntersection: you get to see industry trends early. DevIntersection also has a very good balance of different types of talks, and almost no marketing-style talks (and the vaguely marketing content that does appear is usually interesting and useful).


My talk at the 2015 DevIntersection

Of course the problem is that conferences such as DevIntersection are pricey, and most developers can't afford to pay their own way. So you have to convince your employer of the value of footing your bill. In addition to all the obvious, objective benefits, I find that just getting away for a few days and mingling with fellow software developers has a huge subjective value. I always return from DevIntersection greatly energized and productive.

Interestingly, the 2017 DevIntersection runs the same days as the SEMA car show in Las Vegas — an absolutely enormous event with around 130,000 attendees. The town will be packed, and that brings a lot of energy. And if that's not enough, Halloween falls during DevIntersection too. If you haven't seen Halloween in Vegas, well, all I can say is, it's a pretty crazy night.

Posted in CNTK, Conferences, Machine Learning

NFL 2017 Week 3 Predictions – Zoltar Likes Three Underdogs (Sort Of)

Zoltar is my NFL football prediction computer program. Here are Zoltar’s predictions for week 3 of the 2017 NFL season:

Zoltar:        rams  by    0  dog = fortyniners    Vegas:        rams  by  2.5
Zoltar:      ravens  by    3  dog =     jaguars    Vegas:      ravens  by    4
Zoltar:     broncos  by    0  dog =       bills    Vegas:     broncos  by    3
Zoltar:    dolphins  by    4  dog =        jets    Vegas:    dolphins  by    6
Zoltar:    patriots  by    7  dog =      texans    Vegas:    patriots  by   13
Zoltar:    steelers  by    6  dog =       bears    Vegas:    steelers  by  7.5
Zoltar:     falcons  by    0  dog =       lions    Vegas:     falcons  by    3
Zoltar:      giants  by    0  dog =      eagles    Vegas:      eagles  by  3.5
Zoltar:  buccaneers  by    0  dog =     vikings    Vegas:  buccaneers  by    0
Zoltar:    panthers  by    6  dog =      saints    Vegas:    panthers  by    6
Zoltar:       colts  by    9  dog =      browns    Vegas:      browns  by    1
Zoltar:      titans  by    1  dog =    seahawks    Vegas:      titans  by  2.5
Zoltar:     packers  by    7  dog =     bengals    Vegas:     packers  by    9
Zoltar:      chiefs  by    5  dog =    chargers    Vegas:      chiefs  by    3
Zoltar:     raiders  by    1  dog =    redskins    Vegas:     raiders  by    3
Zoltar:     cowboys  by    2  dog =   cardinals    Vegas:     cowboys  by    3

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week 3 Zoltar has three hypothetical (and mildly questionable) suggestions.

1. Zoltar (sort of) likes the Vegas underdog Texans against the Patriots. Zoltar thinks the Patriots are 7 points better than the Texans, but Vegas has the Patriots listed as a huge favorite, by 13.0 points. So, Zoltar thinks the Patriots will win, but not by more than 13 points. Historically, Zoltar has done very poorly when predicting against the Patriots.

2. Zoltar likes the Vegas underdog Giants against the Eagles. Zoltar thinks the two teams are evenly matched, but Vegas has the Eagles as 3.5 point favorites. So, again, Zoltar believes the Vegas favorite won’t cover the spread.

3. Zoltar likes the Vegas underdog Colts against the Browns. Zoltar thinks the Colts are 9 points better than the Browns, but Vegas has the Browns as a 1.0-point favorite. The Colts QB is injured, but Zoltar thinks Vegas has overestimated the impact of that injury.

==

Week #2 was excellent for Zoltar, who went 5-0 against the Vegas point spread, correctly predicting three favorites to cover, and two favorites not to cover. For the season, Zoltar is 6-3 (67% accuracy) against the spread (which points out how badly Zoltar did in week #1).

I also track how well Zoltar does when just predicting which team will win. This isn't really useful except for parlay betting. Zoltar was a very good 13-3 just predicting winners. For comparison, I track how well Bing/Cortana does. Bing/Cortana uses a crowd-consensus scheme, which makes sense for predicting winners but not for predicting against the point spread. In week 2, Bing/Cortana was a decent 11-5 just predicting winners. For the season, just predicting winners, Zoltar is 24-7 (77% accuracy) and Bing/Cortana is 19-12 (61% accuracy).


Zoltar first appeared in the 1988 movie “Big” starring Tom Hanks

Posted in Machine Learning, Zoltar

I Track Down a Nasty Bug in a CNTK Classification Program

I’m a big fan of the CNTK library. But . . .

CNTK is a powerful library of machine learning code. Whenever I look at a machine learning library (such as TensorFlow, Keras, Caffe, Theano, scikit-learn, Torch) my first example is usually to create a neural network classifier for Fisher’s Iris Dataset. The dataset has 150 items where each item represents one of three species of iris (setosa, versicolor, virginica). Each item has four predictor variables (sepal length and width, and petal length and width). There are 50 of each species.

Version 2.0 of CNTK was released in June 2017. I had very little trouble getting a nice NN classifier up and running. But when version 2.1 was released, my demo program no longer worked. Odd. Then a couple of days ago, CNTK version 2.2 was released and my NN classification program that worked perfectly on v2.0 still didn’t work.

When I say “didn’t work”, I mean that the program ran but it just didn’t learn. After training, the classification accuracy was 0.3333 = 33% = one-third. In other words, because there are only three species and the accuracy was 33%, the classifier was just guessing.

I spent hours and hours trying to track down the problem. At last I determined the problem occurred in this code:

print("Creating a 4-10-3 tanh softmax NN for Iris data ") 
with default_options(init = \
  glorot_uniform()):
  hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
    name='hidLayer')(input_Var)  
  oLayer = Dense(output_dim, activation=C.ops.softmax,
    name='outLayer')(hLayer)
nnet = oLayer

This code defines the NN architecture. It may look a bit tricky, but it's actually fairly simple, so I spent most of my time looking at other parts of the program, especially the data-reading routines.

It turns out that the problem was in the init = glorot_uniform() initialization. When creating a NN you must initialize all the weights and biases to small random values. The Glorot technique is one of many supported by CNTK.

Well, after hours of trying almost everything else, I finally replaced the initializer, and everything magically worked:

print("Creating a 4-10-3 tanh softmax NN for Iris data ") 
with default_options(init = \
  C.initializer.uniform(scale=0.10, seed=1)):
  hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
    name='hidLayer')(input_Var)  
  oLayer = Dense(output_dim, activation=C.ops.softmax,
    name='outLayer')(hLayer)
nnet = oLayer

The uniform initializer is the simplest and most primitive initialization technique.
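
For reference, these are a few of the initializers CNTK exposes in its C.initializer module, any of which can be passed to a layer via init (the scale values below are just illustrative):

# a few CNTK initializers that can be passed to Dense via init=...
init_glorot = C.initializer.glorot_uniform()      # Glorot/Xavier uniform
init_he     = C.initializer.he_normal()           # He normal
init_unif   = C.initializer.uniform(scale=0.10)   # plain uniform, small scale (the fix above)
init_norm   = C.initializer.normal(scale=0.01)    # Gaussian with a small scale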

I don’t know if there is a bug in the CNTK code, or if there was just a design change in the Glorot code.

The moral of the story is that using a machine learning code library usually saves a lot of time, but debugging library code is much more difficult than debugging raw code.


“Discovery” – by Sandra Bauser

Posted in CNTK, Machine Learning

Time Series Regression with Keras over CNTK with Multi-Data Input Sequences

Time series regression is a very challenging class of problem. A classic benchmark dataset is the international airline passenger data. It covers 144 months, from January 1949 (when there were 112,000 passengers) through December 1960 (when there were 432,000 passengers).

A relatively new approach for time series regression is to use what’s called an LSTM recurrent neural network. A regular neural network has no memory of previous input values, but an LSTM network does have a memory, so it seems like a natural technique to use.

In a recent experiment, I modeled the airline passenger data using the Keras library running on top of the CNTK engine, with simple "current-next" data that looks like:

|curr 1.12 |next 1.18
|curr 1.18 |next 1.32
|curr 1.32 |next 1.29
|curr 1.29 |next 1.21
|curr 1.21 |next 1.35
|curr 1.35 |next 1.48
. . .

My results were OK, but not great. So, this morning I tried using data where each input sequence has four values instead of just one:

|predictors 1.12 1.18 1.32 1.29 |passengers 1.21
|predictors 1.18 1.32 1.29 1.21 |passengers 1.35
|predictors 1.32 1.29 1.21 1.35 |passengers 1.48
. . .
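
Here's a minimal sketch of the kind of Keras LSTM model that consumes this windowed data once it has been parsed into numpy arrays. The layer sizes and training settings are placeholder assumptions, not my exact demo code:

# sketch: Keras (on the CNTK backend) LSTM for windowed time series data
# (layer sizes and training settings are placeholder assumptions)
from keras.models import Sequential
from keras.layers import LSTM, Dense

seq_len = 4   # four previous (scaled) passenger counts per input sequence

model = Sequential()
model.add(LSTM(4, input_shape=(seq_len, 1)))   # (time steps, features per step)
model.add(Dense(1))                            # predict the next passenger count
model.compile(loss='mean_squared_error', optimizer='adam')

# train_x has shape (num_items, seq_len, 1); train_y has shape (num_items, 1)
# model.fit(train_x, train_y, epochs=100, batch_size=1, verbose=2)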

Like almost everyone else, I'm relatively new to much of deep learning. Writing the LSTM modeling code was very difficult and time-consuming, due in large part to the fact that Keras and CNTK have only been around for a short time, so there's very little good documentation and very few code examples available.

Anyway, I eventually got my demo working (I think — the demo code isn’t long but it’s exceptionally tricky in my opinion). My result was . . . no better than using the simpler current-next approach, and arguably, the result was slightly worse. The graphs below show my initial “current-next” approach on top, and the multi-item input sequence approach on the bottom.


Current-Next Data Approach



Multiple Data Points in Each Input Sequence Approach

At this point, I’m ready to try time series regression using straight CNTK (without the Keras wrapper). A few early attempts failed but I’m confident I fully understand the problem, so now it will just be a matter of hard work to decipher the CNTK documentation.

The morals are: Time series regression is difficult. Deep neural techniques are difficult. Understanding LSTMs is difficult. Using deep neural libraries like Keras and CNTK is difficult. But it’s all very, very interesting.

Posted in CNTK, Machine Learning

Recap of the 2017 Game Arts Conference

I spoke at the 2017 Game Arts Conference (GAC), September 8 – 10, in Las Vegas. The GAC is aimed primarily at independent game developers. The GAC has held several events in Atlantic City, New Jersey, but this was its first event in Las Vegas.

There were actually two co-located conferences at the September Las Vegas event. In addition to the GAC, there was the Casino eSports Conference. I hadn’t even heard of eSports until a few months ago — eSports is where professional video game players compete in front of an audience. This was initially a strange concept to me. But it’s really not a lot different from an audience watching two chess grandmasters play against each other, which is an idea that has been around for at least 100 years. My colleagues at Xbox tell me that eSports could turn out to be a multi-billion dollar industry some day.


Crowd watches Bobby Fischer vs. Mikhail Tal, 1960

Both conferences were held at the Westgate Hotel, which is off the Las Vegas Strip, near the gigantic Las Vegas Convention Center. The Westgate Hotel is notable for having a huge and influential “sports book” that places odds on all kinds of sporting events.


Westgate Sports Book

My talk was about understanding deep neural networks, from the point of view of a game developer. The 2017 GAC Vegas event was very small, as is to be expected for a first-time conference. I’ve talked at conferences of all sizes, ranging from tens of thousands of attendees, down to just tens of attendees. I actually tend to prefer smaller events because they’re a bit more personal. At the GAC, I learned a lot about independent game development, by talking to the attendees and other speakers.

The use of machine learning and artificial intelligence will likely revolutionize many areas. Game development has always been quick to adopt state-of-the-art technology, and I won't be surprised if some forward-looking ML/AI ideas first emerge from game development.

Posted in Conferences, Machine Learning