I Do an Interview about Machine Learning on Microsoft’s Channel 9

Channel 9 is a Microsoft community video Web site. There are all kinds of interesting videos on Channel 9, but most of the videos are aimed at software developers.

I was recently asked to do a short (6-minute) interview on Channel 9. The topic was machine learning and the upcoming DevIntersection conference where I’ll be speaking about the Microsoft CNTK code library. See https://channel9.msdn.com/Shows/The-DEVintersection-Countdown-Show/DEVintersection-Countdown-Show-on-the-Opportunities-in-Machine-Learning-with-James-McCaffrey

The interview host was Richard Campbell. I’ve known Richard for a long time because we’ve both spoken at Microsoft conferences for quite a few years. Richard is a very bright guy, and as much as anyone I know, he has a really good understanding of the big picture of software development and technology. And he’s very articulate too — a relatively rare characteristic for deep technical experts.

Anyway, we chatted and I explained the differences between data science, machine learning, deep learning, and artificial intelligence. The video interview recording session took place at the Microsoft Production Studios in Building 25. The studios there are quite impressive and very professional.

If you go to the DevIntersection Conference, October 31 through November 2, 2017, please seek me out before or after my CNTK talk and say “hello”. See the conference site at: https://www.devintersection.com.

Posted in Conferences, Machine Learning | Leave a comment

More About Iterating Through a CNTK Minibatch Input Data Object

Recently, I wanted to iterate through a built-in CNTK library input data structure, a MinibatchData object, or just minibatch for short. With the help of a work colleague (Nikos), I finally figured out how to do it, but the technique is ugly. In a previous post I described how to walk through a text file in CNTK data format, which, although a bit tricky in the details, is simple in principle.

The alternative I was looking at is to iterate through a CNTK minibatch object. Although the code (below) is short, it’s very tricky and not at all obvious. My demo program creates a special CNTK reader. The reader has a next_minibatch() function which returns a minibatch that is actually a complex Python dictionary collection.

To get at the data in the minibatch dictionary, you have to get the keys, create a list from the keys (because, weirdly, the keys aren’t directly indexable), and then fetch the data using the keys list and the asarray() function. Unfortunately, the order of the keys in the dictionary can vary from run to run, so the technique isn’t practical unless you sort the keys list, which is more trouble than it’s worth. In short, to walk through CNTK input values, you’re better off using np.loadtxt() to iterate through the source data file rather than iterating through the minibatch collection that holds the data in memory.

I really like CNTK a lot. But this is a tiny bit crazy. It shouldn’t be that hard to walk through a critically important data structure. I bet the CNTK team will add an easy-access function at some point in the near future. To be fair, CNTK v2 was only released about 9 weeks ago, so a little roughness is expected. And in my opinion, CNTK is much, much easier to use than its direct competitor, the TensorFlow library.

# read_exp.py
# use Nikos' solution to fetch the contents of a minibatch

import numpy as np
import cntk as C

def create_reader(path, is_training, input_dim, output_dim):
  return C.io.MinibatchSource(C.io.CTFDeserializer(path,
    C.io.StreamDefs(
      features = C.io.StreamDef(field='predictors', shape=input_dim),
      labels = C.io.StreamDef(field='passengers', shape=output_dim)
    )), randomize = is_training,
    max_sweeps = C.io.INFINITELY_REPEAT if is_training else 1)

the_file = "tsr_sample_cntk.txt"

input_dim = 4
output_dim = 1
input_Var = C.ops.input(input_dim, np.float32)
label_Var = C.ops.input(output_dim, np.float32)

rdr = create_reader(the_file, False, input_dim, output_dim)

my_input_map = {
  input_Var : rdr.streams.features,
  label_Var : rdr.streams.labels
}

print("\nFeatures and Labels: ")
for i in range(0,6):  # each data item
  mb = rdr.next_minibatch(1, input_map = my_input_map)
  keys = list(mb.keys())
  print(mb[keys[0]].asarray()) # no order guarantee !!

print("\nEnd experiment \n")

Posted in CNTK, Machine Learning | Leave a comment

Why is Microsoft Involved in Gaming?

One of my older friends asked me why Microsoft is involved with Xbox gaming. I didn’t really know even though I had a few ideas (see below), so I figured I’d search the Internet to see what Microsoft’s officially stated opinions are.

Well, the results were unambiguous. Microsoft supports gaming. Big time. In May, during a presentation at the Build Conference, CEO Satya Nadella said, “Most of you (financial analysts) view gaming as ‘Microsoft has an Xbox business.’ I think you understand the console economics. But it’s a much broader thing for us.”

During the presentation, the Microsoft CFO described how Xbox is a multi-billion dollar business. That’s a pretty good reason to be in a segment. And the head of the Xbox division said that being at the forefront of gaming helped the company’s understanding of the next wave of computing solutions, namely mixed reality and holograms.

Also, gaming creates an enormous social network — millions and millions of users: “This attracts more users to Microsoft’s social network which in turn builds loyalty. This is where people find and meet their digital friends and keep connected with them. The result of this is it drives commerce which leads to a cycle that is demonstrated in the slide above.” See https://www.windowscentral.com/xbox-multi-billion-dollar-profitable-business-more-first-party-investment-way.

So, really the question isn’t, “Why is Microsoft involved in gaming?” It’s really more like, “Where is Microsoft headed with gaming?”

One recent development is that Microsoft acquired Beam (now called Mixer). This is a live-streaming service similar to Twitch, which Amazon purchased for $1.0 billion. Billion. With a ‘B’.

Another possible area of growth is eSports. I’d never even heard of eSports until recently. But competitive gaming is projected to be a huge form of entertainment, with revenue of roughly $700 million in 2017. See https://venturebeat.com/2017/03/14/newzoo-the-esports-economy-will-grow-41-to-696-million-in-2017/.

So, I suspect that people who aren’t really into gaming — and that includes me — dramatically underestimate the business impact of gaming. It’s clear that Microsoft sees huge value in, and is committed to, gaming products and services.

As a technical person, my thoughts about the importance of gaming are a bit more indirect. Many of the roots of machine learning go back to the early days of computer chess. And video game technology has pushed the forefront of computer graphics. In short, gaming-related development has all kinds of beneficial technical side effects.

The moral of the story is that I tended to think of gaming as a bunch of teenagers sitting in a darkened basement, shooting at each other with joysticks for hours on end. But there’s a lot more to gaming, and there’s a lot of money involved.

Posted in Miscellaneous | Leave a comment

Iterating Through a CNTK Data File

The CNTK library is a very powerful tool to do advanced machine learning. Today I ran into an unusual scenario. CNTK supports a data file format called CTF. For example:

|predictors 1.12 1.18 1.32 1.29 |passengers 1.21
|predictors 1.18 1.32 1.29 1.21 |passengers 1.35
|predictors 1.32 1.29 1.21 1.35 |passengers 1.48
|predictors 1.29 1.21 1.35 1.48 |passengers 1.48
|predictors 1.21 1.35 1.48 1.48 |passengers 1.36
|predictors 1.35 1.48 1.48 1.36 |passengers 1.19

This is a sample from a time series regression problem I was working on. There are four input (aka feature) predictor values followed by a single value to predict. The CTF format is very convenient when you want to train a neural network because there’s built-in support for reading and accessing the data.
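To make the layout concrete, a CTF line can be pulled apart with plain Python. This is just a sketch to illustrate the format, not part of CNTK; the field names come from the sample above.

```python
def parse_ctf_line(line):
    # split "|predictors 1.12 ... |passengers 1.21" into a dict
    # mapping each field name to its list of float values
    fields = {}
    for chunk in line.strip().split("|")[1:]:
        parts = chunk.split()
        fields[parts[0]] = [float(v) for v in parts[1:]]
    return fields

d = parse_ctf_line("|predictors 1.12 1.18 1.32 1.29 |passengers 1.21")
print(d["predictors"])  # [1.12, 1.18, 1.32, 1.29]
print(d["passengers"])  # [1.21]
```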

But after training I wanted to walk through the input data, one item at a time. Surprisingly, there’s no easy way to do this. One work-around is to use the numpy loadtxt() function.

# read_exp.py

import numpy as np
import cntk as C

the_file = "tsr_sample_cntk.txt"  # CNTK format

predictors = np.loadtxt(fname=the_file, dtype=np.float32,
 delimiter=" ", usecols=(1,2,3,4))
passengers = np.loadtxt(fname=the_file, dtype=np.float32,
 delimiter=" ", ndmin=2, usecols=[6]) # note!

input_dim = 4
hidden_dim = 12
output_dim = 1

input_Var = C.ops.input(input_dim, np.float32)
label_Var = C.ops.input(output_dim, np.float32)
# create and train the nnet object

print("\n---- Predictions: ")
for i in range(len(predictors)):
  ipt = predictors[i]
  print("Inputs: ", end='')
  print(ipt, end='')
  # pred_passengers = nnet.eval( {input_Var: ipt} )
  pred_passengers = 1.0 + 0.12* i  # dummy prediction
  print("   Predicted: %0.2f   Actual: %0.2f" % \
    (pred_passengers, passengers[i]))

print("\nEnd experiment \n")

The code is a bit trickier than it appears. Notice that when reading the passengers field, I had to use ndmin (minimum dimension) to get a matrix, as required by CNTK, and the usecols (which columns to use) parameter needs an enumerable list when you read only one column.
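The effect of ndmin and the list-style usecols can be seen with a tiny stand-alone sketch, using io.StringIO in place of the data file:

```python
import io
import numpy as np

# two lines in the same layout as the CTF file
sample = ("|predictors 1.12 1.18 1.32 1.29 |passengers 1.21\n"
          "|predictors 1.18 1.32 1.29 1.21 |passengers 1.35\n")

# without ndmin, a single column collapses to a 1-D vector
flat = np.loadtxt(io.StringIO(sample), dtype=np.float32,
                  delimiter=" ", usecols=[6])
print(flat.shape)  # (2,)

# ndmin=2 keeps it as a matrix with one row per data item
mat = np.loadtxt(io.StringIO(sample), dtype=np.float32,
                 delimiter=" ", ndmin=2, usecols=[6])
print(mat.shape)   # (2, 1)
```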

Once I have the input data, I can feed it to the trained neural network and call the eval() function to get the output. In my demo I simulate the prediction/output.

The moral of the story is that CNTK is a complex library and a strong knowledge of Python is at least useful and perhaps necessary in some scenarios.

Posted in CNTK, Machine Learning | Leave a comment

I Give a Talk about the CNTK Library

I gave a talk about the Microsoft CNTK v2.0 code library. CNTK (which formerly stood for “computational network tool kit”) is a sophisticated code library that software developers can use to create deep neural networks. You call the library functions using the Python language.

Among my developer colleagues there is great interest in gaining machine learning skills. So, I wasn’t surprised when a lot of people showed up in person for my talk, and over 600 people viewed the talk as it was streamed live.

My talk had two parts. In the first part I showed how to install CNTK. In the second part I showed how to use CNTK. I don’t feel guilty about spending time showing attendees exactly how to install CNTK, step by step. Too many times speakers think they have to show off by saying something like, “The installation process isn’t worth our time.”

Well, yes, the installation process is worth spending time on, because it can be quite tricky, and there are several useful things revealed when going over an installation carefully.

In the second part of my talk, I showed exactly how to set up data, create a single hidden layer neural network, train the network, evaluate the quality of the trained model, use the model to make a prediction, and described how you’d go about transferring a trained CNTK model to another system, such as a C# program.

As usual, the audience asked a lot of very interesting questions. No matter how well you think you know a topic, when you present it, the audience is sure to ask some great questions that help clarify the topic.

I really like CNTK. I think I know it pretty well, considering that v2 was just released a few weeks ago as I write this post. But CNTK is very complex so there’s a lot more for me to learn. You can check out my July 2017 article on CNTK in MSDN Magazine at: https://msdn.microsoft.com/en-us/magazine/mt784662.

Posted in CNTK, Machine Learning

Bayesian Search for Sunken Objects

Bayesian Search is a technique that has been used to search for sunken submarines (USS Scorpion), nuclear bombs (Palomares B-52 crash), and aircraft (Air France 447). The idea is best explained by a concrete example. A plane has crashed somewhere in the ocean. Suppose the search area is divided into four grids.

Using all available information, each grid is assigned a probability p that the target is in the grid, and a probability q that the target will be found if the target is in fact in the grid and the grid is searched. Therefore, f = pq is the probability that the target will be found by searching the grid.

Next, the grid with the highest f probability is searched. In the image, Grid 4 has the highest f value so it’s searched first. Suppose the target isn’t found. Note that the target could still be in Grid 4 because there’s a 1-q = 0.10 probability that the target was there but was missed during the Grid 4 search.

The p values for each of the four grids are then updated. For the grid that was just searched, the new p’ value is p(1-q) / (1 – pq). For each unsearched grid, the new p’ value is p / (1 – pq), where the numerator p is that grid’s own value and the pq in the denominator always refers to the searched grid.

After all the p values are updated to p’, they can be normalized by dividing each by the sum of the p’ values (with the update formulas above, the p’ values already sum to 1, so this is just a safety check). The q values don’t change, and new f values are computed as f = p’ * q.

So, after searching Grid 4 without success, the original p values go from (0.4, 0.1, 0.2, 0.3) to approximately (0.5479, 0.1370, 0.2740, 0.0411). The probability that the target is in Grid 4 has dropped to a very low value, while the p values of Grids 1 to 3 have each increased a bit. The next search would be in Grid 1.
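The update step can be sketched in a few lines of Python. The only q value given above is 1 - q = 0.10 for Grid 4; the q values for the other grids below are made-up illustration values (they affect only the choice of which grid to search, not the p update itself).

```python
# one step of Bayesian search over four grids
p = [0.4, 0.1, 0.2, 0.3]   # prob. the target is in each grid
q = [0.5, 0.6, 0.7, 0.9]   # prob. of finding it there if present (assumed)

f = [pi * qi for pi, qi in zip(p, q)]  # prob. a search of each grid succeeds
searched = f.index(max(f))             # Grid 4 (index 3) has the highest f

# the search of that grid fails, so apply the update formulas
denom = 1.0 - p[searched] * q[searched]
p_new = [p[i] * (1.0 - q[i]) / denom if i == searched
         else p[i] / denom
         for i in range(len(p))]

print(["%0.4f" % x for x in p_new])  # ['0.5479', '0.1370', '0.2740', '0.0411']
```

Note that the updated values already sum to 1, so no separate normalization pass is needed.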

Posted in Machine Learning | 1 Comment

The Hessian and Machine Learning

Most of the machine learning optimization I work with involves minimizing error to find values for neural network weights and biases, but there are many kinds of ML optimization algorithms. A classical optimization technique that tends to confuse newcomers to ML involves the Hessian.

The Hessian is a matrix of all possible calculus second derivatives of a function. The Hessian can be used in two ways. The first is the so-called second derivative test, which determines whether a value is a function minimum, a maximum, or undetermined. The second is to use the Hessian directly, to iteratively get closer and closer to the minimum error.

Suppose you want to minimize some error function E which depends on a set of weight guesses a. The update term is:

epsilon = -[H(a)]^-1 * gradient E(a)

So you’d add epsilon to the guesses a and then repeat. In words, evaluate the Hessian (all second derivatives) at the guesses a, then invert the matrix, then multiply by the gradient of the error function at a. The minus sign is so the update is an addition instead of a subtraction.

Notice that you need the inverse of the Hessian matrix. If you have n = 10 weights to solve for, the Hessian is 10×10, and inverting it quickly gets expensive, so this direct approach won’t work if n gets very large. Therefore there are several variations of this technique that estimate the Hessian (or its inverse) in various ways.
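Here is a minimal Newton-style sketch on a toy quadratic error function. The function and starting point are my own invented example, not from any real model.

```python
import numpy as np

# toy error function E(a) = (a0 - 2)^2 + 3*(a1 + 1)^2, minimum at (2, -1)
def grad(a):                      # gradient of E at a
    return np.array([2.0 * (a[0] - 2.0), 6.0 * (a[1] + 1.0)])

def hessian(a):                   # Hessian of E (constant for a quadratic)
    return np.array([[2.0, 0.0], [0.0, 6.0]])

a = np.array([0.0, 0.0])          # initial guesses for the weights
for _ in range(5):
    eps = -np.linalg.inv(hessian(a)) @ grad(a)  # -H^-1 times the gradient
    a = a + eps                   # the minus sign makes this an addition
print(a)                          # [ 2. -1.]
```

For a quadratic function the method lands on the minimum in a single step; on a real error surface you would iterate until the gradient is near zero.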

All of this is fairly deep stuff, but if you work with machine learning, it slowly but surely starts to make sense over time.

Posted in Machine Learning