Microsoft ML.NET

Microsoft recently (June 2018) announced the release of ML.NET — let me try to explain what it is and a bit about its history. Briefly, ML.NET is a code library for machine learning. In its most basic form, ML.NET is just a set of .NET DLLs that can be called in many different ways — on a command line in a shell, from a GUI wrapper program, through a C# API in a C# program, and on and on.

The screenshot below shows an example of the original way ML.NET was used. You enter the command maml.exe followed by a huge list of command line arguments, and you get output. In this example, ML.NET uses an “averaged perceptron” (a very primitive linear classification algorithm that is rarely used anymore) to perform binary classification on the UCI breast cancer dataset.

ML.NET has actually been in existence for many years. It started out as an internal Microsoft project called TMSN (“text mining search and navigation”) when .NET was first released around 2003 or so. The idea was to create .NET DLLs for machine learning that Microsoft engineers could use inside product code.

Over the years TMSN was renamed to TLC (“the learning code”) and then to MAML (“Microsoft Azure machine learning”, where it was used to power the Azure Machine Learning Studio system) and then to ML.NET for use outside of Microsoft. There’s a lot more to the story, but the point is that ML.NET isn’t new.

The motivation for ML.NET is to give software developers who use the traditional Microsoft technology stack a way to do machine learning that aligns with how they work — .NET in Visual Studio. The vast majority of machine learning is performed using the Python language but the problem is, when a prediction model has been created using Python, it’s not so easy to transfer the model into a .NET system.

The Microsoft ML.NET Web site is at

“The Propitious Garden of Plane Image”, Brice Marden

Posted in Machine Learning | Leave a comment

The 2018 Big Data Innovation Summit

I will be attending and speaking at the Big Data Innovation Summit, July 17-18, 2018, in Las Vegas. This should be a very interesting event, and it’s an event you might want to consider attending. Let me explain.

A UK-based media company called Innovation Enterprise (IE) was founded in 2009. They put on a series of conferences (but they call them summits) on all kinds of technology topics, and in many different cities, for a wide range of audiences. For example, “Chief Innovation Officer Summit” in Singapore and “Data Visualization Summit” in Boston. Over the years I spoke at a handful of these events and always got good value. See

One way in which the IE events were different from most others I speak at is that the IE events were very broad. The approach seemed to be, “Throw a bunch of smart people and interesting speakers together, and good things will happen.” And good things did happen — good connections, and so on.

But I think the IE company was recently purchased by a New York-based company called Argyle. I haven’t worked with Argyle before, but they seem to want to focus the summits a bit more. But I won’t be entirely sure until I go to the Big Data Innovation Summit and get a feel for the event.

A couple of photos from the 2017 Big Data Innovation Summit in Las Vegas.

My bottom line is that I’ve always enjoyed the IE summits and I’m curious about the possibly new approach. Conferences have changed a lot over the past 10 years. Realistically, you can get almost any factual content on the Web/Internet. But there’s no substitute for face-to-face interaction. That’s where ideas emerge and business gets done.

Check out the Big Data event Web site at If you decide it’s something that you want to explore, the event organizers told me you can get a $200 discount on the two-day pass by using the code “James200”. And if you go, be sure to look me up!

Posted in Conferences | Leave a comment

Why I Prefer Keras and CNTK to PyTorch

There are many neural network code libraries. The two I like best are Microsoft CNTK and Google Keras (over TensorFlow). I am not a fan of PyTorch.

There’s nothing technically wrong with PyTorch and many of my colleagues use it as their neural network library of choice. But I find that PyTorch just doesn’t feel right for me. In the years before NN libraries, I coded neural networks from scratch many times, so I have a very good understanding of what goes on behind the scenes. But when I use PyTorch, the API doesn’t match my cognitive understanding of NNs. There’s a weird dissonance that, well, just doesn’t feel right to me.

When I use CNTK, I have a good idea of how the CNTK code maps to fundamental NN code operations. The same is true when I use Keras or even raw TensorFlow.

Now, none of this would matter except for one additional fact: the documentation for PyTorch is absolutely horrendous. Trying to find information about PyTorch is often an exercise in futility. If the PyTorch documentation were good, I’d be able to construct my mental mapping.

It will be interesting to see what happens over the next two years. Will just one or two NN libraries emerge as de facto standards? Or will there continue to be several libraries, all having significant usage in the ML developer community? I’m usually not shy about making guesses, but this is one question where I have no idea what will happen.


import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

input_dim = 4; hidden_dim = 5; output_dim = 3
lr = 0.01
max_epochs = 500

class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.fc2 = nn.Linear(hidden_dim, output_dim)

  def forward(self, x):
    x = F.relu(self.fc1(x))
    x = self.fc2(x)
    return F.log_softmax(x, dim=1)

model = Net()

train_file = ".\\Data\\iris_train_data.txt"
test_file = ".\\Data\\iris_test_data.txt"
train_x = np.loadtxt(train_file, usecols=[0,1,2,3],
  delimiter=",", dtype=np.float32)
train_y = np.loadtxt(train_file, usecols=[4,5,6],
  delimiter=",", dtype=np.float32)
test_x = np.loadtxt(test_file, usecols=[0,1,2,3],
  delimiter=",", dtype=np.float32)
test_y = np.loadtxt(test_file, usecols=[4,5,6],
  delimiter=",", dtype=np.float32)

# the network emits log-probabilities, so use NLLLoss;
# the targets must be class indices, not one-hot vectors
my_loss = nn.NLLLoss()
opt = torch.optim.SGD(model.parameters(), lr=lr)

# train (full-batch; mini-batches could be extracted here)
X = torch.tensor(train_x, dtype=torch.float32)
Y = torch.tensor(np.argmax(train_y, axis=1), dtype=torch.long)
for epoch in range(max_epochs):
  opt.zero_grad()
  outpt = model(X)
  loss = my_loss(outpt, Y)
  loss.backward()
  opt.step()

  if epoch % 100 == 0:
    print('epoch [%d/%d] Loss: %.4f' % \
      (epoch+1, max_epochs, loss.item()))

# evaluate
X = torch.tensor(test_x, dtype=torch.float32)
Y = torch.tensor(np.argmax(test_y, axis=1), dtype=torch.long)
outpt = model(X)
_, predicted = torch.max(outpt, 1)
print('Accuracy of the network %d %%' % \
  (100 * torch.sum(Y == predicted).item() / len(Y)))

“Preference Game”, David Gray.

Posted in Keras, Machine Learning | 2 Comments

My Top Ten Favorite Fantasy Movie Villains

One of my three favorite movie genres is fantasy films. In many cases, the presence of a good villain is a key factor in my appreciation of a film. Here are my top 10 favorite villains in fantasy films. Note: I don’t include witches (there are so many I’ll have to do a separate post), and I don’t include monsters (my villains have to be vaguely human), and I don’t include villains from science fiction films.

1. Voldemort (Harry Potter series, 2001-2010) – Of course “He-Who-Must-Not-Be-Named” is number one on my list. Without Voldemort, the Harry Potter series really wouldn’t have a plot. Great portrayal by actor Ralph Fiennes. The no-nose look is kind of creepy.

2. The Witch-King of Angmar (Lord of the Rings series, 2001-2003) – This was the guy who rode on top of the dragon thing. Definitely one scary dude. He was also called Lord of the Nazgul. He was the chief servant of Sauron (the Big Eye).

3. Jareth the Goblin King (Labyrinth, 1986) – Played by actor David Bowie, Jareth wasn’t terribly menacing even though he kidnapped Sarah’s (actress Jennifer Connelly) infant brother. But Jareth was the most interesting character. This movie is my favorite Jim Henson / Muppet movie.

4. Lord of Darkness (Legend, 1985) – A very evil guy although his evil was tempered somewhat by his love/desire for Princess Lili (Mia Sara). Things don’t look very good for Jack (Tom Cruise) but luckily he has a unicorn’s horn and there’s a happy ending (even for the unicorn).

5. Loki (Thor, 2011) – I’m not a big fan of super hero movies but Thor is more fantasy than super hero in my opinion. I found Thor himself (played by Chris Hemsworth) mildly annoying, but I thought the performance of Loki (Tom Hiddleston) was interesting and very nicely done.

6. Governor Odius (The Fall, 2006) – I really like this movie even though Governor Odius (actor Daniel Caltagirone) has a relatively small part in the plot. Most of my friends don’t like this movie too much, but it’s one of my favorite fantasy films and Odius’ attempts to cause problems for Roy and Evelyn are just evil enough to drive the movie sub-plot.

7. Lo Pan (Big Trouble in Little China, 1986) – Lo Pan is the evil, ancient sorcerer who needs a Chinese woman with green eyes to release him from a curse. Jack Burton (Kurt Russell) is overmatched but gets some lucky breaks and defeats Lo Pan in the end.

8. The Kurgan (Highlander, 1986) – Soooo, Connor MacLeod (played by Christopher Lambert) is an immortal where, uh, well, it doesn’t matter. Lots of heads getting chopped off because, uh, well, never mind. The Kurgan (Clancy Brown) is a bad guy. Very bad.

9. Mola Ram (Indiana Jones and the Temple of Doom, 1984) – Mola Ram is part of the ancient Thuggee cult. I vividly remember the scene where he rips someone’s heart out of their chest and causes it to burst into flames. Ouch.

10. Evil (Time Bandits, 1981) – Yes, his name is just Evil. Pretty much all you need to know. This is a crazy Brit movie from director Terry Gilliam. An 11-year-old boy has all kinds of adventures in space and time with a bunch of dwarves who’ve stolen a map from The Supreme Being.

Dishonorable Mention

Kaecilius (actor Mads Mikkelsen, “Dr. Strange”, 2016) – The primary bad guy, but he didn’t have much personality.

Count Olaf (actor Jim Carrey, “A Series of Unfortunate Events”, 2004) – I’m not a Jim Carrey fan, but Carrey is excellent here and I think this is maybe his best performance.

Percival Graves (actor Colin Farrell) and Gellert Grindelwald (actor Johnny Depp, “Fantastic Beasts and Where to Find Them”, 2016) – At some point before the events of the movie, the evil Grindelwald killed or kidnapped (never explained) Graves and assumed his identity.

Captain Hook (actor Dustin Hoffman, “Hook”, 1991) – Excellent performance by Hoffman and on a different day he’d be in my top 10.

Captain Vidal (actor Sergi Lopez, “Pan’s Labyrinth”, 2006) – This was a very dark film in part because Vidal was truly an evil psychopath.

Imhotep (actor Arnold Vosloo, “The Mummy”, 1999) – I felt sorry for Imhotep at first but after he cruelly killed some of the American explorers, I had no sympathy for him. Excellent film.

Pennywise the Clown (actor Bill Skarsgard, “It”, 2017) – I really, really don’t like clowns. Of any kind.

Zaren and Sonoy (John Dall and Frank DeKova, “Atlantis the Lost Continent”, 1961) – Zaren is an evil advisor to the emperor of Atlantis and Sonoy is an evil astrologer/sorcerer. One of my favorite movies of the 1960s.

Posted in Top Ten | 1 Comment

Autoencoder for Visualization

When you want to graph data, you normally want to graph it in a two-dimensional plane. But if your data has more than two dimensions you’ve got a problem. For example, suppose you have data for people: (age, height, weight, annual income, number of children). Your data has five dimensions.

One way to visualize high dimensional data is to map the data down to two dimensions. Of course you lose some information but that’s unavoidable. The three most common techniques to shrink high dimensional data down to two dimensions (or possibly three) for visualization are principal component analysis (PCA), t-SNE (t-distributed stochastic neighbor embedding), and neural autoencoding.
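As a quick illustration of the first technique, here’s a minimal PCA sketch. It assumes the scikit-learn library, and the five-dimensional people data is completely made up:

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical (age, height, weight, income, num_children) data
people = np.array([
  [33, 67.0, 150.0, 52000.0, 1],
  [41, 70.5, 180.0, 61000.0, 2],
  [28, 62.0, 120.0, 48000.0, 0],
  [55, 68.0, 170.0, 75000.0, 3]], dtype=np.float64)

# standardize each column so income doesn't dominate, then
# project each item onto the first two principal components
people = (people - people.mean(axis=0)) / people.std(axis=0)
reduced = PCA(n_components=2).fit_transform(people)
print(reduced.shape)  # (4, 2) -- two values per person, ready to plot
```

The t-SNE technique can be used in almost exactly the same way, and the autoencoder approach is shown below.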

Just for fun, I decided to implement an autoencoder. I used the UCI 8×8 digits dataset, which consists of 1,797 images of digits (‘0’ through ‘9’); each image is 8×8 pixels, and each pixel is a grayscale value between 0 and 16. In other words, each data item has 64 values and I want to map down to two dimensions.

An autoencoder pattern isn’t a specific algorithm, it’s a general idea so there are many possible implementations. My architecture was 64-32-2-32-64 and I used sigmoid activation on all layers.

The idea is surprisingly difficult to explain but briefly, the network accepts 64 input values, shrinks them down to 32 values and then down to 2 values. The second half of the network expands the two components back out to the original 64 values. The net result is that each of the 1797 digits is mapped to two values between 0 and 1 and so they can be graphed.
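Here’s a minimal sketch of that 64-32-2-32-64 architecture in PyTorch. This isn’t my exact implementation; the layer names, the Adam optimizer, and the random stand-in data are all just illustrative:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
  def __init__(self):
    super(Autoencoder, self).__init__()
    self.enc1 = nn.Linear(64, 32)  # 64 -> 32
    self.enc2 = nn.Linear(32, 2)   # 32 -> 2
    self.dec1 = nn.Linear(2, 32)   # 2 -> 32
    self.dec2 = nn.Linear(32, 64)  # 32 -> 64

  def encode(self, x):
    x = torch.sigmoid(self.enc1(x))
    return torch.sigmoid(self.enc2(x))  # the 2-D representation

  def forward(self, x):
    x = torch.sigmoid(self.dec1(self.encode(x)))
    return torch.sigmoid(self.dec2(x))

model = Autoencoder()
loss_fn = nn.MSELoss()  # train the net to reproduce its own input
opt = torch.optim.Adam(model.parameters(), lr=0.01)

X = torch.rand(1797, 64)  # stand-in for the normalized digits data
for epoch in range(50):
  opt.zero_grad()
  loss = loss_fn(model(X), X)  # compare output to original input
  loss.backward()
  opt.step()

coords = model.encode(X)  # 1797 x 2, each value in (0, 1)
```

After training, only the encode() half is used: it produces the two values per data item that get graphed.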

The result of a visualization is always quite subjective. In this example you can see that the 0s (black dots in the lower left) and the 6s (yellow-green dots in the lower right) aren’t likely to be confused with other digits. But the 8s (the red dots) can be confused with several other digits.

Fashion, beauty, and style are subjective for sure. This actress was in the 1987 science fiction movie “Gor” which is incredibly bad but hard to resist.

Posted in Machine Learning | 2 Comments

Word Similarity using GloVe

The GloVe (“global vectors for word representation”) data maps an English word, such as “love”, to a vector of values (for example 100 values). See

There are different versions of GloVe. One of the simplest used Wikipedia as its source (six billion non-unique words), extracted 400,000 distinct words, and then used an unsupervised model trained on word co-occurrence statistics to generate a vector of 100 values for each word.

The vectors are generated in a very clever way so that two semantically similar words have mathematically similar vectors. So, if you want to find words that are semantically close to the word “chess”, you’d get the GloVe vector for “chess”, then scan through the other 399,999 GloVe vectors, finding the vectors that are close (using Euclidean distance). Then you’d map the close vectors back into words.
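In code, that scan might look like the sketch below. A real run would parse one of the actual GloVe download files (one word plus its values per line); the tiny three-dimensional vectors here are made up so the example is self-contained:

```python
import numpy as np

# each line of a GloVe file is: word v1 v2 ... vN
# a tiny made-up 3-D stand-in here, in the same format
glove_text = """chess 0.9 0.1 0.2
checkers 0.8 0.2 0.3
poker 0.7 0.3 0.1
banana 0.1 0.9 0.8"""

vectors = {}
for line in glove_text.splitlines():
  parts = line.split()
  vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)

def closest_words(target, n=2):
  tv = vectors[target]
  # Euclidean distance from the target to every other word
  dists = [(w, np.linalg.norm(tv - v))
           for (w, v) in vectors.items() if w != target]
  dists.sort(key=lambda pair: pair[1])
  return [w for (w, _) in dists[:n]]

print(closest_words("chess"))  # ['checkers', 'poker']
```

The same idea works with the full 400,000-word set; it just takes longer, and cosine similarity is sometimes used instead of Euclidean distance.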

GloVe is useful when the particular data you are using is general in nature. But if you have highly specialized text, such as legal text, or medical text, then you’re usually better off by creating your own custom word embedding vectors using the gensim tool.

Neat. Neural methods have really revolutionized natural language processing.

Image query: “painting of a woman with gloves” (left) and “a woman painting with gloves” (right). Natural language processing is tricky.

Posted in Machine Learning | Leave a comment

Recap of the 2018 Deep Learning World and Predictive Analytics World Conferences

I spoke at and attended the co-located Deep Learning World and Predictive Analytics World conferences. Let me cut to the chase and say the event was really, really good — I give the event an overall grade of an A-, which is the best grade I’ve given to any event in the past eight years.

The event ran from June 3-7, 2018 and was at Caesars Palace in Las Vegas. I estimate there were about 2,000 attendees, speakers, and exhibitors there, but I could be way off. Attendees came from all types of companies and had a wide range of job titles and backgrounds.

Along with my colleague Ricky L., I did an all-day hands-on workshop. It went very well. We covered a huge amount of information. The key point here is that the workshop was hands-on. Attendees installed TensorFlow and Keras and used them to explore deep classification and regression (me) and reinforcement learning bandit and Q-learning problems (Ricky).

I also gave a talk “Time Series Regression using an LSTM Network” which was surprisingly well-attended (for such a specific and technical topic) and seemed to be well-received by attendees.

The Predictive Analytics World (PAW) event has been around for several years. It’s run by Eric Siegel. PAW had several tracks covering topics such as business, financial, health care, and so on. I sat in on a few of the talks and they were all interesting. See

This was the first year for the Deep Learning World (DLW) event. It was run by Luba Gloukhova. DLW was incredibly successful for a first-time event. For example, when Ricky and I were just getting started with our workshop, there were the inevitable logistics problems, but they were all fixed quickly by Luba and the event staff. This is a sign of good organization.

I don’t know much about how conferences are run, but PAW/DLW was run in part by the Rising Media company. RM is run by Matthew Finlay. I enjoyed meeting all the event organizers, Eric, Luba, and Matthew.

Usually, when I’m speaking at a conference, by the end of the third day, I’m more than ready to go home. But at PAW/DLW I really wanted to stay an extra day to listen to the talks, visit more of the companies at the event Expo, and chat with other attendees.

In short, I give Predictive Analytics World and Deep Learning World a solid thumbs-up. If you work with predictive analytics or have an interest in it, give strong consideration to attending next year — I know I’ll be there if I can.

Posted in Conferences | Leave a comment