Zoltar Prepares for the 2018 NFL Football Season

Zoltar is my machine learning system that predicts the outcomes of NFL football games. The first game of the season is Thursday, September 6, just over three weeks from now, so I'm starting to get Zoltar up to speed.

I’ve made different versions of Zoltar every year for several years. The problem of predicting NFL football scores lends itself to all kinds of interesting approaches. Every year I try a different twist or two, usually some kind of new optimization algorithm.

My preliminary work involved getting some basic infrastructure up and running. Zoltar’s preliminary predictions, which are certain to change before the first game is played, are:

1. Zoltar suggests a bet on the Steelers against the Browns. The early Vegas line has the Steelers as 6-point favorites over the Browns. Zoltar thinks the Steelers are 11 points better than the Browns. A bet on the Steelers will pay off only if the Steelers win by more than 6 points (in other words, 7 points or more). If the Steelers win by exactly 6 points, the bet is a push.

2. Zoltar likes the Vikings over the 49ers. Vegas has the Vikings as 5.5 points better than the 49ers but Zoltar thinks the Vikings are 10 points better.

3. Zoltar likes the Patriots over the Texans. Vegas has the Patriots as 6.5 points better than the Texans but Zoltar thinks the Patriots are 11 points better.

4. Zoltar likes the Panthers over the Cowboys. Vegas says the Panthers are just 2.5 points better than the Cowboys but Zoltar has the Panthers as 6 points better.

5. Zoltar likes the Cardinals over the Redskins. Vegas says the Cardinals are evenly matched against the Redskins (a pick ’em game) but Zoltar thinks the Cardinals are 5 points better.

In all these games, Zoltar likes the favored team, thinking the favorite will "cover the spread," as the saying goes (win by more than the point spread). I suspect this is a consequence of the fact that hope springs eternal: fans of a bad team are overly optimistic that off-season personnel changes will improve their team more than they actually will. These optimistic fans bet on their teams, which skews the point spread. Later in the season, Zoltar typically shows a bias toward picking Vegas underdogs, where a bet wins if the underdog wins outright or if the favored team fails to cover the spread.



My machine learning system is named after the arcade fortune telling machine, which in turn is named after the machine in the 1988 fantasy movie “Big”, starring Tom Hanks.

Posted in Machine Learning, Zoltar

A Lightweight Custom Batcher for PyTorch

The PyTorch neural network library operates at a low level of abstraction and so you have to write a certain amount of auxiliary plumbing code. One example is that to train a PyTorch neural network, you must write your own code to serve up a batch of training items.

Note: PyTorch has a utility Dataset and a DataLoader class but these are fairly complex and intended mostly for situations where the training data is too large to fit into memory.
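These built-in classes are more general than what's needed here, but for completeness, a rough sketch of that route when the data does fit in memory might look like the following. The train_x and train_y arrays are assumed to be in-memory NumPy arrays, as in the Batcher example below:

import torch as T
from torch.utils.data import TensorDataset, DataLoader

ds = TensorDataset(T.Tensor(train_x), T.LongTensor(train_y))
ldr = DataLoader(ds, batch_size=16, shuffle=True)
for X, Y in ldr:
  # X and Y hold one batch of training inputs and target labels
  pass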

For situations where all data can be loaded into memory, I coded up a custom Batcher object that serves up indices, a batch at a time. The object can be called along the lines of:

bat_size = 16
max_batches = 500
batcher = Batcher(num_items=len(train_x), batch_size=bat_size,
  seed=1)

print("Starting training")
for batch in range(0, max_batches):
  rows = batcher.next_batch()
  X = T.Tensor(train_x[rows])
  Y = T.LongTensor(train_y[rows])
. . .

For example, if the training data comes from the Iris Dataset and has 120 items, each call to next_batch() returns 16 indices which are used to get 16 training items and associated target labels.

My custom Batcher class is pretty crude; in particular, it doesn't do any error checking. But the class is simple and effective in situations where you can read all the data into memory.

One alternative design is to return a batch index in addition to the indices:

import numpy as np

class Batcher:
  def __init__(self, num_items, batch_size, seed=0):
    self.indices = np.arange(num_items)
    self.num_items = num_items
    self.batch_size = batch_size
    self.rnd = np.random.RandomState(seed)
    self.rnd.shuffle(self.indices)
    self.ptr = 0
    self.bi = 0  # batch index

  def next_batch(self):
    # when there aren't enough unused indices left for a full batch,
    # reshuffle and start a new pass through the training data
    if self.ptr + self.batch_size > self.num_items:
      self.rnd.shuffle(self.indices)
      self.ptr = 0
      self.bi = 0
    result = (self.bi, 
      self.indices[self.ptr:self.ptr+self.batch_size])
    self.ptr += self.batch_size
    self.bi += 1
    return result  # int, np array of int32

Another possible design would be to make the Batcher object iterable by defining an __iter__(self) method and a __next__(self) method (in Python 2, the second method would be named next(self) instead).
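For example, here is a minimal sketch of such an iterable design, where one complete pass through the data ends with StopIteration. The IterableBatcher name and that end-of-epoch design choice are mine, just for illustration:

import numpy as np

class IterableBatcher:
  def __init__(self, num_items, batch_size, seed=0):
    self.indices = np.arange(num_items)
    self.batch_size = batch_size
    self.rnd = np.random.RandomState(seed)
    self.rnd.shuffle(self.indices)
    self.ptr = 0

  def __iter__(self):
    return self

  def __next__(self):
    if self.ptr + self.batch_size > len(self.indices):
      # end of one pass (epoch); reshuffle so the next pass differs
      self.rnd.shuffle(self.indices)
      self.ptr = 0
      raise StopIteration
    result = self.indices[self.ptr : self.ptr + self.batch_size]
    self.ptr += self.batch_size
    return result

With this design, a statement like for rows in IterableBatcher(num_items=len(train_x), batch_size=16): serves up one epoch of index batches per for-loop.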



Batcher’s Department Store and Opera House, in Staples, Minnesota. Built in 1907. I love old buildings.

Posted in Miscellaneous, PyTorch

Microsoft Conferences

Microsoft directly puts on, or sponsors, dozens of conferences every year. Here's a brief rundown of six of the key conferences related to Microsoft. Note that there are many more conferences that I haven't listed.

1. Microsoft Build – Open to the public. Intended for Web, application, and system developers. Typically about 15,000 attendees. Registration sells out in seconds. Current incarnation dates from 2011. Combined earlier PDC (Professional Developers Conference, 1992-2010) and MIX (Web Developers, 2006-2011).

2. Microsoft Ignite – Open to the public. Intended for IT engineers and programmers. Typically about 22,000 attendees. Current incarnation dates from 2015. Combined the earlier MMS (Microsoft Management Summit, 2002-2011) and TechEd (1993-2014).

3. Visual Studio Live – Open to the public, run by 1105 Media, co-sponsored by Microsoft. Intended for .NET developers. Dates from 1993. Multiple 300-person events throughout the country.

4. DevIntersection – Open to the public, privately run, co-sponsored by Microsoft. Intended for developers who use the Microsoft technologies stack. Dates from 2014. Split from DevConnection (2001-current). Typically about 2,500 attendees.

5. PASS (Professional Association for SQL Server) – Open to the public, run by pass.org, co-sponsored by Microsoft. Intended for SQL developers. Typically about 3,000 attendees. Dates from 1999.

6. Microsoft Ready – Internal Microsoft and by-invitation-only. Intended for people in sales, support, and other customer-facing roles. About 28,000 attendees. Current incarnation dates from 2017. Combined earlier TechReady (2005-2017), S4 (Solution Specialist Sales Summit), MGX (Microsoft Global Exchange), and Inspire (formerly Microsoft Worldwide Partner Conference, 2002-2017).

Years ago (say, before 1995), physically attending conferences was a critically important way to get technical information. Realistically, the Internet can now deliver almost any content. But in my opinion, physically attending a conference has at least three advantages over the Internet. First, at a conference you can get very valuable information from impromptu conversations with other attendees. Second, you can often infer subjective trends, not from what people say, but from how they say it. Third, attending a conference recharges your mental batteries, and when you return to your workplace you have renewed enthusiasm and energy, which translates to increased productivity and creativity.

For me personally, staying ahead of trends in machine learning is extremely important. One of the things I do at my company is review project proposals. Accepted projects then get a lot of support in terms of time and money, so, not too surprisingly, picking good project ideas is critical for success. There have been many times when information I gained at a conference helped me select a project proposal that wasn't impressive at first glance but eventually turned out to be very successful.



Microsoft Build Conference



DevIntersection Conference



Microsoft Ready Conference



Visual Studio Live Conference


Posted in Conferences

Neural Network Input-Output Using PyTorch

I’ve been taking a deep dive into the PyTorch neural network code library. My latest investigation was to determine exactly how a simple PyTorch neural network does input-output. In particular, I wanted to understand how PyTorch handles hidden layer and output layer weights and biases.

Well, after a couple of hours of coding, I'm moderately satisfied that I have a pretty good idea how PyTorch works. A complete explanation would take several pages; I know because I've described how CNTK and Keras do input-output, and those explanations took about eight pages. There are a lot of details.


My final demo sets up a 3-4-2 NN with tanh activation on the hidden layer and no activation (or equivalently, the identity activation) on the output layer. The trickiest part was determining the syntax to set the 3*4 + 4*2 = 20 weight values and the 4 + 2 = 6 bias values. One thing that threw me off a bit was the discovery that PyTorch stores the input-to-hidden weights with hidden node indices first and input node indices second. For example, weight.data[1][2] is the weight from input node [2] to hidden node [1], not the other way around as I'd expected.
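To make that concrete, here is a minimal sketch (not my demo code) of a 3-4-2 network where a few individual weights and biases are set directly. The layer names hid and oupt, and the particular values assigned, are just for illustration:

import torch as T

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid = T.nn.Linear(3, 4)   # 3 input nodes -> 4 hidden nodes
    self.oupt = T.nn.Linear(4, 2)  # 4 hidden nodes -> 2 output nodes

  def forward(self, x):
    z = T.tanh(self.hid(x))        # tanh on the hidden layer
    return self.oupt(z)            # no activation on the output layer

net = Net()
# hid.weight has shape [4, 3]: hidden node index first, input node index
# second, so [1][2] is the weight from input node [2] to hidden node [1]
net.hid.weight.data[1][2] = 0.05
net.hid.bias.data[3] = 0.01
net.oupt.weight.data[0][2] = 0.06
net.oupt.bias.data[1] = 0.02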

The moral of the story: learning how a complex neural network library like PyTorch deals with weights and biases is quite interesting, and that knowledge is important when creating complex network structures.



The phrase “weight bias” has a much different meaning in the fashion industry than in the field of machine learning.

Posted in Machine Learning, PyTorch

Eigenvalues, Eigenvectors, and Machine Learning

My educational background is in applied mathematics, which means mostly probability and statistics, plus linear algebra, plus a few other branches of math. In my opinion, there’s surprisingly little overlap between traditional mathematics and machine learning.

But there are dozens of math topics that pop up, somewhat peripherally, in machine learning. One such topic is eigenvectors and eigenvalues. Here’s the quick summary:

* Eigenvalues and eigenvectors are a way to decompose a matrix.
* If A is an nxn matrix, then an eigenvector v and a scalar eigenvalue lambda are values such that A*v = lambda*v.
* If A is size nxn, then there are n eigenvalue-eigenvector pairs (counting repeated eigenvalues).
* So A is decomposed into a set of vectors and scalars, which isn't directly useful by itself, but is useful as part of certain algorithms.
* The original A matrix can be reconstructed as A = V * diag(lambda) * inv(V), where V is the matrix whose columns are the eigenvectors.

As usual, these explanations only make sense if you already understand the topic. Here’s an example:

If 2×2 matrix A =

-1   1
 2  -3

then one eigenvalue-eigenvector pair is -0.2679 and (0.8069, 0.5907). This can be verified:

A * vector = (-0.2162, -0.1583)

and

lambda * vector = (-0.2162, -0.1583)
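These values are easy to check with NumPy. The snippet below is just a quick verification, not code from the post; note that np.linalg.eig may return the eigenvalues in a different order, and each eigenvector may come back multiplied by -1:

import numpy as np

A = np.array([[-1.0,  1.0],
              [ 2.0, -3.0]])
evals, evecs = np.linalg.eig(A)   # all eigenvalues and eigenvectors
idx = np.argmin(np.abs(evals))    # pick the eigenvalue nearest zero
lam = evals[idx]                  # approx -0.2679
v = evecs[:, idx]                 # approx (0.8069, 0.5907), possibly negated
print(np.dot(A, v))               # approx (-0.2162, -0.1583)
print(lam * v)                    # approx (-0.2162, -0.1583)

# reconstruct A as V * diag(lambda) * inv(V)
V = evecs
print(np.dot(V, np.dot(np.diag(evals), np.linalg.inv(V))))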

One common use of eigenvalues and eigenvectors in machine learning is in computing PCA (principal component analysis), a technique for dimensionality reduction. Eigenvalues and eigenvectors are also used in image analysis, but that's another topic.



An Internet image search for just about any term returns unexpected results. This colorful photograph of the skyline of Dubai was a result of searching for “eigen”.

Posted in Machine Learning

Iris Dataset Neural Network Using PyTorch Version -1.0

PyTorch is one of the major open source neural network libraries. It’s very immature as I write this blog post, which means that working with PyTorch is slow and difficult. This is due mostly to incomplete, out-of-date, and sometimes just plain incorrect documentation.

After spending a few days exploring the PyTorch fundamental Tensor object, I felt ready to tackle the Hello World problem for neural networks, the Iris Dataset classification problem.

To cut to the chase, after quite a bit of time I was able to get a PyTorch system created. PyTorch works at a very low level (compared to Keras and CNTK) and is quite complex, so I'm certain that my initial effort has bugs. But at least I got the system working.

My demo sets up a 4-5-3 neural network, trains it, and evaluates the model's accuracy on some test data. The demo concludes by predicting the iris species for inputs = [6.1, 3.1, 5.1, 1.1] and gets a result of (0.0187, 0.7531, 0.2282), which maps to (0, 1, 0), which in turn maps to versicolor.
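The full demo code is too long to show here, but a minimal sketch of a 4-5-3 network of the kind described might look like the following. The layer names, the tanh hidden activation, and the softmax-at-prediction-time step are my assumptions, not necessarily what the demo uses:

import torch as T

class IrisNet(T.nn.Module):
  def __init__(self):
    super(IrisNet, self).__init__()
    self.hid = T.nn.Linear(4, 5)   # 4 iris features -> 5 hidden nodes
    self.oupt = T.nn.Linear(5, 3)  # 5 hidden nodes -> 3 species

  def forward(self, x):
    z = T.tanh(self.hid(x))
    return self.oupt(z)            # raw outputs; softmax applied when predicting

net = IrisNet()
# (training with a loss function such as T.nn.CrossEntropyLoss goes here)
unk = T.Tensor([[6.1, 3.1, 5.1, 1.1]])
probs = T.nn.functional.softmax(net(unk), dim=1)  # pseudo-probabilities, one per species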

Over the next few days, I’ll be dissecting my demo one statement at a time. I know from previous experience that fully understanding the simplest neural network in PyTorch is going to be absolutely essential before I can work with advanced networks like CNNs and LSTMs. And the process will take several weeks.



Two paintings by artist Anatoly Metlan

Posted in Machine Learning, PyTorch

Self-Organizing Maps Using Python

When I encounter a technology or concept that's new to me, the best and worst sides of my character emerge. The good side is that I'm very persistent and will investigate the new topic until I really understand it. But this is my bad side too, because I typically get obsessed and just can't leave the new topic alone, even when the topic isn't super important. Self-organizing maps are a recent example.

There’s a close relationship between obsession and passion.

So, this morning I set out to do an end-to-end creation of a self-organizing map (SOM), from scratch, using Python.

Conceptually, SOMs aren’t difficult to grasp, but as always, when implementing, all kinds of details pop up. Well, after a bit of work, I’m satisfied I really, really understand SOMs.

I used the UCI Digits Dataset, which consists of 1,797 crude 8×8 handwritten digits, '0' through '9'. After creating the SOM for the data, I generated a U-Matrix. There's a ton of not-entirely-correct information about U-Matrices on the Internet. The basic idea is that dark areas represent groups of similar data items and light areas indicate borders between groups. But interpreting a U-Matrix is quite subjective.
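Continuing the sketch above, a U-Matrix can be built by giving each map node the average distance between its weight vector and the weight vectors of its immediate (up/down/left/right) neighbors; small values plot as dark (similar neighboring nodes) and large values plot as light (borders). Again, this is an illustration rather than my demo code:

def make_u_matrix(som):
  map_rows, map_cols, _ = som.shape
  um = np.zeros((map_rows, map_cols))
  for r in range(map_rows):
    for c in range(map_cols):
      total, count = 0.0, 0
      for (nr, nc) in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)]:
        if 0 <= nr < map_rows and 0 <= nc < map_cols:
          total += np.linalg.norm(som[r, c] - som[nr, nc])
          count += 1
      um[r, c] = total / count  # average distance to adjacent map nodes
  return um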

Because my source data has labels, it was possible to generate a second visualization. This second graph shows relationships between different data items. For example, the 1s (in orange) are similar to the 4s (in dark green) because those two colored regions are geometrically close on the map. This makes sense because 1s and 4s have a similar vertical stroke.

When I get some free time, I’ll clean up the Python code and publish it, either here on my blog site or in Visual Studio Magazine online.



Liat and Joe (with Bloody Mary in back) in the “Happy Talk” sequence from “South Pacific” (1958). One of the most passionate and beautiful scenes in movie history.

Posted in Machine Learning