Creating PyTorch Tensors

There are many neural network libraries. Three of the most popular are TensorFlow, Keras, and PyTorch. PyTorch is rapidly growing in popularity among my colleagues.

The fundamental data structure in PyTorch is the tensor. A PyTorch tensor is a one-dimensional array (i.e., a vector) or a multidimensional array (i.e., a matrix) of numeric values that can be handled by a GPU.

Working with PyTorch tensors can be mildly frustrating for beginners. Based on my learning path, I think it’s important to have a solid understanding of tensor basics, starting with different techniques for tensor creation.

Here’s the introductory example I use when I teach a PyTorch workshop.

The statement:

x = pt.tensor([[0,0,0],[0,0,0]], dtype=pt.float32)

creates a 2×3 tensor of zeros where each cell is a 32-bit floating point value. Notice the lower-case ‘t’ in tensor(). I like to use “pt” as an alias but almost all my colleagues spell out “torch”.
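For reference, a complete minimal version of that statement (just a sketch, assuming you only want to verify the shape and data type) looks like:

import torch as pt   # most people write plain "import torch"

x = pt.tensor([[0,0,0],[0,0,0]], dtype=pt.float32)
print(x.shape)   # torch.Size([2, 3])
print(x.dtype)   # torch.float32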

The statement:

x = pt.zeros(2, 3, dtype=pt.float32)

is a shortcut to do the same thing using the special zeros() function. I’m not a fan of most programming language shortcuts like this.

Here’s a third way to do the same thing:

x = pt.FloatTensor([[0,0,0],[0,0,0]])

which is a different shortcut. Other shortcuts include functions like LongTensor().

And now a fourth way is:

x = pt.Tensor([[0,0,0],[0,0,0]])

which is based on the idea that 32-bit float is the default numeric type for neural networks.

A fifth way, which is a bit different, is to create a PyTorch tensor from a NumPy array like:

a = np.array([[0,0,0],[0,0,0]], dtype=np.float32)
x = pt.from_numpy(a)

The from_numpy() function is especially useful when reading data from a text file using np.loadtxt().
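For example, a minimal sketch of that pattern (the file name and delimiter here are hypothetical, assuming a comma-delimited text file with all-numeric columns) could look like:

import numpy as np
import torch as pt

# "data.txt" is a hypothetical comma-delimited numeric file
a = np.loadtxt("data.txt", dtype=np.float32, delimiter=",")
x = pt.from_numpy(a)   # x shares memory with the NumPy array a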

The moral of the story is that when learning PyTorch, you have to move slower than you’d like because mastering tensor basics is a bit trickier than you might expect.



I love tiki bars, especially those that have tiki torches.

Posted in PyTorch | Leave a comment

NFL 2018 Week 15 Predictions – Zoltar Agrees Closely with Vegas

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #15 of the 2018 NFL season:

Zoltar:      chiefs  by    6  dog =    chargers    Vegas:      chiefs  by  3.5
Zoltar:     broncos  by    6  dog =      browns    Vegas:     broncos  by    3
Zoltar:      texans  by    3  dog =        jets    Vegas:      texans  by    6
Zoltar:     falcons  by    6  dog =   cardinals    Vegas:   cardinals  by    3
Zoltar:       lions  by    0  dog =       bills    Vegas:       bills  by  2.5
Zoltar:       bears  by    6  dog =     packers    Vegas:       bears  by  5.5
Zoltar:     bengals  by    6  dog =     raiders    Vegas:     bengals  by    3
Zoltar:     cowboys  by    1  dog =       colts    Vegas:       colts  by    3
Zoltar:     jaguars  by    4  dog =    redskins    Vegas:     jaguars  by    7
Zoltar:     vikings  by    6  dog =    dolphins    Vegas:     vikings  by    7
Zoltar:      titans  by    2  dog =      giants    Vegas:      giants  by  2.5
Zoltar:      ravens  by    7  dog =  buccaneers    Vegas:      ravens  by    8
Zoltar:    seahawks  by    5  dog = fortyniners    Vegas:    seahawks  by  6.5
Zoltar:    patriots  by    0  dog =    steelers    Vegas:    patriots  by    3
Zoltar:        rams  by    7  dog =      eagles    Vegas:        rams  by  9.5
Zoltar:      saints  by    4  dog =    panthers    Vegas:      saints  by  6.5

Note: There’s some weirdness with the early Vegas point spreads for Arizona at Atlanta (no line), Dallas at Indianapolis (no line), and New England at Pittsburgh (no line). I’ll update this post when I figure out what’s going on.

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #15, before the point spread updates, Zoltar has just one hypothetical suggestion.

1. Zoltar likes the Vegas underdog Titans against the Giants. Zoltar thinks the Titans are 2 points better than the Giants but Vegas has the Giants as 2.5 point favorites. So, a bet on the Titans will pay off if the Titans win (by any score) or if the Giants win but by less than 2.5 points (in other words, 2 points or 1 point).

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game (not by how many points). This isn’t useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are two such games in week #15: Lions vs. Bills and Patriots vs. Steelers. In the first four weeks of the season, Zoltar picks the home team to win. After week #4, Zoltar uses historical data from the current season (which usually, but not always, results in a prediction that the home team will win).

==

Zoltar did rather poorly in week #14. Against the Vegas point spread, which is what Zoltar is designed to do, Zoltar went 1-1 . . . sort of. I botched one prediction, Redskins vs. Giants, when I didn’t notice a key injury at the Redskins quarterback position because I was traveling to a conference. So I give myself a Mulligan on that one game; otherwise Zoltar was 1-2 against the spread.

For the season so far, against the Vegas spread, Zoltar is 42-25 which is about 62% accuracy.

Just predicting winners, Zoltar was a poor 8-8. Vegas was also 8-8. I believe this was the first week this season where Zoltar and Vegas completely agreed on just who’d win (even though both have had the same record in some weeks). For the season, just predicting which team will win, Zoltar is 141-65 (about 68% accuracy) and Vegas is 138-66 (also about 68% accuracy).



My system is named after the Zoltar fortune teller machine you can find in arcades. There are many variations, but I like Zoltar the best.

Posted in Machine Learning, Zoltar | Leave a comment

Generating Non-Transitive Dice or Spinners

Suppose you have three spinners, A, B, and C. Each spinner has four equally divided sections each with a number. Spinner A has (2, 4, 4, 9). Spinner B has (1, 3, 8, 8). Spinner C has (0, 5, 6, 7).

Two players each pick a spinner and spin it. The spinner that lands on the higher number wins.

Suppose the two spinners are A and B. There are 16 equally likely possible outcomes. Spinner A wins on 9 of the 16 outcomes: 2-1, 4-1, 4-3, 4-1, 4-3, 9-1, 9-3, 9-8, 9-8. Therefore spinner A is better than spinner B.

Suppose the two spinners are B and C. Spinner B wins on 10 of the 16 possibilities: 1-0, 3-0, 8-0, 8-5, 8-6, 8-7, 8-0, 8-5, 8-6, 8-7. Therefore spinner B is better than spinner C.

Now because A is better than B and B is better than C, A must be much, much better than C, right? Wrong! Spinner C wins against spinner A on 9 of 16 possibilities: 5-2, 5-4, 5-4, 6-2, 6-4, 6-4, 7-2, 7-4, 7-4. Amazing!

This is an example of non-transitive spinners. If the numbers were on four-sided dice instead of four-quadrant spinners you’d have the same situation.

How did I come up with this example? I wrote a short brute force program that randomly generated spinner data and checked if A > B and B > C and C > A. There were a couple of details but my little program quickly found an example that I edited slightly to make it a bit prettier.
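Here’s a minimal sketch of the kind of brute force search I mean (not my original program, just the idea): generate three random four-number spinners and check whether A beats B, B beats C, and C beats A.

import random

def beats(p, q):
  # True if spinner p wins more of the 16 possible outcomes than q
  p_wins = sum(1 for a in p for b in q if a > b)
  q_wins = sum(1 for a in p for b in q if b > a)
  return p_wins > q_wins

while True:
  a = [random.randint(0, 9) for _ in range(4)]
  b = [random.randint(0, 9) for _ in range(4)]
  c = [random.randint(0, 9) for _ in range(4)]
  if beats(a, b) and beats(b, c) and beats(c, a):
    print(a, b, c)   # a non-transitive triple
    break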




There’s something very joyous about a girl spinning to celebrate life. But a guy spinning in a kilt — no, no, no.

Posted in Miscellaneous | Leave a comment

The Five Neural Network Weight Initialization Algorithms

For beginners to neural networks, one of the many topics that can be confusing is weight initialization. You can think of a neural network as a complex math function that has many constants called weights (and some special weights called biases). Training the network is the process of finding values for the weights.

There are five main algorithms for setting the initial values of the weights. In the early days, for neural networks with a single hidden layer, the two most widely used algorithms were uniform and normal. These two algorithms didn’t work well with deep neural networks and so Glorot uniform and Glorot normal were devised. Neither of these worked well with very deep neural networks that use ReLU activation and so He initialization was devised.

In code, for an input-to-hidden layer where ni is the number of input nodes and nh is the number of hidden nodes, uniform and normal initialization look like:

lo = -0.01; hi = +0.01   # uniform initialization
for i in range(self.ni):
  for j in range(self.nh):
    self.ih_weights[i,j] = \
      np.float32(self.rnd.uniform(lo, hi))

mu = 0.00; sd = 0.10     # normal (Gaussian) initialization
for i in range(self.ni):
  for j in range(self.nh):
    self.ih_weights[i,j] = \
      np.float32(self.rnd.normal(mu, sd))

The main problem with uniform and normal initialization is that you have to pick values for lo and hi (uniform) or mean and stddev (normal).

Code for Glorot uniform and Glorot normal could look like:

fin = self.ni; fout = self.nh      # Glorot uniform
sd = math.sqrt(6.0 / (fin + fout))
for i in range(self.ni):
  for j in range(self.nh):
    self.ih_weights[i,j] = \
      np.float32(self.rnd.uniform(-sd, sd))

fin = self.ni; fout = self.nh      # Glorot normal
sd = math.sqrt(2.0 / (fin + fout))
for i in range(self.ni):
  for j in range(self.nh):
    self.ih_weights[i,j] = \
      np.float32(self.rnd.normal(0.0, sd))

Here fin stands for “fan in” and fout stands for “fan out”. Glorot initializations are also called Xavier initializations because, even though the ideas were well known, they were popularized in a research paper written by a researcher named Xavier Glorot.

Code for He initialization (after a paper by He, Zhang, Ren, and Sun, and so sometimes called He et al. initialization) could look like:

fin = self.ni              # He initialization
sd = math.sqrt(2.0 / fin)
for i in range(self.ni):
  for j in range(self.nh):
    self.ih_weights[i,j] = \
      np.float32(self.rnd.normal(0.0, sd))

He initialization was designed strictly for use with layers that have ReLU activation, but the algorithm can be used on any layer.
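To make the ideas concrete outside of a network class, here’s a small self-contained sketch (plain NumPy, my own variable names, not code from a real network) that builds a ni x nh weight matrix with each approach:

import numpy as np

rnd = np.random.RandomState(1)
ni, nh = 4, 7   # arbitrary layer sizes for the demo

# plain uniform in a fixed, hand-picked range
w_unif = rnd.uniform(-0.01, 0.01, size=(ni, nh)).astype(np.float32)

# Glorot (Xavier) uniform: limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (ni + nh))
w_glorot = rnd.uniform(-limit, limit, size=(ni, nh)).astype(np.float32)

# He normal: sd = sqrt(2 / fan_in)
sd = np.sqrt(2.0 / ni)
w_he = rnd.normal(0.0, sd, size=(ni, nh)).astype(np.float32)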

The moral of the story is that there are roughly 100 topics involved with a more-or-less complete understanding of neural networks. Weight initialization is one of these fundamental topics. When I was learning about neural networks, it seemed like there was always one more topic, but eventually I was able to connect all the dots.



Current first lady Melania Trump wearing dots. Duchess of Cambridge Kate Middleton wearing dots. Both women are examples of elegance and class. And then former first lady Michelle Obama in dots. The term polka dots probably originated in the late 1930s during the “polka craze” in the U.S.

Posted in Machine Learning | Leave a comment

Autoencoders for Visualization Using CNTK

I wrote an article titled “Autoencoders for Visualization Using CNTK” in the December 2018 issue of Microsoft MSDN Magazine. See https://msdn.microsoft.com/en-us/magazine/mt832864.

An autoencoder is a special type of neural network and is probably best explained by an example. Some training data for a regular neural network might look like:

5.1, 3.5, 1.4, 0.2, setosa
7.0, 3.2, 4.7, 1.4, versicolor
6.3, 3.3, 6.0, 2.5, virginica
. . .

This is the famous iris data where the first four values on each line are predictor values and the last value is the species. You could set up a 4-7-3 neural network. After training you could use the trained neural network model to predict the species of a new, previously unseen flower by feeding predictor values such as:

unknown = np.array([[6.1, 3.1, 5.1, 1.1]],
  dtype=np.float32)
predicted = model.predict(unknown)

The training data for an autoencoder might look like:

5.1, 3.5, 1.4, 0.2, 5.1, 3.5, 1.4, 0.2
7.0, 3.2, 4.7, 1.4, 7.0, 3.2, 4.7, 1.4
6.3, 3.3, 6.0, 2.5, 6.3, 3.3, 6.0, 2.5
. . .

The first four values on each line are the predictors, but now the goal is to predict the four input values themselves. If you set up a 4-2-4 neural network and train it to predict its own inputs, you indirectly create a compressed version of the data in the values of the two hidden nodes. Put another way, the dimensionality of the data has been reduced from 4 to 2.
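The article uses CNTK, but the 4-2-4 idea is easy to sketch in PyTorch if that’s what you have installed (a minimal sketch with made-up layer names, not the code from the article):

import torch as pt

class AutoEncoder(pt.nn.Module):
  def __init__(self):
    super().__init__()
    self.enc = pt.nn.Linear(4, 2)   # 4 inputs -> 2 hidden
    self.dec = pt.nn.Linear(2, 4)   # 2 hidden -> 4 outputs

  def forward(self, x):
    z = pt.tanh(self.enc(x))   # z is the 2-D compressed representation
    return self.dec(z)         # reconstruct the 4 inputs

# train with MSE loss so the output matches the input, then
# call model.enc(x) to get the 2-D values to graph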


Data with 16 dimensions has been reduced to two dimensions so it can be graphed as x-y data.


Diagram shows a 6-3-2-3-6 autoencoder architecture that reduces data with 6 dimensions (the inputs) down to 2 dimensions (the two middle nodes).

In my article I demonstrate how to use an autoencoder to reduce/compress a data set that has 16 input values down to just 2 values. These two values can be used to create a graph of the data.

Using an autoencoder to reduce the dimensionality of data so that the data can be graphed in two dimensions is a common technique in machine learning. But the mechanism behind autoencoders is also used inside more complex neural networks.



Three reduction woodblock prints by artist Chen Yongle.

Posted in CNTK, Machine Learning | Leave a comment

NFL 2018 Week 14 Predictions – Zoltar Likes Three Bad Teams

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #14 of the 2018 NFL season:

Zoltar:      titans  by    6  dog =     jaguars    Vegas:      titans  by  4.5
Zoltar:       bills  by    6  dog =        jets    Vegas:       bills  by  3.5
Zoltar:        rams  by    4  dog =       bears    Vegas:        rams  by    4
Zoltar:    panthers  by    4  dog =      browns    Vegas:    panthers  by    1
Zoltar:     falcons  by    0  dog =     packers    Vegas:     packers  by    6
Zoltar:      texans  by    6  dog =       colts    Vegas:      texans  by  4.5
Zoltar:      chiefs  by    6  dog =      ravens    Vegas:      chiefs  by  7.5
Zoltar:    patriots  by    5  dog =    dolphins    Vegas:    patriots  by    8
Zoltar:      saints  by    7  dog =  buccaneers    Vegas:      saints  by    8
Zoltar:    redskins  by    6  dog =      giants    Vegas:    redskins  by  1.5
Zoltar:    chargers  by   10  dog =     bengals    Vegas:    chargers  by   15
Zoltar:     broncos  by    3  dog = fortyniners    Vegas:     broncos  by    6
Zoltar:       lions  by    0  dog =   cardinals    Vegas:       lions  by    2
Zoltar:     cowboys  by    2  dog =      eagles    Vegas:     cowboys  by    4
Zoltar:    steelers  by    9  dog =     raiders    Vegas:    steelers  by 11.5
Zoltar:     vikings  by    0  dog =    seahawks    Vegas:    seahawks  by    3

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #14 Zoltar has three hypothetical suggestions.

1. Zoltar likes the Vegas underdog Falcons against the Packers. Zoltar thinks the two teams are exactly evenly matched, but Vegas has the Packers favored by 6.0 points. So, a bet on the Falcons will pay off if the Falcons win (by any score) or if the Packers win but by less than 6.0 points (in other words, 5 points or less).

2. Zoltar likes the Vegas favorite Redskins against the Giants. Zoltar thinks the Redskins are 6 points better than the Giants but Vegas has the Redskins favored by only 1.5 points. Therefore, Zoltar thinks the Redskins will win by 2 or more points and “cover the spread,” as the phrase goes. Update: The Redskins quarterback is out and the Vegas point spread has moved to Giants favored by 3.0 points — Zoltar has no recommendation on this game.

3. Zoltar likes the Vegas underdog Bengals against the Chargers, thinking that the Chargers will win but will not cover the big 15.0-point Vegas spread.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game (not by how many points). This isn’t useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are three such games in week #14: Falcons vs. Packers, Lions vs. Cardinals, and Vikings vs. Seahawks.

==

Zoltar was pretty good in week #13. Against the Vegas point spread, which is what Zoltar is designed to do, Zoltar went 4-2 (which would win money as I explained above). For the season so far, against the Vegas spread Zoltar is 41-24 which is about 63% accuracy.

Just predicting winners, Zoltar was a mediocre 10-6. Vegas was also 10-6. For the season, just predicting which team will win, Zoltar is 133-57 (70% accuracy) and Vegas is 130-58 (about 69% accuracy).



My system is named after the Zoltar fortune teller machine you can find in arcades (left). Coin-operated fortune telling machines have been around for a very long time.

Posted in Machine Learning, Zoltar | Leave a comment

A Quick Look at Microsoft Azure Batch AI

Microsoft Batch AI is a set of command line tools that you can use to run machine learning programs in the (Azure) Cloud. The typical machine learning pipeline involves preparing the training data, writing a Python program with Keras or PyTorch, running the program to create and train an ML model, evaluating the model, and using the model. All of these steps can be done on a local machine, but for difficult problems with a huge set of training data, where training could take weeks or months of run time, you might need an extremely powerful multiple-GPU machine.

So, the idea is to get everything more or less working on your local machine — because you can be very efficient on a machine sitting at your feet. Then, after your small scale program is acceptable you provision some Cloud resources, copy your full training data and Python program up to the Cloud, and run your training program (or multiple versions of the program with different hyperparameter settings) in the Cloud, which will be very fast.

I walked through a tutorial at:

https://docs.microsoft.com/en-gb/azure/batch-ai/quickstart-tensorflow-training-cli

The tutorial shows how to prepare Azure Batch AI, copy files up into Azure, run the job, and fetch the results.

~$ az group create --name myResourceGroup --location eastus2
~$ az batchai workspace create . . . # create workspace
~$ az batchai cluster create . . .   # create cluster
~$ az storage account create . . .   # create storage account
~$ az storage share create . . .     # create file share
~$ az storage directory create . . . # directory for scripts
~$ az storage directory create . . . # directory for logs
~$ az storage file upload . . .      # upload neural program
~$ az batchai experiment create . .  # create experiment
~$ (create job.json config file)
~$ az batchai job create . . .       # create a job
~$ az batchai job file stream . . .  # monitor job progress
~$ az storage file list . . .        # show output files
~$ az storage file download . . .    # fetch an output file
~$ az batchai cluster resize . . .   # clean/save cluster
~$ az batchai cluster delete . . .   # delete cluster
~$ az group delete . . .             # delete group

You launch an Azure Cloud Shell, which can be based on either Windows PowerShell or the Unix Bash shell, and then issue commands like the ones above. Commands that start with just “az” configure Azure resources, and commands that start with “az batchai” create and run the batch jobs.

My initial impression is that the process has a lot of steps. The tutorial has approximately 17 steps, quite a bit can go wrong in many of them, and the overall process is quite slow. Also, Batch AI performs everything on Ubuntu Linux. For developers who are used to Windows, this creates a lot of friction even for common simple tasks such as file editing (using vi or nano instead of Notepad), navigating through the file system, and so on.

An alternative to Azure Batch AI is to use a Virtual Machine in the Cloud. I’m a big fan of VMs and prefer to use them when possible, but I can imagine scenarios where the Batch AI approach would be very useful.

Google has a similar “Google Cloud Shell” too, but I haven’t used either Google Cloud Shell or Azure Cloud Shell enough to form an informed opinion of how they compare. My hunch is that both cloud shells are probably quite similar — in the end batch AI just means copying files to the Cloud and running a Python script, so it’s not rocket science.



Artist Chris Foss combines rocket science and rocket art.

Posted in Machine Learning | Leave a comment