PyTorch Multi-Class Accuracy By Class Using a Set-Wise Approach

I recently revisited multi-class classification using PyTorch. My demo was to predict a person’s political type (conservative, moderate, liberal) based on sex, age, state (michigan, nebraska, oklahoma), and annual income. See jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.

My demo computed overall model accuracy. I decided to implement a function to compute accuracy by class. I usually compute accuracy by class using a simple item-by-item iteration. For example, see https://jamesmccaffrey.wordpress.com/2022/07/12/pytorch-multi-class-accuracy-by-class/. I decided to implement an accuracy by class function using a set approach that processes all items at once rather than iterating.
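
For reference, the item-by-item version looks roughly like this. This is just a sketch in the spirit of the linked post, not the exact code from it; it assumes the usual imports (numpy as np, torch as T) and a Dataset where dataset[i] returns a (predictors, label) Tuple with int64 class labels.

def do_acc_itemwise(model, dataset, n_classes):
  # sketch: per-class correct/wrong counters, one item at a time
  n_correct = np.zeros(n_classes, dtype=np.int64)
  n_wrong = np.zeros(n_classes, dtype=np.int64)
  for i in range(len(dataset)):
    X = dataset[i][0]  # predictors
    Y = dataset[i][1]  # true class as int64
    with T.no_grad():
      oupt = model(X)  # logits
    pred = T.argmax(oupt)  # predicted class
    c = int(Y)
    if pred.item() == c:
      n_correct[c] += 1
    else:
      n_wrong[c] += 1
  for c in range(n_classes):
    print("%0.4f " % (n_correct[c] / (n_correct[c] + n_wrong[c])))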

Here’s the result:

def do_acc(model, dataset, n_classes):
  X = dataset[0:len(dataset)][0]  # all X values
  Y = dataset[0:len(dataset)][1]  # all Y values
  with T.no_grad():
    oupt = model(X)  #  all logits

  for c in range(n_classes):
    idxs = np.where(Y==c)  # indices where Y is c
    logits_c = oupt[idxs]  # logits corresponding to Y == c
    arg_maxs_c = T.argmax(logits_c, dim=1)  # predicted class
    num_correct = T.sum(arg_maxs_c == c)
    acc_c = num_correct.item() / len(arg_maxs_c)
    print("%0.4f " % acc_c)

Writing the function took me a bit longer than I had expected. The coding part of my brain thinks iteratively rather than set-wise. This is why I’m most comfortable with languages like C# and standard Python, and less comfortable with SQL and things like Python list comprehensions.

Good fun.



Three books where it’s difficult to classify the accuracy of the title without more information. Left: “It Must’ve Been the Fish Sticks”. Center: “How to Talk to Your Cat About Gun Safety”. Right: “Mommy Drinks Because You’re Bad”.


Posted in PyTorch

Binary Classification Using PyTorch 1.12.1 on Windows 10/11

There are frequent updates to the PyTorch neural network library, and I’m continuously learning new techniques and best practices. I figured it was time to update one of my standard binary classification demos for the current PyTorch version 1.12.1.

I currently use Python 3.7.6 from the Anaconda 2020.02 distribution on a Windows 10/11 machine. I located the appropriate PyTorch .whl file at https://download.pytorch.org/whl/torch_stable.html — torch-1.12.1+cpu-cp37-cp37m-win_amd64.whl. Even though I have installed PyTorch hundreds of times, I have grabbed the wrong .whl file more than once.

I opened a Windows command shell with admin privileges. I uninstalled my old PyTorch 1.10.0 using the command “pip uninstall torch”. Then I navigated to the directory holding the new .whl file and installed it with the command “pip install torch-1.12-etc-.whl”. There were no problems.

I used one of my standard datasets for binary classification. The data looks like:

 1   0.24   1 0 0   0.2950   0 0 1
 0   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
 0   0.36   1 0 0   0.4450   0 1 0
. . . 

Each line of data represents a person. The fields are sex (male = 0, female = 1), age (normalized by dividing by 100), state (michigan = 100, nebraska = 010, oklahoma = 001), annual income (divided by 100,000), and politics type (conservative = 100, moderate = 010, liberal = 001). The goal is to predict the gender of a person from their age, state, income, and politics type.

My demo network used an 8-(10-10)-1 architecture with tanh() hidden activation and sigmoid() activation on the output node. I used explicit weight and bias initialization:

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z

For training, I used a batch size of 10, SGD optimization with a fixed learning rate of 0.01, and BCELoss().

For binary classification problems, overall model accuracy alone really isn’t enough. For example, if 95 percent of the items in a dataset belong to one class, then a model that predicts the majority class every time will get 95 percent accuracy. Therefore, I implemented a program-defined metrics() function to compute accuracy, precision, recall, and F1 score.
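
Here is a tiny worked example with invented counts (not from the demo data) that shows why accuracy alone can mislead. On a 100-item dataset where 95 items are the negative class, a model that always predicts negative scores 0.95 accuracy but 0.0 recall:

# hypothetical counts: 95 actual negatives, 5 actual positives,
# model always predicts the negative (majority) class
tp = 0; fp = 0; tn = 95; fn = 5
acc = (tp + tn) / (tp + fp + tn + fn)              # 0.95
prec = tp / (tp + fp) if (tp + fp) > 0 else 0.0    # 0.00
rec = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # 0.00
print("%0.2f %0.2f %0.2f" % (acc, prec, rec))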

I didn’t run into any serious problems. PyTorch is slowly but surely stabilizing. Most of the version changes are related to advanced architectures such as Transformers rather than standard architectures.

Good fun!



There are quite a few research studies that show people can correctly identify a person’s gender just by seeing their face for a fraction of a second. In science fiction movies, most aliens are assumed to be male. Here are three female aliens who aren’t obviously female. Left: Sil from “Species” (1995) was played by actress Natasha Henstridge. She was not a nice alien. Center: The Martian mastermind from “Invaders from Mars” (1953) was played by actress Luce Potter. She was not a nice alien. Right: An alien from the planet Kas-onar in “Valerian and the City of a Thousand Planets” (2017). A good alien.


Demo code.

# people_gender.py
# binary classification
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

class PeopleDataset(T.utils.data.Dataset):
  # sex age   state    income  politics
  #  0  0.27  0  1  0  0.7610  0 0 1
  #  1  0.19  0  0  1  0.6550  1 0 0
  # sex: 0 = male, 1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32) 

    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 required

    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    feats = self.x_data[idx,:]  # idx row, all 8 cols
    sex = self.y_data[idx,:]    # idx row, the only col
    return feats, sex  # as a Tuple

# ---------------------------------------------------------

def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN)  / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1        = 2 / [(1 / precision) + (1 / recall)]

  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0]         # Tuple style: (features, target)
    target = ds[i][1]        # float32  [0.0] or [1.0]
    with T.no_grad():
      p = model(inpts)       # between 0.0 and 1.0

    # avoid a direct 'target == 1.0' float equality test
    if target > 0.5 and p >= thresh:     # TP: actual pos, predicted pos
      tp += 1
    elif target > 0.5 and p < thresh:    # FN: actual pos, predicted neg
      fn += 1
    elif target < 0.5 and p < thresh:    # TN: actual neg, predicted neg
      tn += 1
    elif target < 0.5 and p >= thresh:   # FP: actual neg, predicted pos
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")

  accuracy = (tp + tn) / (N * 1.0)
  precision = (1.0 * tp) / (tp + fp)
  recall = (1.0 * tp) / (tp + fn)
  f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple

# ---------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z

# ----------------------------------------------------------

def main():
  # 0. get started
  print("\nPeople gender using PyTorch ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("\nCreating People train and test Datasets ")

  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  train_ds = PeopleDataset(train_file)  # 200 rows
  test_ds = PeopleDataset(test_file)    # 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create neural network
  print("\nCreating 8-(10-10)-1 binary NN classifier \n")
  net = Net().to(device)

  # 3. train network
  net.train()  # set training mode
  lrn_rate = 0.01
  loss_func = T.nn.BCELoss()  # binary cross entropy
  optimizer = T.optim.SGD(net.parameters(),
    lr=lrn_rate)
  max_epochs = 500
  ep_log_interval = 100

  print("Loss function: " + str(loss_func))
  print("Optimizer: " + str(optimizer.__class__.__name__))
  print("Learn rate: " + "%0.3f" % lrn_rate)
  print("Batch size: " + str(bat_size))
  print("Max epochs: " + str(max_epochs))

  print("\nStarting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0.0            # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]             # [bs,8]  inputs
      Y = batch[1]             # [bs,1]  targets
      oupt = net(X)            # [bs,1]  computeds 

      loss_val = loss_func(oupt, Y)   # a tensor
      epoch_loss += loss_val.item()  # accumulate
      optimizer.zero_grad() # reset all gradients
      loss_val.backward()   # compute new gradients
      optimizer.step()      # update all weights

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %8.4f" % \
        (epoch, epoch_loss))
  print("Done ")

# ----------------------------------------------------------

  # 4. evaluate model
  net.eval()
  metrics_train = metrics(net, train_ds, thresh=0.5)
  print("\nMetrics for train data: ")
  print("accuracy  = %0.4f " % metrics_train[0])
  print("precision = %0.4f " % metrics_train[1])
  print("recall    = %0.4f " % metrics_train[2])
  print("F1        = %0.4f " % metrics_train[3])

  metrics_test = metrics(net, test_ds, thresh=0.5)
  print("\nMetrics for test data: ")
  print("accuracy  = %0.4f " % metrics_test[0])
  print("precision = %0.4f " % metrics_test[1])
  print("recall    = %0.4f " % metrics_test[2])
  print("F1        = %0.4f " % metrics_test[3])

  # 5. save model
  print("\nSaving trained model state_dict ")
  # path = ".\\Models\\people_model.pt"
  # T.save(net.state_dict(), path)

  # 6. make a prediction 
  print("\nSetting age = 30  Oklahoma  $40,000  moderate")
  inpt = np.array([[0.30, 0,0,1, 0.40, 0,1,0]],
    dtype=np.float32)
  inpt = T.tensor(inpt, dtype=T.float32).to(device)

  net.eval()
  with T.no_grad():
    oupt = net(inpt)    # a Tensor
  pred_prob = oupt.item()  # scalar, [0.0, 1.0]
  print("Computed output: ", end="")
  print("%0.4f" % pred_prob)

  if pred_prob < 0.5:
    print("Prediction = male")
  else:
    print("Prediction = female")

  print("\nEnd People binary demo ")

if __name__== "__main__":
  main()

Training data. Replace comma characters with tab characters and save as people_train.txt.

# people_train.txt
# sex (0 = male, 1 = female) - dependent variable
# age, state (michigan, nebraska, oklahoma), income,
# politics type (conservative, moderate, liberal)
#
1,0.24,1,0,0,0.2950,0,0,1
0,0.39,0,0,1,0.5120,0,1,0
1,0.63,0,1,0,0.7580,1,0,0
0,0.36,1,0,0,0.4450,0,1,0
1,0.27,0,1,0,0.2860,0,0,1
1,0.50,0,1,0,0.5650,0,1,0
1,0.50,0,0,1,0.5500,0,1,0
0,0.19,0,0,1,0.3270,1,0,0
1,0.22,0,1,0,0.2770,0,1,0
0,0.39,0,0,1,0.4710,0,0,1
1,0.34,1,0,0,0.3940,0,1,0
0,0.22,1,0,0,0.3350,1,0,0
1,0.35,0,0,1,0.3520,0,0,1
0,0.33,0,1,0,0.4640,0,1,0
1,0.45,0,1,0,0.5410,0,1,0
1,0.42,0,1,0,0.5070,0,1,0
0,0.33,0,1,0,0.4680,0,1,0
1,0.25,0,0,1,0.3000,0,1,0
0,0.31,0,1,0,0.4640,1,0,0
1,0.27,1,0,0,0.3250,0,0,1
1,0.48,1,0,0,0.5400,0,1,0
0,0.64,0,1,0,0.7130,0,0,1
1,0.61,0,1,0,0.7240,1,0,0
1,0.54,0,0,1,0.6100,1,0,0
1,0.29,1,0,0,0.3630,1,0,0
1,0.50,0,0,1,0.5500,0,1,0
1,0.55,0,0,1,0.6250,1,0,0
1,0.40,1,0,0,0.5240,1,0,0
1,0.22,1,0,0,0.2360,0,0,1
1,0.68,0,1,0,0.7840,1,0,0
0,0.60,1,0,0,0.7170,0,0,1
0,0.34,0,0,1,0.4650,0,1,0
0,0.25,0,0,1,0.3710,1,0,0
0,0.31,0,1,0,0.4890,0,1,0
1,0.43,0,0,1,0.4800,0,1,0
1,0.58,0,1,0,0.6540,0,0,1
0,0.55,0,1,0,0.6070,0,0,1
0,0.43,0,1,0,0.5110,0,1,0
0,0.43,0,0,1,0.5320,0,1,0
0,0.21,1,0,0,0.3720,1,0,0
1,0.55,0,0,1,0.6460,1,0,0
1,0.64,0,1,0,0.7480,1,0,0
0,0.41,1,0,0,0.5880,0,1,0
1,0.64,0,0,1,0.7270,1,0,0
0,0.56,0,0,1,0.6660,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
0,0.65,0,0,1,0.7010,0,0,1
1,0.55,0,0,1,0.6430,1,0,0
0,0.25,1,0,0,0.4030,1,0,0
1,0.46,0,0,1,0.5100,0,1,0
0,0.36,1,0,0,0.5350,1,0,0
1,0.52,0,1,0,0.5810,0,1,0
1,0.61,0,0,1,0.6790,1,0,0
1,0.57,0,0,1,0.6570,1,0,0
0,0.46,0,1,0,0.5260,0,1,0
0,0.62,1,0,0,0.6680,0,0,1
1,0.55,0,0,1,0.6270,1,0,0
0,0.22,0,0,1,0.2770,0,1,0
0,0.50,1,0,0,0.6290,1,0,0
0,0.32,0,1,0,0.4180,0,1,0
0,0.21,0,0,1,0.3560,1,0,0
1,0.44,0,1,0,0.5200,0,1,0
1,0.46,0,1,0,0.5170,0,1,0
1,0.62,0,1,0,0.6970,1,0,0
1,0.57,0,1,0,0.6640,1,0,0
0,0.67,0,0,1,0.7580,0,0,1
1,0.29,1,0,0,0.3430,0,0,1
1,0.53,1,0,0,0.6010,1,0,0
0,0.44,1,0,0,0.5480,0,1,0
1,0.46,0,1,0,0.5230,0,1,0
0,0.20,0,1,0,0.3010,0,1,0
0,0.38,1,0,0,0.5350,0,1,0
1,0.50,0,1,0,0.5860,0,1,0
1,0.33,0,1,0,0.4250,0,1,0
0,0.33,0,1,0,0.3930,0,1,0
1,0.26,0,1,0,0.4040,1,0,0
1,0.58,1,0,0,0.7070,1,0,0
1,0.43,0,0,1,0.4800,0,1,0
0,0.46,1,0,0,0.6440,1,0,0
1,0.60,1,0,0,0.7170,1,0,0
0,0.42,1,0,0,0.4890,0,1,0
0,0.56,0,0,1,0.5640,0,0,1
0,0.62,0,1,0,0.6630,0,0,1
0,0.50,1,0,0,0.6480,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.67,0,1,0,0.8040,0,0,1
0,0.40,0,0,1,0.5040,0,1,0
1,0.42,0,1,0,0.4840,0,1,0
1,0.64,1,0,0,0.7200,1,0,0
0,0.47,1,0,0,0.5870,0,0,1
1,0.45,0,1,0,0.5280,0,1,0
0,0.25,0,0,1,0.4090,1,0,0
1,0.38,1,0,0,0.4840,1,0,0
1,0.55,0,0,1,0.6000,0,1,0
0,0.44,1,0,0,0.6060,0,1,0
1,0.33,1,0,0,0.4100,0,1,0
1,0.34,0,0,1,0.3900,0,1,0
1,0.27,0,1,0,0.3370,0,0,1
1,0.32,0,1,0,0.4070,0,1,0
1,0.42,0,0,1,0.4700,0,1,0
0,0.24,0,0,1,0.4030,1,0,0
1,0.42,0,1,0,0.5030,0,1,0
1,0.25,0,0,1,0.2800,0,0,1
1,0.51,0,1,0,0.5800,0,1,0
0,0.55,0,1,0,0.6350,0,0,1
1,0.44,1,0,0,0.4780,0,0,1
0,0.18,1,0,0,0.3980,1,0,0
0,0.67,0,1,0,0.7160,0,0,1
1,0.45,0,0,1,0.5000,0,1,0
1,0.48,1,0,0,0.5580,0,1,0
0,0.25,0,1,0,0.3900,0,1,0
0,0.67,1,0,0,0.7830,0,1,0
1,0.37,0,0,1,0.4200,0,1,0
0,0.32,1,0,0,0.4270,0,1,0
1,0.48,1,0,0,0.5700,0,1,0
0,0.66,0,0,1,0.7500,0,0,1
1,0.61,1,0,0,0.7000,1,0,0
0,0.58,0,0,1,0.6890,0,1,0
1,0.19,1,0,0,0.2400,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.27,1,0,0,0.3640,0,1,0
1,0.42,1,0,0,0.4800,0,1,0
1,0.60,1,0,0,0.7130,1,0,0
0,0.27,0,0,1,0.3480,1,0,0
1,0.29,0,1,0,0.3710,1,0,0
0,0.43,1,0,0,0.5670,0,1,0
1,0.48,1,0,0,0.5670,0,1,0
1,0.27,0,0,1,0.2940,0,0,1
0,0.44,1,0,0,0.5520,1,0,0
1,0.23,0,1,0,0.2630,0,0,1
0,0.36,0,1,0,0.5300,0,0,1
1,0.64,0,0,1,0.7250,1,0,0
1,0.29,0,0,1,0.3000,0,0,1
0,0.33,1,0,0,0.4930,0,1,0
0,0.66,0,1,0,0.7500,0,0,1
0,0.21,0,0,1,0.3430,1,0,0
1,0.27,1,0,0,0.3270,0,0,1
1,0.29,1,0,0,0.3180,0,0,1
0,0.31,1,0,0,0.4860,0,1,0
1,0.36,0,0,1,0.4100,0,1,0
1,0.49,0,1,0,0.5570,0,1,0
0,0.28,1,0,0,0.3840,1,0,0
0,0.43,0,0,1,0.5660,0,1,0
0,0.46,0,1,0,0.5880,0,1,0
1,0.57,1,0,0,0.6980,1,0,0
0,0.52,0,0,1,0.5940,0,1,0
0,0.31,0,0,1,0.4350,0,1,0
0,0.55,1,0,0,0.6200,0,0,1
1,0.50,1,0,0,0.5640,0,1,0
1,0.48,0,1,0,0.5590,0,1,0
0,0.22,0,0,1,0.3450,1,0,0
1,0.59,0,0,1,0.6670,1,0,0
1,0.34,1,0,0,0.4280,0,0,1
0,0.64,1,0,0,0.7720,0,0,1
1,0.29,0,0,1,0.3350,0,0,1
0,0.34,0,1,0,0.4320,0,1,0
0,0.61,1,0,0,0.7500,0,0,1
1,0.64,0,0,1,0.7110,1,0,0
0,0.29,1,0,0,0.4130,1,0,0
1,0.63,0,1,0,0.7060,1,0,0
0,0.29,0,1,0,0.4000,1,0,0
0,0.51,1,0,0,0.6270,0,1,0
0,0.24,0,0,1,0.3770,1,0,0
1,0.48,0,1,0,0.5750,0,1,0
1,0.18,1,0,0,0.2740,1,0,0
1,0.18,1,0,0,0.2030,0,0,1
1,0.33,0,1,0,0.3820,0,0,1
0,0.20,0,0,1,0.3480,1,0,0
1,0.29,0,0,1,0.3300,0,0,1
0,0.44,0,0,1,0.6300,1,0,0
0,0.65,0,0,1,0.8180,1,0,0
0,0.56,1,0,0,0.6370,0,0,1
0,0.52,0,0,1,0.5840,0,1,0
0,0.29,0,1,0,0.4860,1,0,0
0,0.47,0,1,0,0.5890,0,1,0
1,0.68,1,0,0,0.7260,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
1,0.61,0,1,0,0.6250,0,0,1
1,0.19,0,1,0,0.2150,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.26,1,0,0,0.4230,1,0,0
1,0.61,0,1,0,0.6740,1,0,0
1,0.40,1,0,0,0.4650,0,1,0
0,0.49,1,0,0,0.6520,0,1,0
1,0.56,1,0,0,0.6750,1,0,0
0,0.48,0,1,0,0.6600,0,1,0
1,0.52,1,0,0,0.5630,0,0,1
0,0.18,1,0,0,0.2980,1,0,0
0,0.56,0,0,1,0.5930,0,0,1
0,0.52,0,1,0,0.6440,0,1,0
0,0.18,0,1,0,0.2860,0,1,0
0,0.58,1,0,0,0.6620,0,0,1
0,0.39,0,1,0,0.5510,0,1,0
0,0.46,1,0,0,0.6290,0,1,0
0,0.40,0,1,0,0.4620,0,1,0
0,0.60,1,0,0,0.7270,0,0,1
1,0.36,0,1,0,0.4070,0,0,1
1,0.44,1,0,0,0.5230,0,1,0
1,0.28,1,0,0,0.3130,0,0,1
1,0.54,0,0,1,0.6260,1,0,0

Test data. Replace comma characters with tab characters and save as people_test.txt.

0,0.51,1,0,0,0.6120,0,1,0
0,0.32,0,1,0,0.4610,0,1,0
1,0.55,1,0,0,0.6270,1,0,0
1,0.25,0,0,1,0.2620,0,0,1
1,0.33,0,0,1,0.3730,0,0,1
0,0.29,0,1,0,0.4620,1,0,0
1,0.65,1,0,0,0.7270,1,0,0
0,0.43,0,1,0,0.5140,0,1,0
0,0.54,0,1,0,0.6480,0,0,1
1,0.61,0,1,0,0.7270,1,0,0
1,0.52,0,1,0,0.6360,1,0,0
1,0.30,0,1,0,0.3350,0,0,1
1,0.29,1,0,0,0.3140,0,0,1
0,0.47,0,0,1,0.5940,0,1,0
1,0.39,0,1,0,0.4780,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.49,1,0,0,0.5860,0,1,0
0,0.63,0,0,1,0.6740,0,0,1
0,0.30,1,0,0,0.3920,1,0,0
0,0.61,0,0,1,0.6960,0,0,1
0,0.47,0,0,1,0.5870,0,1,0
1,0.30,0,0,1,0.3450,0,0,1
0,0.51,0,0,1,0.5800,0,1,0
0,0.24,1,0,0,0.3880,0,1,0
0,0.49,1,0,0,0.6450,0,1,0
1,0.66,0,0,1,0.7450,1,0,0
0,0.65,1,0,0,0.7690,1,0,0
0,0.46,0,1,0,0.5800,1,0,0
0,0.45,0,0,1,0.5180,0,1,0
0,0.47,1,0,0,0.6360,1,0,0
0,0.29,1,0,0,0.4480,1,0,0
0,0.57,0,0,1,0.6930,0,0,1
0,0.20,1,0,0,0.2870,0,0,1
0,0.35,1,0,0,0.4340,0,1,0
0,0.61,0,0,1,0.6700,0,0,1
0,0.31,0,0,1,0.3730,0,1,0
1,0.18,1,0,0,0.2080,0,0,1
1,0.26,0,0,1,0.2920,0,0,1
0,0.28,1,0,0,0.3640,0,0,1
0,0.59,0,0,1,0.6940,0,0,1
Posted in PyTorch

“Researchers Make Computer Chess Programs More Human” on the Pure AI Web Site

I contributed to an article titled “Researchers Make Computer Chess Programs More Human” in the September 2022 edition of the Pure AI web site. See https://pureai.com/articles/2022/09/06/more-human-chess-programs.aspx.

In some machine learning scenarios, it’s useful to make a prediction system that is more human rather than more accurate. The article describes the Maia chess program which is designed to do just that.

Traditional chess programs such as Stockfish reached superhuman levels of performance about 10 years ago. In 2017, the AlphaZero chess program, based on just nine hours of deep reinforcement learning, stunned the chess and research worlds by beating Stockfish in a 100-game match by a score of 28 wins, 0 losses and 72 draws. In 2018, the Leela chess program, based on AlphaZero, was released as an open source project.

There are nine different versions of Maia: Maia 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800 and 1900. The Maia 1100 program is designed to play like a human player who is rated 1100-1199 (beginner). Maia 1500 is designed to play like a human who is rated 1500-1599, which is typical club player strength. Maia 1900 is designed to play like a human who is rated 1900-1999, which is expert level but not master strength.

For a given chess position, a Maia program correctly predicts the move that would be made by a human player about 50 percent of the time. A Maia program also predicts human mistakes more accurately than standard chess programs. For moderate mistakes, Maia 1500 predicts with about 35 percent accuracy.

I was quoted in the article. “The idea of creating machine learning models that are more human rather than more accurate has interesting potential applications,” McCaffrey said. “Imagine a pilot flight training scenario. Flight simulators collect huge amounts of information. A system that predicts pilot errors could be extremely valuable.”

McCaffrey added, “Robotics is another scenario where a more human-like system could be useful. Imagine an industrial setting where humans and robots work together. A robot that is trained to be more human rather than more accurate could be safer and give an overall system that is more efficient.”



Posted in Machine Learning

NFL 2022 Week 3 Predictions – Zoltar Likes Six Vegas Underdogs

Zoltar is my NFL football prediction computer program. It uses reinforcement learning and a neural network. Here are Zoltar’s predictions for week #3 of the 2022 season. These predictions are fuzzy, in the sense that it usually takes Zoltar about four weeks to hit his stride.

Zoltar:    steelers  by    0  dog =      browns    Vegas:      browns  by    5
Zoltar:      saints  by    0  dog =    panthers    Vegas:      saints  by    3
Zoltar:       bears  by    6  dog =      texans    Vegas:       bears  by    3
Zoltar:      chiefs  by    0  dog =       colts    Vegas:      chiefs  by    7
Zoltar:       bills  by    0  dog =    dolphins    Vegas:       bills  by    4
Zoltar:     vikings  by    6  dog =       lions    Vegas:     vikings  by    7
Zoltar:    patriots  by    6  dog =      ravens    Vegas:      ravens  by    3
Zoltar:     bengals  by    2  dog =        jets    Vegas:     bengals  by    5
Zoltar:      titans  by    6  dog =     raiders    Vegas:     raiders  by    1
Zoltar:      eagles  by    0  dog =  commanders    Vegas:      eagles  by    4
Zoltar:    chargers  by    8  dog =     jaguars    Vegas:    chargers  by    7
Zoltar:   cardinals  by    2  dog =        rams    Vegas:        rams  by    4
Zoltar:    seahawks  by    2  dog =     falcons    Vegas:    seahawks  by    2
Zoltar:  buccaneers  by    4  dog =     packers    Vegas:  buccaneers  by  2.5
Zoltar: fortyniners  by    0  dog =     broncos    Vegas: fortyniners  by    2
Zoltar:     cowboys  by    4  dog =      giants    Vegas:      giants  by  2.5

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. In mid-season I use 3.0 points difference but for the first few weeks of the season I am a bit more conservative and use 4.0 points difference as the advice threshold criterion.

At the beginning of the season, because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is very strongly biased towards Vegas underdogs. I probably need to fix this. For week #3 Zoltar likes six Vegas underdogs:

1. Zoltar likes Vegas underdog Steelers against the Browns.
2. Zoltar likes Vegas underdog Colts against the Chiefs.
3. Zoltar likes Vegas underdog Patriots against the Ravens.
4. Zoltar likes Vegas underdog Titans against the Raiders.
5. Zoltar likes Vegas underdog Cardinals against the Rams.
6. Zoltar likes Vegas underdog Cowboys against the Giants.

For example, a bet on the underdog Steelers against the Browns will pay off if the Steelers win by any score, or if the favored Browns win but by less than 5.0 points (in other words, by 4 points or less). If the favored Browns win by exactly 5 points, the wager is a push.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
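
Here is the quick arithmetic behind that 53 percent figure (simple back-of-the-envelope math, not part of the Zoltar program):

# bet $110 to win $100: need p * 100 >= (1 - p) * 110
# solving for p gives p >= 110 / 210
break_even = 110.0 / (110.0 + 100.0)
print("%0.4f" % break_even)  # 0.5238, so roughly 53 percent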

In week #2, against the Vegas point spread, Zoltar went a very good (but lucky) 3-1 (using 4.0 points as the advice threshold). Zoltar’s predictions against the point spread were correct except for sadly recommending the underdog Titans against the Bills (the Bills won by a score of 41-7 and easily covered the 10.0 point spread).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In week #2, just predicting the winning team, Zoltar went 10-6 which is OK but not great. Vegas was slightly better at just predicting winners in week #2, going 11-4.

Zoltar sometimes predicts a 0-point margin of victory. There are six such games in week #3. In those situations, to pick a winner (only so I can track raw number of correct predictions) in the first few weeks of the season, Zoltar picks the home team to win. After that, Zoltar uses his algorithms to pick a winner.



My football prediction program is named after the Zoltar fortune teller machine you can find in arcades. Fortune teller machines have been around for over 100 years.


Posted in Zoltar

A Quick Look at the .NET MAUI (Multi-Platform User Interface) Library

I was updating my Microsoft Visual Studio program and decided to take a look at using the .NET MAUI template. Suppose you want to create an application that runs on a Windows desktop machine, a Mac machine, and an Android phone. You could write the application three times, which is difficult and ugly but in many cases is the best approach to take.

Or you can use MAUI to write the application once and it will build three versions for you. At least in theory.

The idea is that MAUI gives you a single set of APIs that are wrappers over the low-level platform implementations for each target platform. I’ve looked at the MAUI documentation and it’s very complex.

MAUI is the successor to Xamarin Forms which had the same purpose. The problem with Xamarin, and presumably with MAUI, is that writing a single set of code for multiple platforms is really difficult and complicated. For some applications, learning Xamarin and dealing with its quirks and bugs was more difficult than just writing different versions of the application.

I launched Visual Studio 2022 and updated it to the most recent 17.3.3 version. Then I launched the Visual Studio Installer program and added the MAUI workload.

I restarted Visual Studio and selected the basic MAUI application template. The template code gives a dummy application where you can click on a button and the app keeps track of how many clicks have been made. I selected the Windows Machine emulator and clicked on the green Run arrow. The application built and ran without trouble.

I think the most interesting factor related to MAUI is the cost-benefit analysis regarding when to use MAUI and when to bite the bullet and implement separate code bases for each target platform. If you only have two target platforms, say Android and iOS, then separate code bases are quite manageable (but by no means trivial). If your application is very complex, then MAUI will have inevitable glitches. But there’s a sweet spot combination of number of target platforms and application complexity where MAUI would be a good choice.



Hawaii became a popular travel destination for wealthy Americans starting in the 1930s. The Matson shipping company operated elegant white passenger ships Matsonia, Lurline, Mariposa, and Monterey. Artist Frank McIntosh (1901-1985) created beautiful menu covers for the ships’ dining rooms.


Posted in Miscellaneous

PyTorch Not-Fully-Connected Layer Using prune.custom_from_mask()

I ran across an interesting PyTorch function that I hadn’t seen before. The torch.nn.utils.prune.custom_from_mask() function can mask out weights and biases in a neural layer. This allows you to create layers that are not fully connected.

I checked the PyTorch documentation, and sadly, as usual, it wasn’t much help:

Prunes tensor corresponding to parameter called name in module
by applying the pre-computed mask in mask. Modifies module in
place (and also return the modified module) by:

1.) adding a named buffer called name+'_mask' corresponding
to the binary mask applied to the parameter name by the
pruning method.

2.) replacing the parameter name by its pruned version, while
the original (unpruned) parameter is stored in a new parameter
named name+'_orig'.

So I decided to experiment. I started with one of my standard multi-class classification examples. The goal is to predict employee job-type (mgmt, supp, tech) from sex, age, city (one of three), and income. My starting network was 6-(10-10)-3.

I modified the network to delete the weight from input node [0] (the sex node) to hidden1 layer node [1]. I also deleted the bias to hidden1 layer node [1].

The key code is:


class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    # prune
    print("\nMasking hid1 from node 0 to node 1 ")
    msk_wts = T.ones((10,6),
      dtype=T.float32).to(device) # [to, from]
    msk_wts[1][0] = 0  # to [1] from [0]
    T.nn.utils.prune.custom_from_mask(self.hid1,
      name='weight', mask=msk_wts)

    msk_bias = T.tensor([1,0,1,1,1,1,1,1,1,1],
      dtype=T.float32).to(device)
    T.nn.utils.prune.custom_from_mask(self.hid1,
      name='bias', mask=msk_bias)
 
    # default init

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss() 
    return z
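
A minimal sketch (assuming the Net class above and the device object from the demo program) of how to verify the masking. After the prune call, the hid1 layer exposes the weight_mask buffer and the weight_orig parameter described in the documentation, and the effective weight at [1][0] is zero:

net = Net().to(device)
print(net.hid1.weight)       # effective (pruned) weights; [1][0] is 0.0
print(net.hid1.weight_mask)  # the mask passed to custom_from_mask()
print(net.hid1.weight_orig)  # original (unpruned) weight values
print(net.hid1.bias)         # effective bias; [1] is 0.0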

Like most PyTorch topics, the idea is relatively simple on the surface but there are many very complex ideas lurking under the covers.

Interesting and good fun.



PyTorch masks are used quite often. Masks in fantasy movies are also common. Unusual masks with a sort of Asian theme in three of my favorite fantasy movies. Left: In “The Fall” (2006) Evelyn (actress Justine Waddell) is saved by the hero. Center: Miao Yin (played by actress Suzee Pai) is menaced by an evil wizard in “Big Trouble in Little China” (1986). Right: Princess Su Lin (actress Ni Ni) in “Enter the Warriors Gate” (2016).


Demo code. The training and test data can be found at jamesmccaffrey.wordpress.com/2022/04/29/predicting-employee-job-type-using-pytorch-1-10-on-windows-11/

# employee_job_prune.py
# predict job type from sex, age, city, income
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

# explore T.nn.utils.prune.custom_from_mask()

import numpy as np
import time
import torch as T
import torch.nn.utils.prune
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  # sex  age    city      income  job-type
  # -1   0.27   0  1  0   0.7610   2
  # +1   0.19   0  0  1   0.6550   0
  # sex: -1 = male, +1 = female
  # city: anaheim, boulder, concord
  # job type: mgmt, supp, tech

  def __init__(self, src_file, num_rows=None):
    all_xy = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,7), delimiter="\t", comments="#",
      dtype=np.float32)
    tmp_x = all_xy[0:num_rows,0:6]   # cols [0,6) = [0,5]
    tmp_y = all_xy[0:num_rows,6]  # 1-D
    
    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    # prune
    print("\nMasking hid1 from node 0 to node 1 ")
    msk_wts = T.ones((10,6),
      dtype=T.float32).to(device) # [to, from]
    msk_wts[1][0] = 0  # to [1] from [0]
    T.nn.utils.prune.custom_from_mask(self.hid1,
      name='weight', mask=msk_wts)

    msk_bias = T.tensor([1,0,1,1,1,1,1,1,1,1],
      dtype=T.float32).to(device)
    T.nn.utils.prune.custom_from_mask(self.hid1,
      name='bias', mask=msk_bias)
 
    # default init

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss() 
    return z

# -----------------------------------------------------------

def accuracy(model, ds):
  # assumes model.eval()
  # item-by-item version
  n_correct = 0; n_wrong = 0
  for i in range(len(ds)):
    X = ds[i][0]
    Y = ds[i][1]  # 0 1 or 2
    with T.no_grad():
      oupt = model(X)  # logits form

    big_idx = T.argmax(oupt)  # 0 or 1 or 2
    if big_idx == Y:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee predict job type pruning demo")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Datasets ")

  train_file = ".\\Data\\employee_train.txt"
  train_ds = EmployeeDataset(train_file)  # all 200 rows

  test_file = ".\\Data\\employee_test.txt"
  test_ds = EmployeeDataset(test_file)  # all 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 6-(10-10)-3 pruned neural network ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train model
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.01

  loss_func = T.nn.CrossEntropyLoss()  # applies log-softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = SGD")
  print("max_epochs = %3d " % max_epochs)
  print("lrn_rate = %0.3f " % lrn_rate)

  print("\nStarting training")
  net.train()  # or net = net.train()
  for epoch in range(0, max_epochs):
    T.manual_seed(epoch+1)  # checkpoint reproducibility
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]     # correct class/label/job

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Done ")

  # print(net.hid1.weight)  # one wt is 0
  # print(net.hid1.bias)    # corresponding bias is 0

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds)  # item-by-item
  print("Accuracy on training data = %0.4f" % acc_train)
  acc_test = accuracy(net, test_ds) 
  print("Accuracy on test data = %0.4f" % acc_test)

  # 5. make a prediction
  print("\nPredicting job for M  30  concord  $50,000: ")
  X = np.array([[-1, 0.30,  0,0,1,  0.5000]], dtype=np.float32)
  X = T.tensor(X, dtype=T.float32).to(device) 

  with T.no_grad():
    logits = net(X)  # do not sum to 1.0
  probs = T.softmax(logits, dim=1)  # tensor
  probs = probs.numpy()  # numpy vector prints better
  np.set_printoptions(precision=4, suppress=True)
  print(probs)

  # 6. save model (state_dict approach)
  print("\nSaving trained model state")
  # fn = ".\\Models\\employee_model.pth"
  # T.save(net.state_dict(), fn)

  print("\nEnd Employee predict job pruning demo")

if __name__ == "__main__":
  main()
Posted in PyTorch

Why PyTorch Layer Weight Matrix Shape Seems Backward

A PyTorch weight matrix has shape [num_out, num_in] rather than the more logical [num_in, num_out]. This seems a bit strange. Furthermore, when computing a set of output nodes, the weight matrix must be transposed before applying matrix multiplication. This seems very inefficient, especially because output nodes are computed many, many (often millions) times during training.

Surprisingly, the apparently backward PyTorch weight matrix shape is better because 1.) behind the scenes the matrix transpose operation is “free” (there’s no actual transposition involved), and 2.) behind the scenes the backward pass to compute gradients is usually (but not always) faster with a [num_out, num_in] shape than with a [num_in, num_out] shape. See the discussion at discuss.pytorch.org/t/why-does-the-linear-module-seems-to-do-unnecessary-transposing/6277/7.

Note: The Keras neural network library stores weight matrices in [num_in, num_out] shape.

Here’s a concrete example of a 4-7-3 neural network for the Iris dataset. Iris data has four inputs (sepal length, width, petal length, width), and three outputs (“setosa”, “versicolor”, “virginica”).

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # classic syntax
    self.hid1 = T.nn.Linear(4, 7)  # 4-7-3
    self.oupt = T.nn.Linear(7, 3)
    
    lo = -0.10; hi = +0.10
    T.nn.init.uniform_(self.hid1.weight, lo, hi)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.uniform_(self.oupt.weight, lo, hi)
    T.nn.init.zeros_(self.oupt.bias)
    
  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss() 
    return z

The hid1 layer weight matrix has shape [7,4] and the oupt layer weight matrix has shape [3,7].

One scenario where the shape of weight matrices is relevant is when writing custom weight initialization code.
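
For example, here is a minimal sketch (with arbitrary hypothetical values) showing that custom initialization code must index a Linear layer weight matrix as [num_out, num_in]:

import torch as T

hid1 = T.nn.Linear(4, 7)   # 4 inputs, 7 hidden nodes
print(hid1.weight.shape)   # torch.Size([7, 4]) -- [num_out, num_in]

# the weight from input node i to hidden node j lives at [j][i]
with T.no_grad():
  for j in range(7):       # hidden (output) nodes
    for i in range(4):     # input nodes
      hid1.weight[j][i] = 0.01 * (j + i)  # arbitrary demo values
  hid1.bias.zero_()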



Many of the movie posters for early James Bond films featured the backs of women. I have no idea why this was done or what it means. Left: “Dr. No” (1962), the first movie in the series. Center: “Thunderball” (1965), the fourth movie in the series. Right: “You Only Live Twice” (1967), the fifth movie in the series.


Posted in PyTorch

“Multi-Class Classification Using PyTorch, Part 1: New Best Practices” in Visual Studio Magazine

I wrote an article titled “Multi-Class Classification Using PyTorch, Part 1: New Best Practices” in the September 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/09/06/multi-class-pytorch.aspx.

A multi-class classification problem is one where the goal is to predict a discrete value where there are three or more possibilities. For example, you might want to predict the political leaning (conservative, moderate, liberal) of a person based on their sex, age, state where they live and annual income.

I’d written previous articles on multi-class classification, but machine learning with deep neural techniques has advanced quickly. The article updates multi-class classification techniques and best practices based on experience I’d gained over the past two years.

The article explains a demo program. The demo begins by loading a 200-item file of training data and a 40-item set of test data. Each tab-delimited line represents a person. The fields are sex, age, state of residence, annual income and politics type (0 = conservative, 1 = moderate and 2 = liberal). The goal is to predict politics type from sex, age, state and income.

After the training data is loaded into memory, the demo creates a 6-(10-10)-3 neural network. This means there are six input nodes, two hidden neural layers with 10 nodes each and three output nodes.

The demo prepares to train the network by setting a batch size of 10, stochastic gradient descent (SGD) optimization with a learning rate of 0.01 and maximum training epochs of 1,000 passes through the training data.

The demo program monitors training by computing and displaying the loss value for one epoch. The loss value slowly decreases, which indicates that training is probably succeeding. The magnitude of the loss values isn’t directly interpretable; the important thing is that the loss decreases.

After 1,000 training epochs, the demo program computes the accuracy of the trained model on the training data as 81.50 percent (163 out of 200 correct). The model accuracy on the test data is 75.00 percent (30 out of 40 correct).

After evaluating the trained network, the demo predicts the politics type for a person who is male, 30 years old, from Oklahoma, who makes $50,000 annually. The prediction is [0.6905, 0.3049, 0.0047]. These values are pseudo-probabilities. The largest value (0.6905) is at index [0] so the prediction is class 0 = conservative.

The demo concludes by saving the trained model to file so that it can be used without having to retrain the network from scratch. There are two different ways to save a PyTorch model. The demo uses the save-state approach.
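
The save-state approach looks roughly like the sketch below. The file path is illustrative, and Net is the 6-(10-10)-3 network class defined in the article:

# save just the weights and biases (state_dict approach)
path = ".\\Models\\people_politics_model.pt"
T.save(net.state_dict(), path)

# later: re-create the network and load the saved state
net2 = Net().to(device)
net2.load_state_dict(T.load(path))
net2.eval()  # set evaluation mode before making predictions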

The article explains the first parts of the demo: installing PyTorch, creating the data files and Dataset object, and defining the neural network. A second article will explain how to train the network, evaluate network accuracy, use a trained network, and save a network.



I lived in Hawaii from 1984 to 1997. Left: Style classification = 1950s mid-century modern. Center: Style classification = 1930s art deco. Right: Style classification = 1960s psychedelic.


Posted in PyTorch

NFL 2022 Week 2 Predictions – Zoltar Likes Four Vegas Underdogs

Zoltar is my NFL football prediction computer program. It uses reinforcement learning and a neural network. Here are Zoltar’s predictions for week #2 of the 2022 season. These predictions are fuzzy, in the sense that it usually takes Zoltar about four weeks to hit his stride.

Zoltar:      chiefs  by    6  dog =    chargers    Vegas:      chiefs  by  3.5
Zoltar:      browns  by    6  dog =        jets    Vegas:      browns  by    6
Zoltar:  commanders  by    0  dog =       lions    Vegas:       lions  by  2.5
Zoltar:       colts  by    2  dog =     jaguars    Vegas:       colts  by    4
Zoltar:  buccaneers  by    0  dog =      saints    Vegas:  buccaneers  by    3
Zoltar:      giants  by    2  dog =    panthers    Vegas:      giants  by  2.5
Zoltar:    steelers  by    2  dog =    patriots    Vegas:    patriots  by  1.5
Zoltar:      ravens  by    1  dog =    dolphins    Vegas:      ravens  by  3.5
Zoltar:        rams  by    6  dog =     falcons    Vegas:        rams  by 10.5
Zoltar: fortyniners  by    6  dog =    seahawks    Vegas: fortyniners  by    9
Zoltar:     cowboys  by    6  dog =     bengals    Vegas:     bengals  by  7.5
Zoltar:     broncos  by    6  dog =      texans    Vegas:     broncos  by   10
Zoltar:     raiders  by    1  dog =   cardinals    Vegas:     raiders  by    6
Zoltar:     packers  by    9  dog =       bears    Vegas:     packers  by   10
Zoltar:       bills  by    2  dog =      titans    Vegas:       bills  by   10
Zoltar:      eagles  by    4  dog =     vikings    Vegas:      eagles  by  1.5

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. In mid-season I use 3.0 points difference but for the first few weeks of the season I am a bit more conservative and use 4.0 points difference as the advice threshold criterion.

At the beginning of the season, because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is very strongly biased towards Vegas underdogs. I probably need to fix this. For week #2:

1. Zoltar likes Vegas underdog Falcons against the Rams.
2. Zoltar likes Vegas underdog Cowboys against the Bengals.
3. Zoltar likes Vegas underdog Cardinals against the Raiders.
4. Zoltar likes Vegas underdog Titans against the Bills.

For example, a bet on the underdog Falcons against the Rams will pay off if the Falcons win by any score, or if the favored Rams win but by less than 10.5 points (in other words, by 10 points or less).

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

In week #1, against the Vegas point spread, Zoltar went a very good (but lucky) 5-2 (using 4.0 points as the advice threshold). Zoltar’s predictions against the point spread were correct except for recommending the Rams over the Bills (the Rams were beaten badly, 31-10, and so the Bills easily covered their 2.5 point spread), and recommending the terrible Jets against the Ravens (the Jets lost 24-9 and so the Ravens covered their 5.5 point spread).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In week #1, just predicting the winning team, Zoltar went 7-8 which isn’t very good but is typical of the first few weeks of the season. Note: There was one tie game: Colts 20, Texans 20. Vegas did very well at just predicting winners in week #1, going 10-5.

Zoltar sometimes predicts a 0-point margin of victory. There are two such games in week #2. In those situations, to pick a winner (only so I can track raw number of correct predictions) in the first few weeks of the season, Zoltar picks the home team to win. After that, Zoltar uses his algorithms to pick a winner.



My NFL prediction system is named after the Zoltar fortune teller machine you can find in arcades. Left: A Zoltar machine outside of Houdini’s Magic Shop in the New York New York hotel in Las Vegas. Sadly, I think the shop is gone now. Center: Wagering on NFL games is a multi-billion dollar business. This is a photo of the sports book (betting area) at the MGM Grand hotel in Las Vegas. Right: This is a sheet where you can see the point spread for a given week. You can place a bet by going to a person at a desk in the sports book, or you can place a bet using a terminal or online.


Posted in Zoltar

Using the Simplest Possible Transformer Sequence-to-Sequence Example

I’ve been exploring PyTorch Transformer architecture (TA) models for sequence-to-sequence problems for several months. TA systems are among the most complicated software things I’ve ever worked with.

I recently completed a demo implementation of my idea of the simplest possible sequence-to-sequence problem. That demo was incomplete because it trained a seq-to-seq model but did not use the trained model to make a prediction. See https://jamesmccaffrey.wordpress.com/2022/09/09/simplest-transformer-seq-to-seq-example/.

Unlike with relatively simple neural networks, such as a multi-class classifier, using a trained seq-to-seq model is a significant challenge. So I took the trained model and wrote a demo program that uses the model to make a prediction.

My input sequence is [1, 4,5,6,7,6,5,4, 2]. The 1 is start-of-sequence, the 2 is end-of-sequence. Token 3 is for unknown and token 0 is for padding. I didn’t use 0 or 3 in my demo. The correct output is [1, 5, 6, 7, 8, 7, 6, 5, 2]. My demo didn’t do too well but at least it emitted a legal output sequence: [1, 5, 5, 4, 5, 8, 4, 4, 2].

There are many things that I don’t fully understand about Transformer seq-to-seq systems, including my own demo. But for difficult machine learning topics, persistence and determination are the keys to successful learning.



Transformer software systems are difficult to figure out. There are a surprisingly large number of movies where a human transforms into a snake. Here are three where the plot is difficult to figure out. Left: “The Reptile” (1966) is an English movie about a young woman who transforms into a snake because of a Malay curse. Center: “Cult of the Cobra” (1955) is movie about six men who unintentionally witness a ceremony of an evil cult of women who can transform into snakes. You’d think they’d stay away from mysterious women with dark reptilian eyes after that, but no, they don’t. Right: “The Sorcerer and the White Snake” (2011) is a Chinese movie. The plot baffled me but there are two women who can turn into snakes.


Demo code:

# seq2seq_use.py
# Transformer seq-to-seq usage example

# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T
import math

device = T.device('cpu')
T.set_num_threads(1)

# -----------------------------------------------------------

class TransformerNet(T.nn.Module):
  def __init__(self):
    # vocab_size = 12, embed_dim = d_model = 4, seq_len = 9/10
    super(TransformerNet, self).__init__()  # classic syntax
    self.embed = T.nn.Embedding(12, 4)       # word embedding
    self.pos_enc = PositionalEncoding(4)    # positional
    self.trans = T.nn.Transformer(d_model=4, nhead=2, \
      dropout=0.0, batch_first=True)  # d_model div by nhead
    self.fc = T.nn.Linear(4, 12)  # embed_dim to vocab_size
    
  def forward(self, src, tgt, tgt_mask):
    s = self.embed(src)
    t = self.embed(tgt)

    s = self.pos_enc(s)  # [bs,seq=10,embed]
    t = self.pos_enc(t)  # [bs,seq=9,embed]

    z = self.trans(src=s, tgt=t, tgt_mask=tgt_mask)
    z = self.fc(z)   
    return z 

# -----------------------------------------------------------

class PositionalEncoding(T.nn.Module):  # documentation code
  def __init__(self, d_model: int, dropout: float=0.0,
   max_len: int=5000):
    super(PositionalEncoding, self).__init__()  # old syntax
    self.dropout = T.nn.Dropout(p=dropout)
    pe = T.zeros(max_len, d_model)  # like 10x4
    position = \
      T.arange(0, max_len, dtype=T.float).unsqueeze(1)
    div_term = T.exp(T.arange(0, d_model, 2).float() * \
      (-np.log(10_000.0) / d_model))
    pe[:, 0::2] = T.sin(position * div_term)
    pe[:, 1::2] = T.cos(position * div_term)
    pe = pe.unsqueeze(0).transpose(0, 1)
    self.register_buffer('pe', pe)  # allows state-save

  def forward(self, x):
    x = x + self.pe[:x.size(0), :]
    return self.dropout(x)

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin PyTorch Transformer seq-to-seq use demo ")
  T.manual_seed(1)  
  np.random.seed(1)

  # 1. create Transformer network
  print("\nCreating batch-first Transformer network ")
  model = TransformerNet().to(device)
  model.eval()

  # 2. load trained model wts and biases
  print("\nLoading saved model weights and biases ")
  fn = ".\\Models\\transformer_seq_model_200_epochs.pt"
  model.load_state_dict(T.load(fn))

# -----------------------------------------------------------
  
  src = T.tensor([[1, 4,5,6,7,6,5,4, 2]],
    dtype=T.int64).to(device)
  print("\nsrc sequence: ")
  print(src)
  print("\ncorrect output: ")
  print("[[1, 5, 6, 7, 8, 7, 6, 5, 2]]")

  print("\nPredicted output: ")
  tgt_in = T.tensor([[1]], dtype=T.int64).to(device)  # SOS
  for i in range(20):  # max output 20 tokens
    n = tgt_in.size(1)
    t_mask = \
      T.nn.Transformer.generate_square_subsequent_mask(n)
    with T.no_grad():
      preds = model(src, tgt_in, tgt_mask=t_mask) 
      # preds shape: [bs, tgt_len, vocab=12]

    next_token = T.argmax( preds[-1][-1] )  # logits for last token position
    # print(next_token); input()
    next_token = next_token.reshape(1,1)

    tgt_in = T.cat((tgt_in, next_token), dim=1)
    print(tgt_in)  

    if next_token[0][0].item() == 2:  # EOS
      break

  print("\nEnd PyTorch Transformer seq-to-seq use demo ")

if __name__ == "__main__":
  main()
Posted in PyTorch, Transformers