Binary Classification Using PyTorch 1.12.1 on Windows 10/11

There are frequent updates to the PyTorch neural network library, and I’m continuously learning new techniques and best practices. I figured it was time to update one of my standard binary classification demos for the current PyTorch version 1.12.1.

I currently use Python 3.7.6 from the Anaconda 2020.02 distribution on a Windows 10/11 machine. I located the appropriate PyTorch .whl file at https://download.pytorch.org/whl/torch_stable.html — torch-1.12.1+cpu-cp37-cp37m-win_amd64.whl. Even though I have installed PyTorch hundreds of times, I have grabbed the wrong .whl file more than once.
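
Because picking the wrong .whl file is an easy mistake, a quick sanity check on the file name can help. The following is a minimal sketch, assuming the standard wheel naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper names parse_wheel_name() and matches_interpreter() are hypothetical and are not part of pip or PyTorch.

```python
import sys

def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a .whl file: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) == 6:      # an optional build tag is present
        name, version, _build, py_tag, abi_tag, plat_tag = parts
    elif len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}

def matches_interpreter(filename, version_info=None):
    """True if the wheel's CPython tag matches a (major, minor) version."""
    if version_info is None:
        version_info = sys.version_info   # the running interpreter
    wanted = "cp%d%d" % (version_info[0], version_info[1])
    return parse_wheel_name(filename)["python"] == wanted

whl = "torch-1.12.1+cpu-cp37-cp37m-win_amd64.whl"
print(parse_wheel_name(whl))
print(matches_interpreter(whl, (3, 7)))   # check against Python 3.7
```

Calling matches_interpreter(whl) with no second argument compares the file against whichever Python is running the script. The sketch only looks at the Python tag; it does not check the platform tag or the CPU/GPU build suffix.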

I opened a Windows command shell with admin privileges. I uninstalled my old PyTorch 1.10.0 using the command “pip uninstall torch”. Then I navigated to the directory holding the new .whl file and installed it with the command “pip install torch-1.12-etc-.whl”. There were no problems.

I used one of my standard datasets for binary classification. The data looks like:

 1   0.24   1 0 0   0.2950   0 0 1
 0   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
 0   0.36   1 0 0   0.4450   0 1 0
. . .

Each line of data represents a person. The fields are sex (male = 0, female = 1), age (normalized by dividing by 100), state (michigan = 100, nebraska = 010, oklahoma = 001), annual income (divided by 100,000), and politics type (conservative = 100, moderate = 010, liberal = 001). The goal is to predict the gender of a person from their age, state, income, and politics type.
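
The encoding described above can be sketched in plain Python. This is only an illustration: the raw values (a 24-year-old female from Michigan with a $29,500 income and liberal politics) are inferred from the first normalized data line, and the helper names one_hot() and encode_person() are hypothetical.

```python
STATES = ["michigan", "nebraska", "oklahoma"]
POLITICS = ["conservative", "moderate", "liberal"]

def one_hot(value, categories):
    """Return a list with 1 at the position of value and 0 elsewhere."""
    if value not in categories:
        raise ValueError("unknown category: " + str(value))
    return [1 if c == value else 0 for c in categories]

def encode_person(sex, age, state, income, politics):
    """Encode a raw record as [sex, age, state x3, income, politics x3]."""
    sex_code = {"male": 0, "female": 1}[sex]
    row = [sex_code, age / 100]          # age normalized by 100
    row += one_hot(state, STATES)
    row.append(income / 100000)          # income normalized by 100,000
    row += one_hot(politics, POLITICS)
    return row

print(encode_person("female", 24, "michigan", 29500, "liberal"))
```

The result has nine values: the sex label to predict, followed by the eight predictor values that the network takes as input.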

My demo network used an 8-(10-10)-1 architecture with tanh() hidden activation and sigmoid() activation on the output node. I used explicit weight and bias initialization:

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z
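
To give a feel for what the explicit initialization does, here is a plain-Python sketch that does not use PyTorch. It assumes the usual Xavier (Glorot) uniform rule with gain 1, where weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The function names are hypothetical, and the random values will not match those produced by PyTorch.

```python
import math
import random

def xavier_uniform(fan_out, fan_in, rng):
    """Return a fan_out x fan_in weight matrix drawn from U(-a, a)."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

def count_params(layer_sizes):
    """Count weights plus biases for a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

rng = random.Random(1)
w_hid1 = xavier_uniform(10, 8, rng)    # same shape as hid1.weight
print(len(w_hid1), len(w_hid1[0]))     # rows, columns
print(math.sqrt(6.0 / (8 + 10)))       # bound for the first layer
print(count_params([8, 10, 10, 1]))    # total weights and biases
```

For the 8-(10-10)-1 network the count is (8 * 10 + 10) + (10 * 10 + 10) + (10 * 1 + 1) = 211 weights and biases.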

For training, I used a batch size of 10, SGD optimization with a fixed learning rate of 0.01, and BCELoss().
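
As a rough illustration of what these training settings compute, here is a plain-Python sketch of mean binary cross entropy over one batch and of a single fixed-rate SGD update. It assumes the default mean reduction of BCELoss() and clamps each log term at -100, which is how I understand the PyTorch documentation to describe it. The names bce_loss() and sgd_step() are hypothetical.

```python
import math

def bce_loss(probs, targets):
    """Mean binary cross entropy over one batch of predictions."""
    total = 0.0
    for p, y in zip(probs, targets):
        # clamp each log term at -100 so p = 0.0 or p = 1.0 stays finite
        log_p = max(math.log(p), -100.0) if p > 0.0 else -100.0
        log_q = max(math.log(1.0 - p), -100.0) if p < 1.0 else -100.0
        total += -(y * log_p + (1.0 - y) * log_q)
    return total / len(probs)

def sgd_step(weight, grad, lrn_rate=0.01):
    """One plain SGD update for a single weight."""
    return weight - lrn_rate * grad

print(bce_loss([0.9, 0.2, 0.6], [1.0, 0.0, 1.0]))  # fairly good predictions
print(bce_loss([0.5, 0.5], [1.0, 0.0]))            # equals ln(2)
print(sgd_step(1.0, 0.5))                          # weight moves against gradient
```

With 200 training items and a batch size of 10 there are 20 batches per epoch, so the epoch loss printed by the demo is a sum of 20 batch means, not an average.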

For binary classification problems, plain model accuracy isn’t enough. For example, if 95% of the items in a dataset belong to one class, then a model that predicts the majority class every time gets 95% accuracy without learning anything. Therefore, I implemented a program-defined metrics() function to compute accuracy, precision, recall, and F1 score.
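
The 95% example can be worked through with a small sketch that computes the four metrics from confusion-matrix counts. The function name metrics_from_counts() is hypothetical, and returning 0.0 when a metric is undefined (zero denominator) is one common convention, not the only one.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F1) from confusion counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # report 0.0 when a denominator is zero and the metric is undefined
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# 100 items: 95 negative, 5 positive; the model always predicts negative
print(metrics_from_counts(tp=0, fp=0, tn=95, fn=5))
```

Accuracy comes out as 0.95, but precision, recall, and F1 are all 0.0 because the model never finds a single positive item.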

I didn’t run into any serious problems. PyTorch is slowly but surely stabilizing. Most of the version changes are related to advanced architectures such as Transformers rather than standard architectures.

Good fun!

There are quite a few research studies that show people can correctly identify a person’s gender just by seeing their face for a fraction of a second. In science fiction movies, most aliens are assumed to be male. Here are three female aliens who aren’t obviously female. Left: Sil from “Species” (1995) was played by actress Natasha Henstridge. She was not a nice alien. Center: The Martian mastermind from “Invaders from Mars” (1953) was played by actress Luce Potter. She was not a nice alien. Right: An alien from the planet Kas-onar in “Valerian and the City of a Thousand Planets” (2017). A good alien.

Demo code. If any comparison operators appear as “lt”, “gt”, “lte”, “gte”, replace them with the corresponding Boolean operator symbols.

# people_gender.py
# binary classification
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

class PeopleDataset(T.utils.data.Dataset):
  # sex age   state    income  politics
  #  0  0.27  0  1  0  0.7610  0 0 1
  #  1  0.19  0  0  1  0.6550  1 0 0
  # sex: 0 = male, 1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32) 

    self.x_data = T.tensor(all_data[:,1:9],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 required

    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    feats = self.x_data[idx,:]  # idx row, all 8 cols
    sex = self.y_data[idx,:]    # idx row, the only col
    return feats, sex  # as a Tuple

# ---------------------------------------------------------

def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN)  / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1        = 2 / [(1 / precision) + (1 / recall)]

  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0]         # tuple item 0: the 8 predictors
    target = ds[i][1]        # float32  [0.0] or [1.0]
    target = target.type(T.int64)  # make it an int
    with T.no_grad():
      p = model(inpts)       # between 0.0 and 1.0

    # should really avoid 'target == 1.0'
    if target == 1 and p >= thresh:    # TP
      tp += 1
    elif target == 1 and p < thresh:   # FN
      fn += 1
    elif target == 0 and p < thresh:   # TN
      tn += 1
    elif target == 0 and p >= thresh:  # FP
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")

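  accuracy = (tp + tn) / (N * 1.0)
  # guard zero denominators; report 0.0 when a metric is undefined
  precision = (1.0 * tp) / (tp + fp) if (tp + fp) != 0 else 0.0
  recall = (1.0 * tp) / (tp + fn) if (tp + fn) != 0 else 0.0
  f1 = 0.0
  if precision != 0.0 and recall != 0.0:
    f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple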

# ---------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z

# ----------------------------------------------------------

def main():
  # 0. get started
  print("\nPeople gender using PyTorch ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("\nCreating People train and test Datasets ")

  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  train_ds = PeopleDataset(train_file)  # 200 rows
  test_ds = PeopleDataset(test_file)    # 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create neural network
  print("\nCreating 8-(10-10)-1 binary NN classifier \n")
  net = Net().to(device)

  # 3. train network
  net.train()  # set training mode
  lrn_rate = 0.01
  loss_func = T.nn.BCELoss()  # binary cross entropy
  optimizer = T.optim.SGD(net.parameters(),
    lr=lrn_rate)
  max_epochs = 500
  ep_log_interval = 100

  print("Loss function: " + str(loss_func))
  print("Optimizer: " + str(optimizer.__class__.__name__))
  print("Learn rate: " + "%0.3f" % lrn_rate)
  print("Batch size: " + str(bat_size))
  print("Max epochs: " + str(max_epochs))

  print("\nStarting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0.0            # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]             # [bs,8]  inputs
      Y = batch[1]             # [bs,1]  targets
      oupt = net(X)            # [bs,1]  computeds 

      loss_val = loss_func(oupt, Y)   # a tensor
      epoch_loss += loss_val.item()  # accumulate
      optimizer.zero_grad() # reset all gradients
      loss_val.backward()   # compute new gradients
      optimizer.step()      # update all weights

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %8.4f" % \
        (epoch, epoch_loss))
  print("Done ")

# ----------------------------------------------------------

  # 4. evaluate model
  net.eval()
  metrics_train = metrics(net, train_ds, thresh=0.5)
  print("\nMetrics for train data: ")
  print("accuracy  = %0.4f " % metrics_train[0])
  print("precision = %0.4f " % metrics_train[1])
  print("recall    = %0.4f " % metrics_train[2])
  print("F1        = %0.4f " % metrics_train[3])

  metrics_test = metrics(net, test_ds, thresh=0.5)
  print("\nMetrics for test data: ")
  print("accuracy  = %0.4f " % metrics_test[0])
  print("precision = %0.4f " % metrics_test[1])
  print("recall    = %0.4f " % metrics_test[2])
  print("F1        = %0.4f " % metrics_test[3])

  # 5. save model
  print("\nSaving trained model state_dict ")
  # path = ".\\Models\\people_model.pt"
  # T.save(net.state_dict(), path)

  # 6. make a prediction 
  print("\nSetting age = 30  Oklahoma  $40,000  moderate")
  inpt = np.array([[0.30, 0,0,1, 0.40, 0,1,0]],
    dtype=np.float32)
  inpt = T.tensor(inpt, dtype=T.float32).to(device)

  net.eval()
  with T.no_grad():
    oupt = net(inpt)    # a Tensor
  pred_prob = oupt.item()  # scalar, [0.0, 1.0]
  print("Computed output: ", end="")
  print("%0.4f" % pred_prob)

  if pred_prob < 0.5:
    print("Prediction = male")
  else:
    print("Prediction = female")

  print("\nEnd People binary demo ")

if __name__ == "__main__":
  main()

Training data. Replace comma characters with tab characters and save as people_train.txt.

# people_train.txt
# sex (0 = male, 1 = female) - dependent variable
# age, state (michigan, nebraska, oklahoma), income,
# politics type (conservative, moderate, liberal)
#
1,0.24,1,0,0,0.2950,0,0,1
0,0.39,0,0,1,0.5120,0,1,0
1,0.63,0,1,0,0.7580,1,0,0
0,0.36,1,0,0,0.4450,0,1,0
1,0.27,0,1,0,0.2860,0,0,1
1,0.50,0,1,0,0.5650,0,1,0
1,0.50,0,0,1,0.5500,0,1,0
0,0.19,0,0,1,0.3270,1,0,0
1,0.22,0,1,0,0.2770,0,1,0
0,0.39,0,0,1,0.4710,0,0,1
1,0.34,1,0,0,0.3940,0,1,0
0,0.22,1,0,0,0.3350,1,0,0
1,0.35,0,0,1,0.3520,0,0,1
0,0.33,0,1,0,0.4640,0,1,0
1,0.45,0,1,0,0.5410,0,1,0
1,0.42,0,1,0,0.5070,0,1,0
0,0.33,0,1,0,0.4680,0,1,0
1,0.25,0,0,1,0.3000,0,1,0
0,0.31,0,1,0,0.4640,1,0,0
1,0.27,1,0,0,0.3250,0,0,1
1,0.48,1,0,0,0.5400,0,1,0
0,0.64,0,1,0,0.7130,0,0,1
1,0.61,0,1,0,0.7240,1,0,0
1,0.54,0,0,1,0.6100,1,0,0
1,0.29,1,0,0,0.3630,1,0,0
1,0.50,0,0,1,0.5500,0,1,0
1,0.55,0,0,1,0.6250,1,0,0
1,0.40,1,0,0,0.5240,1,0,0
1,0.22,1,0,0,0.2360,0,0,1
1,0.68,0,1,0,0.7840,1,0,0
0,0.60,1,0,0,0.7170,0,0,1
0,0.34,0,0,1,0.4650,0,1,0
0,0.25,0,0,1,0.3710,1,0,0
0,0.31,0,1,0,0.4890,0,1,0
1,0.43,0,0,1,0.4800,0,1,0
1,0.58,0,1,0,0.6540,0,0,1
0,0.55,0,1,0,0.6070,0,0,1
0,0.43,0,1,0,0.5110,0,1,0
0,0.43,0,0,1,0.5320,0,1,0
0,0.21,1,0,0,0.3720,1,0,0
1,0.55,0,0,1,0.6460,1,0,0
1,0.64,0,1,0,0.7480,1,0,0
0,0.41,1,0,0,0.5880,0,1,0
1,0.64,0,0,1,0.7270,1,0,0
0,0.56,0,0,1,0.6660,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
0,0.65,0,0,1,0.7010,0,0,1
1,0.55,0,0,1,0.6430,1,0,0
0,0.25,1,0,0,0.4030,1,0,0
1,0.46,0,0,1,0.5100,0,1,0
0,0.36,1,0,0,0.5350,1,0,0
1,0.52,0,1,0,0.5810,0,1,0
1,0.61,0,0,1,0.6790,1,0,0
1,0.57,0,0,1,0.6570,1,0,0
0,0.46,0,1,0,0.5260,0,1,0
0,0.62,1,0,0,0.6680,0,0,1
1,0.55,0,0,1,0.6270,1,0,0
0,0.22,0,0,1,0.2770,0,1,0
0,0.50,1,0,0,0.6290,1,0,0
0,0.32,0,1,0,0.4180,0,1,0
0,0.21,0,0,1,0.3560,1,0,0
1,0.44,0,1,0,0.5200,0,1,0
1,0.46,0,1,0,0.5170,0,1,0
1,0.62,0,1,0,0.6970,1,0,0
1,0.57,0,1,0,0.6640,1,0,0
0,0.67,0,0,1,0.7580,0,0,1
1,0.29,1,0,0,0.3430,0,0,1
1,0.53,1,0,0,0.6010,1,0,0
0,0.44,1,0,0,0.5480,0,1,0
1,0.46,0,1,0,0.5230,0,1,0
0,0.20,0,1,0,0.3010,0,1,0
0,0.38,1,0,0,0.5350,0,1,0
1,0.50,0,1,0,0.5860,0,1,0
1,0.33,0,1,0,0.4250,0,1,0
0,0.33,0,1,0,0.3930,0,1,0
1,0.26,0,1,0,0.4040,1,0,0
1,0.58,1,0,0,0.7070,1,0,0
1,0.43,0,0,1,0.4800,0,1,0
0,0.46,1,0,0,0.6440,1,0,0
1,0.60,1,0,0,0.7170,1,0,0
0,0.42,1,0,0,0.4890,0,1,0
0,0.56,0,0,1,0.5640,0,0,1
0,0.62,0,1,0,0.6630,0,0,1
0,0.50,1,0,0,0.6480,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.67,0,1,0,0.8040,0,0,1
0,0.40,0,0,1,0.5040,0,1,0
1,0.42,0,1,0,0.4840,0,1,0
1,0.64,1,0,0,0.7200,1,0,0
0,0.47,1,0,0,0.5870,0,0,1
1,0.45,0,1,0,0.5280,0,1,0
0,0.25,0,0,1,0.4090,1,0,0
1,0.38,1,0,0,0.4840,1,0,0
1,0.55,0,0,1,0.6000,0,1,0
0,0.44,1,0,0,0.6060,0,1,0
1,0.33,1,0,0,0.4100,0,1,0
1,0.34,0,0,1,0.3900,0,1,0
1,0.27,0,1,0,0.3370,0,0,1
1,0.32,0,1,0,0.4070,0,1,0
1,0.42,0,0,1,0.4700,0,1,0
0,0.24,0,0,1,0.4030,1,0,0
1,0.42,0,1,0,0.5030,0,1,0
1,0.25,0,0,1,0.2800,0,0,1
1,0.51,0,1,0,0.5800,0,1,0
0,0.55,0,1,0,0.6350,0,0,1
1,0.44,1,0,0,0.4780,0,0,1
0,0.18,1,0,0,0.3980,1,0,0
0,0.67,0,1,0,0.7160,0,0,1
1,0.45,0,0,1,0.5000,0,1,0
1,0.48,1,0,0,0.5580,0,1,0
0,0.25,0,1,0,0.3900,0,1,0
0,0.67,1,0,0,0.7830,0,1,0
1,0.37,0,0,1,0.4200,0,1,0
0,0.32,1,0,0,0.4270,0,1,0
1,0.48,1,0,0,0.5700,0,1,0
0,0.66,0,0,1,0.7500,0,0,1
1,0.61,1,0,0,0.7000,1,0,0
0,0.58,0,0,1,0.6890,0,1,0
1,0.19,1,0,0,0.2400,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.27,1,0,0,0.3640,0,1,0
1,0.42,1,0,0,0.4800,0,1,0
1,0.60,1,0,0,0.7130,1,0,0
0,0.27,0,0,1,0.3480,1,0,0
1,0.29,0,1,0,0.3710,1,0,0
0,0.43,1,0,0,0.5670,0,1,0
1,0.48,1,0,0,0.5670,0,1,0
1,0.27,0,0,1,0.2940,0,0,1
0,0.44,1,0,0,0.5520,1,0,0
1,0.23,0,1,0,0.2630,0,0,1
0,0.36,0,1,0,0.5300,0,0,1
1,0.64,0,0,1,0.7250,1,0,0
1,0.29,0,0,1,0.3000,0,0,1
0,0.33,1,0,0,0.4930,0,1,0
0,0.66,0,1,0,0.7500,0,0,1
0,0.21,0,0,1,0.3430,1,0,0
1,0.27,1,0,0,0.3270,0,0,1
1,0.29,1,0,0,0.3180,0,0,1
0,0.31,1,0,0,0.4860,0,1,0
1,0.36,0,0,1,0.4100,0,1,0
1,0.49,0,1,0,0.5570,0,1,0
0,0.28,1,0,0,0.3840,1,0,0
0,0.43,0,0,1,0.5660,0,1,0
0,0.46,0,1,0,0.5880,0,1,0
1,0.57,1,0,0,0.6980,1,0,0
0,0.52,0,0,1,0.5940,0,1,0
0,0.31,0,0,1,0.4350,0,1,0
0,0.55,1,0,0,0.6200,0,0,1
1,0.50,1,0,0,0.5640,0,1,0
1,0.48,0,1,0,0.5590,0,1,0
0,0.22,0,0,1,0.3450,1,0,0
1,0.59,0,0,1,0.6670,1,0,0
1,0.34,1,0,0,0.4280,0,0,1
0,0.64,1,0,0,0.7720,0,0,1
1,0.29,0,0,1,0.3350,0,0,1
0,0.34,0,1,0,0.4320,0,1,0
0,0.61,1,0,0,0.7500,0,0,1
1,0.64,0,0,1,0.7110,1,0,0
0,0.29,1,0,0,0.4130,1,0,0
1,0.63,0,1,0,0.7060,1,0,0
0,0.29,0,1,0,0.4000,1,0,0
0,0.51,1,0,0,0.6270,0,1,0
0,0.24,0,0,1,0.3770,1,0,0
1,0.48,0,1,0,0.5750,0,1,0
1,0.18,1,0,0,0.2740,1,0,0
1,0.18,1,0,0,0.2030,0,0,1
1,0.33,0,1,0,0.3820,0,0,1
0,0.20,0,0,1,0.3480,1,0,0
1,0.29,0,0,1,0.3300,0,0,1
0,0.44,0,0,1,0.6300,1,0,0
0,0.65,0,0,1,0.8180,1,0,0
0,0.56,1,0,0,0.6370,0,0,1
0,0.52,0,0,1,0.5840,0,1,0
0,0.29,0,1,0,0.4860,1,0,0
0,0.47,0,1,0,0.5890,0,1,0
1,0.68,1,0,0,0.7260,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
1,0.61,0,1,0,0.6250,0,0,1
1,0.19,0,1,0,0.2150,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.26,1,0,0,0.4230,1,0,0
1,0.61,0,1,0,0.6740,1,0,0
1,0.40,1,0,0,0.4650,0,1,0
0,0.49,1,0,0,0.6520,0,1,0
1,0.56,1,0,0,0.6750,1,0,0
0,0.48,0,1,0,0.6600,0,1,0
1,0.52,1,0,0,0.5630,0,0,1
0,0.18,1,0,0,0.2980,1,0,0
0,0.56,0,0,1,0.5930,0,0,1
0,0.52,0,1,0,0.6440,0,1,0
0,0.18,0,1,0,0.2860,0,1,0
0,0.58,1,0,0,0.6620,0,0,1
0,0.39,0,1,0,0.5510,0,1,0
0,0.46,1,0,0,0.6290,0,1,0
0,0.40,0,1,0,0.4620,0,1,0
0,0.60,1,0,0,0.7270,0,0,1
1,0.36,0,1,0,0.4070,0,0,1
1,0.44,1,0,0,0.5230,0,1,0
1,0.28,1,0,0,0.3130,0,0,1
1,0.54,0,0,1,0.6260,1,0,0

Test data. Replace comma characters with tab characters and save as people_test.txt.

0,0.51,1,0,0,0.6120,0,1,0
0,0.32,0,1,0,0.4610,0,1,0
1,0.55,1,0,0,0.6270,1,0,0
1,0.25,0,0,1,0.2620,0,0,1
1,0.33,0,0,1,0.3730,0,0,1
0,0.29,0,1,0,0.4620,1,0,0
1,0.65,1,0,0,0.7270,1,0,0
0,0.43,0,1,0,0.5140,0,1,0
0,0.54,0,1,0,0.6480,0,0,1
1,0.61,0,1,0,0.7270,1,0,0
1,0.52,0,1,0,0.6360,1,0,0
1,0.30,0,1,0,0.3350,0,0,1
1,0.29,1,0,0,0.3140,0,0,1
0,0.47,0,0,1,0.5940,0,1,0
1,0.39,0,1,0,0.4780,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.49,1,0,0,0.5860,0,1,0
0,0.63,0,0,1,0.6740,0,0,1
0,0.30,1,0,0,0.3920,1,0,0
0,0.61,0,0,1,0.6960,0,0,1
0,0.47,0,0,1,0.5870,0,1,0
1,0.30,0,0,1,0.3450,0,0,1
0,0.51,0,0,1,0.5800,0,1,0
0,0.24,1,0,0,0.3880,0,1,0
0,0.49,1,0,0,0.6450,0,1,0
1,0.66,0,0,1,0.7450,1,0,0
0,0.65,1,0,0,0.7690,1,0,0
0,0.46,0,1,0,0.5800,1,0,0
0,0.45,0,0,1,0.5180,0,1,0
0,0.47,1,0,0,0.6360,1,0,0
0,0.29,1,0,0,0.4480,1,0,0
0,0.57,0,0,1,0.6930,0,0,1
0,0.20,1,0,0,0.2870,0,0,1
0,0.35,1,0,0,0.4340,0,1,0
0,0.61,0,0,1,0.6700,0,0,1
0,0.31,0,0,1,0.3730,0,1,0
1,0.18,1,0,0,0.2080,0,0,1
1,0.26,0,0,1,0.2920,0,0,1
0,0.28,1,0,0,0.3640,0,0,1
0,0.59,0,0,1,0.6940,0,0,1
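
The comma-to-tab conversion mentioned for both data files can be done in any text editor, or with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the source file name in the commented-out call is an assumed name for wherever the comma-delimited text was pasted.

```python
def commas_to_tabs(lines):
    """Replace commas with tabs on data lines; keep '#' comment lines."""
    out = []
    for line in lines:
        if line.lstrip().startswith("#"):
            out.append(line)                      # comment line, unchanged
        else:
            out.append(line.replace(",", "\t"))   # data line
    return out

def convert_file(src_path, dst_path):
    """Read a comma-delimited file and write a tab-delimited copy."""
    with open(src_path, "r") as f:
        lines = f.read().splitlines()
    with open(dst_path, "w") as f:
        f.write("\n".join(commas_to_tabs(lines)) + "\n")

sample = ["# people_train.txt", "1,0.24,1,0,0,0.2950,0,0,1"]
print(commas_to_tabs(sample))

# convert_file("people_train_commas.txt", "people_train.txt")  # assumed names
```

Comment lines are left alone because the demo's np.loadtxt() call skips any line that starts with "#".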