There are frequent updates to the PyTorch neural network library, and I’m continuously learning new techniques and best practices. I figured it was time to update one of my standard binary classification demos for the current PyTorch version 1.12.1.
I currently use Python 3.7.6 from the Anaconda 2020.02 distribution on a Windows 10/11 machine. I located the appropriate PyTorch .whl file at https://download.pytorch.org/whl/torch_stable.html — torch-1.12.1+cpu-cp37-cp37m-win_amd64.whl. Even though I have installed PyTorch hundreds of times, I have grabbed the wrong .whl file more than once.
I opened a Windows command shell with admin privileges. I uninstalled my old PyTorch 1.10.0 using the command “pip uninstall torch”. Then I navigated to the directory holding the new .whl file and installed it with the command “pip install torch-1.12-etc-.whl”. There were no problems.
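After an install like the one above, a quick sanity check is to ask Python which PyTorch version it actually sees. The snippet below is a minimal sketch; the helper name installed_torch_version() is hypothetical and not part of the demo.

```python
# Minimal post-install check. The helper name is hypothetical (not from the demo).
def installed_torch_version():
    """Return the installed PyTorch version as a string, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)

if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("PyTorch is not installed in this environment")
    else:
        # For the CPU wheel named above, this should report a 1.12.1 build.
        print("PyTorch version: " + version)
```

If the reported version is still the old one, the most likely cause is that pip installed into a different Python environment than the one running the script.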
I used one of my standard datasets for binary classification. The data looks like:
1  0.24  1 0 0  0.2950  0 0 1
0  0.39  0 0 1  0.5120  0 1 0
1  0.63  0 1 0  0.7580  1 0 0
0  0.36  1 0 0  0.4450  0 1 0
. . .
Each line of data represents a person. The fields are sex (male = 0, female = 1), age (normalized by dividing by 100), state (michigan = 100, nebraska = 010, oklahoma = 001), annual income (divided by 100,000), and politics type (conservative = 100, moderate = 010, liberal = 001). The goal is to predict the gender of a person from their age, state, income, and politics type.
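To make the encoding concrete, here is a small sketch that turns raw values into the eight predictor values described above. The function names encode_person() and one_hot() are hypothetical helpers written for illustration; the demo itself reads data that is already encoded.

```python
# Hypothetical helpers (not part of the demo) that mirror the encoding described above.
STATES = ("michigan", "nebraska", "oklahoma")
POLITICS = ("conservative", "moderate", "liberal")

def one_hot(value, categories):
    """Return a one-hot list such as [0.0, 1.0, 0.0] for value within categories."""
    if value not in categories:
        raise ValueError("unknown category: " + str(value))
    return [1.0 if value == c else 0.0 for c in categories]

def encode_person(age, state, income, politics):
    """Encode raw age (years), state, income (dollars), and politics as 8 floats."""
    return ([age / 100.0] + one_hot(state, STATES) +
            [income / 100000.0] + one_hot(politics, POLITICS))

if __name__ == "__main__":
    # age 30, Oklahoma, $40,000, moderate
    print(encode_person(30, "oklahoma", 40000, "moderate"))
```

The sex value is not included because it is the value to predict, so it is stored separately as the target.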
My demo network used an 8-(10-10)-1 architecture with tanh() hidden activation and sigmoid() activation on the output node. I used explicit weight and bias initialization:
class Net(T.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
        self.hid2 = T.nn.Linear(10, 10)
        self.oupt = T.nn.Linear(10, 1)

        T.nn.init.xavier_uniform_(self.hid1.weight)
        T.nn.init.zeros_(self.hid1.bias)
        T.nn.init.xavier_uniform_(self.hid2.weight)
        T.nn.init.zeros_(self.hid2.bias)
        T.nn.init.xavier_uniform_(self.oupt.weight)
        T.nn.init.zeros_(self.oupt.bias)

    def forward(self, x):
        z = T.tanh(self.hid1(x))
        z = T.tanh(self.hid2(z))
        z = T.sigmoid(self.oupt(z))  # for BCELoss()
        return z
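For a rough feel of what the network definition implies, the sketch below counts the weights and biases in an 8-(10-10)-1 network and computes the range that Xavier uniform initialization draws from, assuming the standard formula limit = gain * sqrt(6 / (fan_in + fan_out)) with the default gain of 1.0. The helper names are hypothetical and the code uses only plain Python, not PyTorch.

```python
import math

# (fan_in, fan_out) for the three Linear layers of the 8-(10-10)-1 network
LAYER_SIZES = [(8, 10), (10, 10), (10, 1)]

def count_parameters(layers):
    """Total weights plus biases: fan_in * fan_out weights and fan_out biases per layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Weights are drawn from U(-bound, +bound) under Xavier uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

if __name__ == "__main__":
    print("total parameters = %d" % count_parameters(LAYER_SIZES))
    for fan_in, fan_out in LAYER_SIZES:
        bound = xavier_uniform_bound(fan_in, fan_out)
        print("%2d -> %2d  weights in [-%0.4f, +%0.4f]" % (fan_in, fan_out, bound, bound))
```

The network has 90 + 110 + 11 = 211 trainable values, which is small enough that 200 training items are a reasonable match.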
For training, I used a batch size of 10, SGD optimization with a fixed learning rate of 0.01, and BCELoss().
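As a reminder of what BCELoss() measures, here is a plain-Python sketch of binary cross entropy averaged over a batch, which matches the default mean reduction. The function name and the eps clamp are my own illustration choices; PyTorch guards against log(0) in a different way internally.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch of predictions.

    probs are sigmoid outputs in (0, 1); targets are 0.0 or 1.0.
    The eps clamp is an assumption made here to keep log() finite.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(probs)

if __name__ == "__main__":
    # a confident correct batch versus a confident wrong batch
    print("%0.4f" % binary_cross_entropy([0.9, 0.2], [1.0, 0.0]))
    print("%0.4f" % binary_cross_entropy([0.1, 0.8], [1.0, 0.0]))
```

Confident wrong predictions are penalized much more heavily than confident correct predictions are rewarded, which is what drives the weight updates during training.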
For binary classification problems, a simple model accuracy metric really isn’t enough. For example, if 95% of the items in a dataset belong to one class, then a model that predicts the majority class every time will get 95% accuracy while never detecting the minority class. Therefore, I implemented a program-defined metrics() function to compute accuracy, precision, recall and F1 score.
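The 95% scenario can be worked through with a few lines of plain Python. This is a sketch that computes the four metrics from confusion-matrix counts; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention I chose for the sketch.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    When a denominator is zero the metric is reported as 0.0, which is
    an assumed convention for this sketch.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return (accuracy, precision, recall, f1)

if __name__ == "__main__":
    # 100 items, 5 positive; the model always predicts the negative (majority) class
    acc, prec, rec, f1 = classification_metrics(tp=0, fp=0, tn=95, fn=5)
    print("accuracy = %0.4f  precision = %0.4f  recall = %0.4f  F1 = %0.4f" %
          (acc, prec, rec, f1))
```

Accuracy comes out at 0.95, but recall and F1 are both 0, which exposes that the model never finds a single positive item.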
I didn’t run into any serious problems. PyTorch is slowly but surely stabilizing. Most of the version changes are related to advanced architectures such as Transformers rather than standard architectures.
Good fun!
There are quite a few research studies that show people can correctly identify a person’s gender just by seeing their face for a fraction of a second. In science fiction movies, most aliens are assumed to be male. Here are three female aliens who aren’t obviously female. Left: Sil from “Species” (1995) was played by actress Natasha Henstridge. She was not a nice alien. Center: The Martian mastermind from “Invaders from Mars” (1953) was played by actress Luce Potter. She was not a nice alien. Right: An alien from the planet Kas-onar in “Valerian and the City of a Thousand Planets” (2017). A good alien.
Demo code.

# people_gender.py
# binary classification
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

class PeopleDataset(T.utils.data.Dataset):
    # sex  age   state   income   politics
    #  0   0.27  0 1 0   0.7610   0 0 1
    #  1   0.19  0 0 1   0.6550   1 0 0
    # sex: 0 = male, 1 = female
    # state: michigan, nebraska, oklahoma
    # politics: conservative, moderate, liberal

    def __init__(self, src_file):
        all_data = np.loadtxt(src_file, usecols=range(0,9),
            delimiter="\t", comments="#", dtype=np.float32)

        self.x_data = T.tensor(all_data[:,1:9],
            dtype=T.float32).to(device)
        self.y_data = T.tensor(all_data[:,0],
            dtype=T.float32).to(device)  # float32 required

        self.y_data = self.y_data.reshape(-1,1)  # 2-D required

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        feats = self.x_data[idx,:]  # idx row, all 8 cols
        sex = self.y_data[idx,:]    # idx row, the only col
        return feats, sex           # as a Tuple

# ---------------------------------------------------------

def metrics(model, ds, thresh=0.5):
    # note: N = total number of items = TP + FP + TN + FN
    # accuracy  = (TP + TN) / N
    # precision = TP / (TP + FP)
    # recall    = TP / (TP + FN)
    # F1        = 2 / [(1 / precision) + (1 / recall)]

    tp = 0; tn = 0; fp = 0; fn = 0
    for i in range(len(ds)):
        inpts = ds[i][0]    # predictors, item [0] of the tuple
        target = ds[i][1]   # float32  [0.0] or [1.0]
        target = target.type(T.int64)  # make it an int
        with T.no_grad():
            p = model(inpts)  # between 0.0 and 1.0

        # should really avoid 'target == 1.0'
        if target == 1 and p >= thresh:    # TP
            tp += 1
        elif target == 1 and p < thresh:   # FN
            fn += 1
        elif target == 0 and p < thresh:   # TN
            tn += 1
        elif target == 0 and p >= thresh:  # FP
            fp += 1

    N = tp + fp + tn + fn
    if N != len(ds):
        print("FATAL LOGIC ERROR in metrics()")

    accuracy = (tp + tn) / (N * 1.0)
    precision = (1.0 * tp) / (tp + fp)
    recall = (1.0 * tp) / (tp + fn)
    f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
    return (accuracy, precision, recall, f1)  # as a Tuple

# ---------------------------------------------------------

class Net(T.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
        self.hid2 = T.nn.Linear(10, 10)
        self.oupt = T.nn.Linear(10, 1)

        T.nn.init.xavier_uniform_(self.hid1.weight)
        T.nn.init.zeros_(self.hid1.bias)
        T.nn.init.xavier_uniform_(self.hid2.weight)
        T.nn.init.zeros_(self.hid2.bias)
        T.nn.init.xavier_uniform_(self.oupt.weight)
        T.nn.init.zeros_(self.oupt.bias)

    def forward(self, x):
        z = T.tanh(self.hid1(x))
        z = T.tanh(self.hid2(z))
        z = T.sigmoid(self.oupt(z))  # for BCELoss()
        return z

# ----------------------------------------------------------

def main():
    # 0. get started
    print("\nPeople gender using PyTorch ")
    T.manual_seed(1)
    np.random.seed(1)

    # 1. create Dataset and DataLoader objects
    print("\nCreating People train and test Datasets ")

    train_file = ".\\Data\\people_train.txt"
    test_file = ".\\Data\\people_test.txt"

    train_ds = PeopleDataset(train_file)  # 200 rows
    test_ds = PeopleDataset(test_file)    # 40 rows

    bat_size = 10
    train_ldr = T.utils.data.DataLoader(train_ds,
        batch_size=bat_size, shuffle=True)

    # 2. create neural network
    print("\nCreating 8-(10-10)-1 binary NN classifier \n")
    net = Net().to(device)

    # 3. train network
    net.train()  # set training mode
    lrn_rate = 0.01
    loss_func = T.nn.BCELoss()  # binary cross entropy
    optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
    max_epochs = 500
    ep_log_interval = 100

    print("Loss function: " + str(loss_func))
    print("Optimizer: " + str(optimizer.__class__.__name__))
    print("Learn rate: " + "%0.3f" % lrn_rate)
    print("Batch size: " + str(bat_size))
    print("Max epochs: " + str(max_epochs))

    print("\nStarting training")
    for epoch in range(0, max_epochs):
        epoch_loss = 0.0  # for one full epoch
        for (batch_idx, batch) in enumerate(train_ldr):
            X = batch[0]  # [bs,8] inputs
            Y = batch[1]  # [bs,1] targets
            oupt = net(X)  # [bs,1] computeds

            loss_val = loss_func(oupt, Y)  # a tensor
            epoch_loss += loss_val.item()  # accumulate
            optimizer.zero_grad()  # reset all gradients
            loss_val.backward()    # compute new gradients
            optimizer.step()       # update all weights

        if epoch % ep_log_interval == 0:
            print("epoch = %4d loss = %8.4f" % \
                (epoch, epoch_loss))
    print("Done ")

    # ----------------------------------------------------------

    # 4. evaluate model
    net.eval()
    metrics_train = metrics(net, train_ds, thresh=0.5)
    print("\nMetrics for train data: ")
    print("accuracy = %0.4f " % metrics_train[0])
    print("precision = %0.4f " % metrics_train[1])
    print("recall = %0.4f " % metrics_train[2])
    print("F1 = %0.4f " % metrics_train[3])

    metrics_test = metrics(net, test_ds, thresh=0.5)
    print("\nMetrics for test data: ")
    print("accuracy = %0.4f " % metrics_test[0])
    print("precision = %0.4f " % metrics_test[1])
    print("recall = %0.4f " % metrics_test[2])
    print("F1 = %0.4f " % metrics_test[3])

    # 5. save model
    print("\nSaving trained model state_dict ")
    # path = ".\\Models\\people_model.pt"
    # T.save(net.state_dict(), path)

    # 6. make a prediction
    print("\nSetting age = 30 Oklahoma $40,000 moderate")
    inpt = np.array([[0.30, 0,0,1, 0.40, 0,1,0]],
        dtype=np.float32)
    inpt = T.tensor(inpt, dtype=T.float32).to(device)

    net.eval()
    with T.no_grad():
        oupt = net(inpt)  # a Tensor
    pred_prob = oupt.item()  # scalar, [0.0, 1.0]
    print("Computed output: ", end="")
    print("%0.4f" % pred_prob)
    if pred_prob < 0.5:
        print("Prediction = male")
    else:
        print("Prediction = female")

    print("\nEnd People binary demo ")

if __name__ == "__main__":
    main()
Training data. Replace comma characters with tab characters and save as people_train.txt.
# people_train.txt
# sex (0 = male, 1 = female) - dependent variable
# age, state (michigan, nebraska, oklahoma), income,
# politics type (conservative, moderate, liberal)
#
1,0.24,1,0,0,0.2950,0,0,1
0,0.39,0,0,1,0.5120,0,1,0
1,0.63,0,1,0,0.7580,1,0,0
0,0.36,1,0,0,0.4450,0,1,0
1,0.27,0,1,0,0.2860,0,0,1
1,0.50,0,1,0,0.5650,0,1,0
1,0.50,0,0,1,0.5500,0,1,0
0,0.19,0,0,1,0.3270,1,0,0
1,0.22,0,1,0,0.2770,0,1,0
0,0.39,0,0,1,0.4710,0,0,1
1,0.34,1,0,0,0.3940,0,1,0
0,0.22,1,0,0,0.3350,1,0,0
1,0.35,0,0,1,0.3520,0,0,1
0,0.33,0,1,0,0.4640,0,1,0
1,0.45,0,1,0,0.5410,0,1,0
1,0.42,0,1,0,0.5070,0,1,0
0,0.33,0,1,0,0.4680,0,1,0
1,0.25,0,0,1,0.3000,0,1,0
0,0.31,0,1,0,0.4640,1,0,0
1,0.27,1,0,0,0.3250,0,0,1
1,0.48,1,0,0,0.5400,0,1,0
0,0.64,0,1,0,0.7130,0,0,1
1,0.61,0,1,0,0.7240,1,0,0
1,0.54,0,0,1,0.6100,1,0,0
1,0.29,1,0,0,0.3630,1,0,0
1,0.50,0,0,1,0.5500,0,1,0
1,0.55,0,0,1,0.6250,1,0,0
1,0.40,1,0,0,0.5240,1,0,0
1,0.22,1,0,0,0.2360,0,0,1
1,0.68,0,1,0,0.7840,1,0,0
0,0.60,1,0,0,0.7170,0,0,1
0,0.34,0,0,1,0.4650,0,1,0
0,0.25,0,0,1,0.3710,1,0,0
0,0.31,0,1,0,0.4890,0,1,0
1,0.43,0,0,1,0.4800,0,1,0
1,0.58,0,1,0,0.6540,0,0,1
0,0.55,0,1,0,0.6070,0,0,1
0,0.43,0,1,0,0.5110,0,1,0
0,0.43,0,0,1,0.5320,0,1,0
0,0.21,1,0,0,0.3720,1,0,0
1,0.55,0,0,1,0.6460,1,0,0
1,0.64,0,1,0,0.7480,1,0,0
0,0.41,1,0,0,0.5880,0,1,0
1,0.64,0,0,1,0.7270,1,0,0
0,0.56,0,0,1,0.6660,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
0,0.65,0,0,1,0.7010,0,0,1
1,0.55,0,0,1,0.6430,1,0,0
0,0.25,1,0,0,0.4030,1,0,0
1,0.46,0,0,1,0.5100,0,1,0
0,0.36,1,0,0,0.5350,1,0,0
1,0.52,0,1,0,0.5810,0,1,0
1,0.61,0,0,1,0.6790,1,0,0
1,0.57,0,0,1,0.6570,1,0,0
0,0.46,0,1,0,0.5260,0,1,0
0,0.62,1,0,0,0.6680,0,0,1
1,0.55,0,0,1,0.6270,1,0,0
0,0.22,0,0,1,0.2770,0,1,0
0,0.50,1,0,0,0.6290,1,0,0
0,0.32,0,1,0,0.4180,0,1,0
0,0.21,0,0,1,0.3560,1,0,0
1,0.44,0,1,0,0.5200,0,1,0
1,0.46,0,1,0,0.5170,0,1,0
1,0.62,0,1,0,0.6970,1,0,0
1,0.57,0,1,0,0.6640,1,0,0
0,0.67,0,0,1,0.7580,0,0,1
1,0.29,1,0,0,0.3430,0,0,1
1,0.53,1,0,0,0.6010,1,0,0
0,0.44,1,0,0,0.5480,0,1,0
1,0.46,0,1,0,0.5230,0,1,0
0,0.20,0,1,0,0.3010,0,1,0
0,0.38,1,0,0,0.5350,0,1,0
1,0.50,0,1,0,0.5860,0,1,0
1,0.33,0,1,0,0.4250,0,1,0
0,0.33,0,1,0,0.3930,0,1,0
1,0.26,0,1,0,0.4040,1,0,0
1,0.58,1,0,0,0.7070,1,0,0
1,0.43,0,0,1,0.4800,0,1,0
0,0.46,1,0,0,0.6440,1,0,0
1,0.60,1,0,0,0.7170,1,0,0
0,0.42,1,0,0,0.4890,0,1,0
0,0.56,0,0,1,0.5640,0,0,1
0,0.62,0,1,0,0.6630,0,0,1
0,0.50,1,0,0,0.6480,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.67,0,1,0,0.8040,0,0,1
0,0.40,0,0,1,0.5040,0,1,0
1,0.42,0,1,0,0.4840,0,1,0
1,0.64,1,0,0,0.7200,1,0,0
0,0.47,1,0,0,0.5870,0,0,1
1,0.45,0,1,0,0.5280,0,1,0
0,0.25,0,0,1,0.4090,1,0,0
1,0.38,1,0,0,0.4840,1,0,0
1,0.55,0,0,1,0.6000,0,1,0
0,0.44,1,0,0,0.6060,0,1,0
1,0.33,1,0,0,0.4100,0,1,0
1,0.34,0,0,1,0.3900,0,1,0
1,0.27,0,1,0,0.3370,0,0,1
1,0.32,0,1,0,0.4070,0,1,0
1,0.42,0,0,1,0.4700,0,1,0
0,0.24,0,0,1,0.4030,1,0,0
1,0.42,0,1,0,0.5030,0,1,0
1,0.25,0,0,1,0.2800,0,0,1
1,0.51,0,1,0,0.5800,0,1,0
0,0.55,0,1,0,0.6350,0,0,1
1,0.44,1,0,0,0.4780,0,0,1
0,0.18,1,0,0,0.3980,1,0,0
0,0.67,0,1,0,0.7160,0,0,1
1,0.45,0,0,1,0.5000,0,1,0
1,0.48,1,0,0,0.5580,0,1,0
0,0.25,0,1,0,0.3900,0,1,0
0,0.67,1,0,0,0.7830,0,1,0
1,0.37,0,0,1,0.4200,0,1,0
0,0.32,1,0,0,0.4270,0,1,0
1,0.48,1,0,0,0.5700,0,1,0
0,0.66,0,0,1,0.7500,0,0,1
1,0.61,1,0,0,0.7000,1,0,0
0,0.58,0,0,1,0.6890,0,1,0
1,0.19,1,0,0,0.2400,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.27,1,0,0,0.3640,0,1,0
1,0.42,1,0,0,0.4800,0,1,0
1,0.60,1,0,0,0.7130,1,0,0
0,0.27,0,0,1,0.3480,1,0,0
1,0.29,0,1,0,0.3710,1,0,0
0,0.43,1,0,0,0.5670,0,1,0
1,0.48,1,0,0,0.5670,0,1,0
1,0.27,0,0,1,0.2940,0,0,1
0,0.44,1,0,0,0.5520,1,0,0
1,0.23,0,1,0,0.2630,0,0,1
0,0.36,0,1,0,0.5300,0,0,1
1,0.64,0,0,1,0.7250,1,0,0
1,0.29,0,0,1,0.3000,0,0,1
0,0.33,1,0,0,0.4930,0,1,0
0,0.66,0,1,0,0.7500,0,0,1
0,0.21,0,0,1,0.3430,1,0,0
1,0.27,1,0,0,0.3270,0,0,1
1,0.29,1,0,0,0.3180,0,0,1
0,0.31,1,0,0,0.4860,0,1,0
1,0.36,0,0,1,0.4100,0,1,0
1,0.49,0,1,0,0.5570,0,1,0
0,0.28,1,0,0,0.3840,1,0,0
0,0.43,0,0,1,0.5660,0,1,0
0,0.46,0,1,0,0.5880,0,1,0
1,0.57,1,0,0,0.6980,1,0,0
0,0.52,0,0,1,0.5940,0,1,0
0,0.31,0,0,1,0.4350,0,1,0
0,0.55,1,0,0,0.6200,0,0,1
1,0.50,1,0,0,0.5640,0,1,0
1,0.48,0,1,0,0.5590,0,1,0
0,0.22,0,0,1,0.3450,1,0,0
1,0.59,0,0,1,0.6670,1,0,0
1,0.34,1,0,0,0.4280,0,0,1
0,0.64,1,0,0,0.7720,0,0,1
1,0.29,0,0,1,0.3350,0,0,1
0,0.34,0,1,0,0.4320,0,1,0
0,0.61,1,0,0,0.7500,0,0,1
1,0.64,0,0,1,0.7110,1,0,0
0,0.29,1,0,0,0.4130,1,0,0
1,0.63,0,1,0,0.7060,1,0,0
0,0.29,0,1,0,0.4000,1,0,0
0,0.51,1,0,0,0.6270,0,1,0
0,0.24,0,0,1,0.3770,1,0,0
1,0.48,0,1,0,0.5750,0,1,0
1,0.18,1,0,0,0.2740,1,0,0
1,0.18,1,0,0,0.2030,0,0,1
1,0.33,0,1,0,0.3820,0,0,1
0,0.20,0,0,1,0.3480,1,0,0
1,0.29,0,0,1,0.3300,0,0,1
0,0.44,0,0,1,0.6300,1,0,0
0,0.65,0,0,1,0.8180,1,0,0
0,0.56,1,0,0,0.6370,0,0,1
0,0.52,0,0,1,0.5840,0,1,0
0,0.29,0,1,0,0.4860,1,0,0
0,0.47,0,1,0,0.5890,0,1,0
1,0.68,1,0,0,0.7260,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
1,0.61,0,1,0,0.6250,0,0,1
1,0.19,0,1,0,0.2150,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.26,1,0,0,0.4230,1,0,0
1,0.61,0,1,0,0.6740,1,0,0
1,0.40,1,0,0,0.4650,0,1,0
0,0.49,1,0,0,0.6520,0,1,0
1,0.56,1,0,0,0.6750,1,0,0
0,0.48,0,1,0,0.6600,0,1,0
1,0.52,1,0,0,0.5630,0,0,1
0,0.18,1,0,0,0.2980,1,0,0
0,0.56,0,0,1,0.5930,0,0,1
0,0.52,0,1,0,0.6440,0,1,0
0,0.18,0,1,0,0.2860,0,1,0
0,0.58,1,0,0,0.6620,0,0,1
0,0.39,0,1,0,0.5510,0,1,0
0,0.46,1,0,0,0.6290,0,1,0
0,0.40,0,1,0,0.4620,0,1,0
0,0.60,1,0,0,0.7270,0,0,1
1,0.36,0,1,0,0.4070,0,0,1
1,0.44,1,0,0,0.5230,0,1,0
1,0.28,1,0,0,0.3130,0,0,1
1,0.54,0,0,1,0.6260,1,0,0
Test data. Replace comma characters with tab characters and save as people_test.txt.
0,0.51,1,0,0,0.6120,0,1,0
0,0.32,0,1,0,0.4610,0,1,0
1,0.55,1,0,0,0.6270,1,0,0
1,0.25,0,0,1,0.2620,0,0,1
1,0.33,0,0,1,0.3730,0,0,1
0,0.29,0,1,0,0.4620,1,0,0
1,0.65,1,0,0,0.7270,1,0,0
0,0.43,0,1,0,0.5140,0,1,0
0,0.54,0,1,0,0.6480,0,0,1
1,0.61,0,1,0,0.7270,1,0,0
1,0.52,0,1,0,0.6360,1,0,0
1,0.30,0,1,0,0.3350,0,0,1
1,0.29,1,0,0,0.3140,0,0,1
0,0.47,0,0,1,0.5940,0,1,0
1,0.39,0,1,0,0.4780,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.49,1,0,0,0.5860,0,1,0
0,0.63,0,0,1,0.6740,0,0,1
0,0.30,1,0,0,0.3920,1,0,0
0,0.61,0,0,1,0.6960,0,0,1
0,0.47,0,0,1,0.5870,0,1,0
1,0.30,0,0,1,0.3450,0,0,1
0,0.51,0,0,1,0.5800,0,1,0
0,0.24,1,0,0,0.3880,0,1,0
0,0.49,1,0,0,0.6450,0,1,0
1,0.66,0,0,1,0.7450,1,0,0
0,0.65,1,0,0,0.7690,1,0,0
0,0.46,0,1,0,0.5800,1,0,0
0,0.45,0,0,1,0.5180,0,1,0
0,0.47,1,0,0,0.6360,1,0,0
0,0.29,1,0,0,0.4480,1,0,0
0,0.57,0,0,1,0.6930,0,0,1
0,0.20,1,0,0,0.2870,0,0,1
0,0.35,1,0,0,0.4340,0,1,0
0,0.61,0,0,1,0.6700,0,0,1
0,0.31,0,0,1,0.3730,0,1,0
1,0.18,1,0,0,0.2080,0,0,1
1,0.26,0,0,1,0.2920,0,0,1
0,0.28,1,0,0,0.3640,0,0,1
0,0.59,0,0,1,0.6940,0,0,1
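The comma-to-tab replacement for the two data files can be done with a short script instead of by hand. This is a sketch; the function names and the example file names in the usage comment are hypothetical. Lines that start with "#" are left untouched because the commas in the header comments are part of the text, not delimiters.

```python
def commas_to_tabs(text):
    """Replace commas with tabs on data lines; leave '#' comment lines unchanged."""
    out_lines = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            out_lines.append(line)  # header comment, keep as-is
        else:
            out_lines.append(line.replace(",", "\t"))
    return "".join(out_lines)

def convert_file(src_path, dst_path):
    """Read a comma-delimited file and write a tab-delimited copy."""
    with open(src_path, "r") as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(commas_to_tabs(text))

# Example usage (hypothetical source file name):
# convert_file("people_train_commas.txt", "people_train.txt")
```

An alternative is to leave the files comma-delimited and pass delimiter="," to np.loadtxt() in the PeopleDataset class.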