I’ve been exploring the idea of training a PyTorch neural network using an evolutionary algorithm. The basic idea is to create a population of candidate solutions (here, a solution is a set of neural network weights and biases) and then repeatedly combine two parent solutions to create a new, hopefully better, child solution.
Conceptually, the issues are somewhat subtle. The two main reasons to use PyTorch are 1.) GPU tensors for speed, and 2.) built-in gradient computation for back-propagation training. An evolutionary algorithm doesn’t use gradients at all. Even so, it still makes sense to use PyTorch for an evolutionary approach because I can leverage the enormous PyTorch infrastructure, which includes Dataset and DataLoader objects, tensor operations, and so on.
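To make the idea concrete, here is a minimal sketch of how two parent solutions, each a tensor of 213 weight and bias values, might be combined into a child. The uniform-crossover scheme, mutation rate, and function names are my assumptions for illustration, not part of the demo program below.

import torch as T

def make_child(parent1, parent2, mut_rate=0.05, mut_scale=0.01):
  # uniform crossover: each value comes from parent1 or parent2
  # with equal probability (hypothetical scheme)
  mask = T.rand(parent1.shape) < 0.5
  child = T.where(mask, parent1, parent2)
  # mutation: perturb a few values with small Gaussian noise
  mut_mask = (T.rand(child.shape) < mut_rate).float()
  child = child + mut_mask * (mut_scale * T.randn(child.shape))
  return child

p1 = T.randn(213)  # two random candidate solutions
p2 = T.randn(213)
child = make_child(p1, p2)

Because the parents, children, and masks are all tensors, the same code runs on a GPU device without modification.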
For my demo, I created a 6-(10-10)-3 neural network where I imagine the goal is to predict employee job type (one of three) from sex, age, city (one of three), and income.
import torch as T

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.softmax(self.oupt(z), dim=1)  # note
    return z
The hid1 layer weights are size [10,6] and the biases are [10]. Similarly, hid2 weights and biases are size [10,10] and [10], and the oupt weights and biases are size [3,10] and [3]. Therefore, there are a total of 60 + 10 + 100 + 10 + 30 + 3 = 213 weights and biases.
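The count can be verified programmatically by summing the number of elements in each parameter tensor of the Net class shown above:

net = Net()
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 213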
I wrote a function that accepts a tensor of 213 values and then distributes them to the network weights and biases:
def load_weights(net, wts):
  if len(wts) != (10*6) + 10 + (10*10) + 10 + (3*10) + 3:
    print("FATAL: incorrect number wts in load_weights() ")
  net.hid1.weight.data = wts[0:60].reshape((10,6))
  net.hid1.bias.data = wts[60:70]
  net.hid2.weight.data = wts[70:170].reshape((10,10))
  net.hid2.bias.data = wts[170:180]
  net.oupt.weight.data = wts[180:210].reshape((3,10))
  net.oupt.bias.data = wts[210:213]
Important: the values are assigned to the weights and biases by reference rather than copied by value, because each parameter's .data is re-pointed at a view of the wts tensor. So this technique (probably; I'm not completely sure) can't be used in a standard PyTorch scenario where gradients are computed during training.
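If copy-by-value semantics were needed, one possible alternative (a sketch I haven't fully tested in a gradient-training scenario) is to use the in-place copy_() method, which keeps the existing parameter tensors and just overwrites their values:

def load_weights_by_value(net, wts):
  # copy values into the existing parameter tensors instead of
  # re-pointing .data at views of wts
  with T.no_grad():
    net.hid1.weight.copy_(wts[0:60].reshape((10,6)))
    net.hid1.bias.copy_(wts[60:70])
    net.hid2.weight.copy_(wts[70:170].reshape((10,10)))
    net.hid2.bias.copy_(wts[170:180])
    net.oupt.weight.copy_(wts[180:210].reshape((3,10)))
    net.oupt.bias.copy_(wts[210:213])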
I created a short program that creates a network and loads the 213 dummy weight and bias values 0.00, 0.01, . . . 2.12. The demo seemed to work. Now I have the infrastructure I need to explore training the network using an evolutionary algorithm.
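Looking ahead, the evolutionary trainer could use load_weights() together with the accuracy_quick() function from the full demo program below as a fitness function. This sketch is how I imagine the evaluation might look; using classification accuracy on the training data as fitness is my assumption, not code from the demo:

def fitness(wts, net, train_ds):
  # higher accuracy on the training data = better candidate
  load_weights(net, wts)
  return accuracy_quick(net, train_ds)

# score a small population of random candidate solutions
pop = [T.randn(213) for _ in range(8)]
scores = [fitness(w, net, train_ds) for w in pop]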
The demo program tackles a multi-class classification problem. In general, regression problems, where the goal is to predict a single numeric value, such as a score on the SAT (Scholastic Aptitude Test) college readiness test, tend to be more difficult than classification problems. I'm hopeful that evolutionary algorithms can improve accuracy on regression problems.
SAT math scores are quite stable over time. The ability gap between groups has been consistent and large. Predicting SAT scores is not difficult. Note: the big jump in scores in 2017 was due to a change in the test, not a jump in ability.
Demo code. The data can be found at https://jamesmccaffrey.wordpress.com/2022/04/29/predicting-employee-job-type-using-pytorch-1-10-on-windows-11/.
# employee_job_load_wts.py
# predict job type from sex, age, city, income
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

# load weights (after initialization)

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  # sex  age    city     income  job-type
  # -1   0.27   0 1 0    0.7610  2
  # +1   0.19   0 0 1    0.6550  0
  # sex: -1 = male, +1 = female
  # city: anaheim, boulder, concord
  # job type: mgmt, supp, tech

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D

    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return (preds, trgts)  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.softmax(self.oupt(z), dim=1)  # note
    return z

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  Y = T.flatten(dataset[0:len(dataset)][1])

  with T.no_grad():
    oupt = model(X)
  # (_, arg_maxs) = T.max(oupt, dim=1)
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def load_weights(net, wts):
  if len(wts) != (10*6) + 10 + (10*10) + 10 + (3*10) + 3:
    print("FATAL: incorrect number wts in load_weights() ")
  net.hid1.weight.data = wts[0:60].reshape((10,6))
  net.hid1.bias.data = wts[60:70]
  net.hid2.weight.data = wts[70:170].reshape((10,10))
  net.hid2.bias.data = wts[170:180]
  net.oupt.weight.data = wts[180:210].reshape((3,10))
  net.oupt.bias.data = wts[210:213]

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin PyTorch load weights demo ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset objects
  print("\nCreating Employee Datasets ")

  train_file = ".\\Data\\employee_train.txt"
  train_ds = EmployeeDataset(train_file)  # 200 rows

  test_file = ".\\Data\\employee_test.txt"
  test_ds = EmployeeDataset(test_file)  # 40 rows

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 6-(10-10)-3 NN default init ")
  net = Net().to(device)
  net.eval()

  X = np.array([[-1, 0.30, 0,0,1, 0.5000]],
    dtype=np.float32)
  X = T.tensor(X, dtype=T.float32).to(device)
  print("\nInput = ")
  print(X)
  with T.no_grad():
    z = net(X)
  print("\nOutput = ")
  print(z)

  # 3. load weights and biases
  wts = T.arange(start=0, end=213, step=1,
    dtype=T.float32).to(device)
  wts /= 100.0
  print("\nSetting 213 weight/bias values: ")
  print(wts[0:6], end=""); print(" . . . ")

  print("\nLoading weights/biases into net ")
  load_weights(net, wts)

  print("\nInput = ")
  print(X)
  with T.no_grad():
    z = net(X)
  print("\nOutput = ")
  print(z)

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  acc_train = accuracy_quick(net, train_ds)
  print("Accuracy on train data = %0.4f" % acc_train)
  acc_test = accuracy_quick(net, test_ds)
  print("Accuracy on test data = %0.4f" % acc_test)

  print("\nEnd load weights demo ")

if __name__ == "__main__":
  main()