Loading Custom Weight Values Into a PyTorch Network

I’ve been exploring the idea of training a PyTorch neural network using an evolutionary algorithm. The basic idea is to create a population of solutions (here, a set of neural weights and biases) and then repeatedly combine two solutions to create a better child solution.

Conceptually, the ideas are a bit subtle. The two main reasons for using PyTorch are 1.) GPU tensors for speed, and 2.) built-in gradient computation for back-propagation training. An evolutionary algorithm doesn't use gradients, but it still makes sense to use PyTorch for an evolutionary approach because I can leverage the enormous PyTorch infrastructure, which includes Dataset and DataLoader objects, and so on.
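
To make the idea concrete, here's a minimal sketch of what combining two parent solutions might look like. This is just scratch code for illustration — the make_child() name and the uniform-crossover-plus-mutation scheme are assumptions on my part, not part of the demo.

import torch as T

def make_child(parent1, parent2, mut_rate=0.01, mut_scale=0.1):
  # parent1, parent2: 1-D tensors holding all weights and biases
  # uniform crossover: each position taken from one parent at random
  mask = T.rand(parent1.shape) < 0.5
  child = T.where(mask, parent1, parent2)
  # mutate a small fraction of positions with Gaussian noise
  mut_mask = (T.rand(child.shape) < mut_rate).float()
  child = child + mut_mask * mut_scale * T.randn(child.shape)
  return child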

For my demo, I created a 6-(10-10)-3 neural network where I imagine the goal is to predict Employee job-type (one of three) from sex, age, city (one of three), and income.

import torch as T
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.softmax(self.oupt(z), dim=1) # note
    return z
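
A note on the explicit softmax() on the output layer (the "# note" comment): for an evolutionary approach the outputs are only used to score solutions, so applying softmax directly is fine. For standard gradient training with CrossEntropyLoss(), the softmax would normally be omitted because that loss function applies log-softmax internally. A minimal sketch of that alternative forward() — an assumption on my part, not what this demo does:

  def forward(self, x):
    # hypothetical variant for CrossEntropyLoss() training
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    return self.oupt(z)  # raw logits; no softmax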

The hid1 layer weights are size [10,6] and the biases are [10]. Similarly, hid2 weights and biases are size [10,10] and [10], and the oupt weights and biases are size [3,10] and [3]. Therefore, there are a total of 60 + 10 + 100 + 10 + 30 + 3 = 213 weights and biases.
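
A quick way to sanity-check that count (not part of the demo code):

net = Net()
num_wts = sum(p.numel() for p in net.parameters())
print(num_wts)  # 213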

I wrote a function that accepts a tensor of 213 values and then distributes them to the network weights and biases:

def load_weights(net, wts):
  if len(wts) != (10*6) + 10 + (10*10) + 10 + (3*10) + 3:
    print("FATAL: incorrect number wts in load_weights() ")

  net.hid1.weight.data = wts[0:60].reshape((10,6))
  net.hid1.bias.data = wts[60:70]
  net.hid2.weight.data = wts[70:170].reshape((10,10))
  net.hid2.bias.data = wts[170:180]
  net.oupt.weight.data = wts[180:210].reshape((3,10))
  net.oupt.bias.data = wts[210:213]

Important: The values are assigned to the weights and biases by reference rather than copied by value, so this technique (probably — I haven't verified) can't be used for a standard PyTorch scenario where gradients are computed during training.
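
If gradients were needed, an alternative I think would work (an assumption — I haven't tested it here) is to copy the values in place with copy_() inside a no_grad() block, so the original Parameter objects and their gradient machinery are left intact:

def load_weights_inplace(net, wts):
  # hypothetical variant: copies values into the existing parameters
  with T.no_grad():
    net.hid1.weight.copy_(wts[0:60].reshape((10,6)))
    net.hid1.bias.copy_(wts[60:70])
    net.hid2.weight.copy_(wts[70:170].reshape((10,10)))
    net.hid2.bias.copy_(wts[170:180])
    net.oupt.weight.copy_(wts[180:210].reshape((3,10)))
    net.oupt.bias.copy_(wts[210:213])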

I created a short program that creates a network and loads weight and bias values from 0.00 to 2.12. The demo seemed to work. Now I have the infrastructure I need to explore training the network using an evolutionary algorithm.

The demo program tackles a multi-class classification problem. In general, regression problems, where the goal is to predict a single numeric value, such as a score on the SAT (Scholastic Aptitude Test) college readiness exam, tend to be more difficult than classification problems. I'm hopeful that evolutionary algorithms can improve accuracy on regression problems.



[Figure: SAT math scores over time, by group. Scores are quite stable over time and the gap between groups has been consistent and large, so predicting SAT scores is not difficult. Note: the big jump in scores in 2017 was due to a change in the test, not a jump in ability.]


Demo code. The data can be found at https://jamesmccaffrey.wordpress.com/2022/04/29/predicting-employee-job-type-using-pytorch-1-10-on-windows-11/.

# employee_job_load_wts.py
# predict job type from sex, age, city, income
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

# load weights (after initialization)

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  # sex  age    city      income  job-type
  # -1   0.27   0  1  0   0.7610   2
  # +1   0.19   0  0  1   0.6550   0
  # sex: -1 = male, +1 = female
  # city: anaheim, boulder, concord
  # job type: mgmt, supp, tech

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D

    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return (preds, trgts)  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.softmax(self.oupt(z), dim=1) # note
    return z

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  Y = T.flatten(dataset[0:len(dataset)][1])

  with T.no_grad():
    oupt = model(X)
  # (_, arg_maxs) = T.max(oupt, dim=1)
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def load_weights(net, wts):
  if len(wts) != (10*6) + 10 + (10*10) + 10 + (3*10) + 3:
    print("FATAL: incorrect number wts in load_weights() ")

  net.hid1.weight.data = wts[0:60].reshape((10,6))
  net.hid1.bias.data = wts[60:70]
  net.hid2.weight.data = wts[70:170].reshape((10,10))
  net.hid2.bias.data = wts[170:180]
  net.oupt.weight.data = wts[180:210].reshape((3,10))
  net.oupt.bias.data = wts[210:213]

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin PyTorch load weights demo ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Datasets ")

  train_file = ".\\Data\\employee_train.txt"
  train_ds = EmployeeDataset(train_file)  # 200 rows

  test_file = ".\\Data\\employee_test.txt"
  test_ds = EmployeeDataset(test_file)  # 40 rows

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 6-(10-10)-3 NN default init ")
  net = Net().to(device)
  net.eval()

  X = np.array([[-1, 0.30,  0,0,1,  0.5000]],
    dtype=np.float32)
  X = T.tensor(X, dtype=T.float32).to(device)

  print("\nInput = ")
  print(X)
  with T.no_grad(): 
    z = net(X)
  print("\nOutput = ")
  print(z)
  
  # 3. load weights and biases
  wts = T.arange(start=0, end=213, step=1,
    dtype=T.float32).to(device)
  wts /= 100.0
  print("\nSetting 213 weight/bias values: ")
  print(wts[0:6], end=""); print(" . . . ")

  print("\nLoading weights/biases into net ")
  load_weights(net, wts)
 
  print("\nInput = ")
  print(X)
  with T.no_grad(): 
    z = net(X)
  print("\nOutput = ")
  print(z)

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  acc_train = accuracy_quick(net, train_ds) 
  print("Accuracy on train data = %0.4f" % acc_train)
  acc_test = accuracy_quick(net, test_ds) 
  print("Accuracy on test data = %0.4f" % acc_test)

  print("\nEnd load weights demo ")

if __name__ == "__main__":
  main()