Parameterizing PyTorch Neural Network Architecture and Training Values for Evolutionary Optimization

When creating a neural network prediction model, you have to set values for the architecture (number of hidden layers, number of hidden nodes in each layer, hidden activation, etc.) and for training (optimizer, batch size, etc.). In some scenarios you can manually experiment with these hyperparameter values. In other scenarios, you can set up lists of possible values and then use random search or grid search.
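For comparison with the evolutionary approach, here is a minimal sketch of grid search and random search over lists of candidate values. The candidate lists and the `score()` function are hypothetical: `score()` is a toy stand-in for a real fitness function (it just rewards one arbitrary setting), so the example runs without PyTorch or data.

```python
import itertools
import random

# Candidate values for each hyperparameter (illustrative, not tuned)
SPACE = {
    "n_hid": [8, 10, 12],
    "activ": ["tanh", "relu"],
    "lr": [0.01, 0.005],
    "opt": ["sgd", "adam"],
}

def score(params):
    # Hypothetical stand-in for a real fitness function: rewards
    # n_hid=10, tanh, lr=0.01 and SGD so the demo is deterministic
    s = 0.0
    s += 1.0 if params["n_hid"] == 10 else 0.0
    s += 1.0 if params["activ"] == "tanh" else 0.0
    s += 1.0 if params["lr"] == 0.01 else 0.0
    s += 1.0 if params["opt"] == "sgd" else 0.0
    return s

def grid_search(space, fit):
    # Evaluate every combination of candidate values (3*2*2*2 = 24 here)
    names = list(space)
    best, best_fit = None, float("-inf")
    for combo in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, combo))
        f = fit(params)
        if f > best_fit:
            best, best_fit = params, f
    return best, best_fit

def random_search(space, fit, n_trials=10, seed=1):
    # Evaluate n_trials randomly sampled combinations
    rng = random.Random(seed)
    best, best_fit = None, float("-inf")
    for _ in range(n_trials):
        params = {n: rng.choice(vals) for n, vals in space.items()}
        f = fit(params)
        if f > best_fit:
            best, best_fit = params, f
    return best, best_fit

best, f = grid_search(SPACE, score)
print(best, f)  # {'n_hid': 10, 'activ': 'tanh', 'lr': 0.01, 'opt': 'sgd'} 4.0
```

Grid search evaluates every combination, so its cost is the product of the list lengths (24 here). Random search caps the cost at `n_trials` evaluations but may miss the best combination.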

A more sophisticated approach is to use evolutionary optimization to find a good set of architecture and training values. This is a project I’ve been looking at recently. As part of my experiments, I put together a demo that parameterizes a network and training values and then computes a fitness value. The idea is best explained by code.

Suppose you want to predict the political leaning of a person (conservative = 0, moderate = 1, liberal = 2) from their sex (male = -1, female = +1), age (divided by 100), state (one-hot encoded: Michigan = 100, Nebraska = 010, Oklahoma = 001), and income (divided by $100,000). Now, consider this code:
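To make the encoding concrete, here is a sketch of a hypothetical `encode_person()` helper that turns raw attributes into the six predictor values the network expects. The function and constant names are my own illustration, not part of the demo; the example person reproduces the first sample row shown in the `PeopleDataset` comments in the demo code.

```python
# Hypothetical encoding tables matching the scheme described above
SEX = {"male": -1.0, "female": 1.0}
STATES = ["michigan", "nebraska", "oklahoma"]  # one-hot order
POLITICS = {"conservative": 0, "moderate": 1, "liberal": 2}

def encode_person(sex, age, state, income):
    # Returns [sex, age/100, MI, NE, OK, income/100,000]
    if state not in STATES:
        raise ValueError("unknown state: %s" % state)
    one_hot = [1.0 if state == s else 0.0 for s in STATES]
    return [SEX[sex], age / 100.0] + one_hot + [income / 100_000.0]

x = encode_person("male", 27, "nebraska", 76_100)
print(x)  # [-1.0, 0.27, 0.0, 1.0, 0.0, 0.761]
```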

  # first, create train_ds and test_ds
  print("Setting 6-(10-10)-3 tanh 10 0.01 1000 SGD")
  f = fitness(n_hid=10, activ='tanh',
    trn_ds=train_ds, tst_ds=test_ds, 
    bs=10, lr=0.01, me=1000, opt='sgd')
  print("Fitness = %0.4f " % f)

The fitness() function creates a 6-(10-10)-3 neural network classifier (6 input nodes, two hidden layers of 10 nodes each, 3 output nodes) with tanh() hidden node activation, and trains it with stochastic gradient descent using a batch size of 10, a learning rate of 0.01, and 1000 epochs. The return value is a measure of how good the network is, often called a fitness value in evolutionary optimization terminology.
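As a quick sanity check on the notation, the sketch below counts the trainable weights and biases in the demo's Net class, which has layers hid1, hid2, and oupt. The helper name is hypothetical; with an actual PyTorch network you could get the same count from `sum(p.numel() for p in net.parameters())`.

```python
def count_params(n_in=6, n_hid=10, n_out=3):
    # Hypothetical helper mirroring Net: each Linear layer has
    # fan_in * fan_out weights plus fan_out biases
    hid1 = n_in * n_hid + n_hid
    hid2 = n_hid * n_hid + n_hid
    oupt = n_hid * n_out + n_out
    return hid1 + hid2 + oupt

print(count_params(6, 10, 3))  # 213
print(count_params(6, 8, 3))   # 155
```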

The fitness() function is very short and simple because the function farms out most of the work to program-defined train() and accuracy() functions:

def fitness(n_hid=10, activ='tanh', trn_ds=None, tst_ds=None, 
 bs=10, lr=0.01, me=1000, opt='sgd'):

  T.manual_seed(1)  # prepare
  np.random.seed(1)

  net = Net(n_hid, activ).to(device)  # create

  net.train()
  train(net, trn_ds, bs, lr, me, opt)  # train

  net.eval()
  acc_train = accuracy_quick(net, trn_ds)  # evaluate
  acc_test = accuracy_quick(net, tst_ds) 
  return (acc_train + acc_test) / 2

I decided to define fitness as the average of the accuracy of the trained network on the training and test data. This is something I need to give more thought to.
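One way to experiment with that definition is to make the weighting explicit. In this sketch, `fitness_from_acc()` and its `w_test` parameter are hypothetical; `w_test=0.5` gives the demo's plain average, and the accuracy values are made up for illustration.

```python
def fitness_from_acc(acc_train, acc_test, w_test=0.5):
    # w_test = 0.5 reproduces the demo's plain average; a larger
    # w_test would favor test accuracy (hypothetical variant)
    if not 0.0 <= w_test <= 1.0:
        raise ValueError("w_test must be in [0, 1]")
    return (1.0 - w_test) * acc_train + w_test * acc_test

print(fitness_from_acc(0.80, 0.70))  # 0.75
```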

I don’t believe it’s feasible to create a general-purpose framework for parameterization; each problem is significantly different. The real decisions are what to parameterize and what to hard-code. For example, my demo hard-codes the architecture with exactly two hidden layers rather than a variable number of layers.
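If you did want a variable number of hidden layers, the architecture would be described by a list of layer sizes rather than by separate hid1 and hid2 fields. This sketch (hypothetical helper name, equal-width hidden layers assumed) computes the (fan-in, fan-out) pair each fully connected layer would need.

```python
def layer_shapes(n_in, n_hid, n_layers, n_out):
    # Hypothetical helper: (fan_in, fan_out) pairs for a network with
    # n_layers equal-width hidden layers
    if n_layers < 1:
        raise ValueError("need at least one hidden layer")
    sizes = [n_in] + [n_hid] * n_layers + [n_out]
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_shapes(6, 10, 2, 3))  # [(6, 10), (10, 10), (10, 3)]
print(layer_shapes(6, 10, 3, 3))  # [(6, 10), (10, 10), (10, 10), (10, 3)]
```

In PyTorch, each pair could become a `T.nn.Linear(fan_in, fan_out)` stored in a `T.nn.ModuleList`, with forward() looping over the hidden layers. That extra flexibility is exactly what the demo trades away by hard-coding two layers.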

The parameterization is just the first part of an evolutionary optimization system. My next steps will be to add functions to generate random solutions, select two parent solutions, combine two parents to produce a child solution, and mutate child solutions.
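Here is a rough sketch of what those four pieces might look like when a solution is a dictionary of fitness() arguments. Everything in it is an assumption for illustration: the candidate value lists, the use of tournament selection and single-point crossover (two common choices among many), the mutation rate, and `toy_fitness()`, which stands in for the real fitness() so the example runs instantly.

```python
import random

# Candidate values per gene (illustrative; keys mirror fitness() arguments)
SPACE = {
    "n_hid": [6, 8, 10, 12],
    "activ": ["tanh", "relu"],
    "bs": [5, 10, 20],
    "lr": [0.001, 0.005, 0.01],
    "opt": ["sgd", "adam"],
}
GENES = list(SPACE)  # fixed gene order, used by crossover

def toy_fitness(sol):
    # Hypothetical stand-in for fitness(): prefers tanh and more hidden nodes
    return sol["n_hid"] + (5 if sol["activ"] == "tanh" else 0)

def random_solution(rng):
    # Pick one candidate value for each gene
    return {g: rng.choice(SPACE[g]) for g in GENES}

def select_parents(pop, fits, rng, k=3):
    # Tournament selection, done twice: sample k distinct members, keep the fittest
    def tournament():
        idxs = rng.sample(range(len(pop)), k)
        return pop[max(idxs, key=lambda i: fits[i])]
    return tournament(), tournament()

def crossover(p1, p2, rng):
    # Single-point crossover: genes before the cut from p1, the rest from p2
    cut = rng.randint(1, len(GENES) - 1)
    return {g: (p1[g] if i < cut else p2[g]) for i, g in enumerate(GENES)}

def mutate(sol, rng, rate=0.2):
    # Each gene is resampled with probability rate (may pick the same value)
    return {g: (rng.choice(SPACE[g]) if rng.random() < rate else v)
            for g, v in sol.items()}

rng = random.Random(1)
pop = [random_solution(rng) for _ in range(6)]
fits = [toy_fitness(s) for s in pop]
p1, p2 = select_parents(pop, fits, rng)
child = mutate(crossover(p1, p2, rng), rng)
print(child, toy_fitness(child))
```

In a full system, the child would be scored with fitness() and would typically replace a weak member of the population, and the whole cycle would repeat for some number of generations.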

Fascinating stuff (to me anyway).



Evolution has produced some strange animals. Left: Tullimonstrum, informally known as the Tully monster, is an extinct invertebrate that lived about 300 million years ago. It was about 14 inches long and had two primitive eye stalks. Right: Opabinia is an extinct arthropod that lived about 500 million years ago. It was about three inches long and had five eyes. Images like these in my head are one of several reasons why I don’t eat calamari.


Demo code below. The training and test data can be found at https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.

# people_politics_encoded.py
# predict politics type from sex, age, state, income
# PyTorch 2.0.1-CPU Anaconda3-2022.10  Python 3.9.13
# Windows 10/11 

# experiment for hyperparameter evolutionary optimization

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  # -1   0.27   0  1  0   0.7610   2
  # +1   0.19   0  0  1   0.6550   0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]   # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]     # 1-D

    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_hid, activ='tanh'):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, n_hid)  # 6-(nh-nh)-3
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 3)

    if activ == 'tanh':
      self.activ = T.nn.Tanh()
    elif activ == 'relu':
      self.activ = T.nn.ReLU()
    else:
      raise ValueError("activ must be 'tanh' or 'relu'")

    # use default weight init

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z)) 
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss() 
    return z

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  #  [N,3]  log-probabilities
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def train(net, ds, bs, lr, me, opt='sgd'):
  # dataset, bat_size, lrn_rate, max_epochs, optimizer
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  else:
    raise ValueError("opt must be 'sgd' or 'adam'")

  print("\nStarting training ")
  le = max(1, me // 5)  # log interval: about 5 log prints
  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % le == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss)) 
  print("Done ") 

# -----------------------------------------------------------

def fitness(n_hid=10, activ='tanh', trn_ds=None, tst_ds=None, 
 bs=10, lr=0.01, me=1000, opt='sgd'):

  T.manual_seed(1)  # prepare
  np.random.seed(1)

  net = Net(n_hid, activ).to(device)  # create

  net.train()
  train(net, trn_ds, bs, lr, me, opt)  # train

  net.eval()
  acc_train = accuracy_quick(net, trn_ds)  # evaluate
  acc_test = accuracy_quick(net, tst_ds) 
  return (acc_train + acc_test) / 2

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin People predict politics type ")
  
  # 1. create DataLoader objects
  print("\nCreating People Datasets ")

  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows

  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)    # 40 rows

  # 2. compute fitness for architecture and train parameters
  print("\nSetting 6-(10-10)-3 tanh 10 0.01 1000 SGD")
  f = fitness(n_hid=10, activ='tanh',
    trn_ds=train_ds, tst_ds=test_ds, 
    bs=10, lr=0.01, me=1000, opt='sgd')
  print("\nFitness = %0.4f " % f)

  print("\nSetting 6-(8-8)-3 relu 10 0.01 1000 Adam")
  f = fitness(n_hid=8, activ='relu',
    trn_ds=train_ds, tst_ds=test_ds, 
    bs=10, lr=0.01, me=1000, opt='adam')
  print("\nFitness = %0.4f " % f)

  # 3. TODO: verify trained model is valid
  # 4. TODO: save trained model
 
  print("\nEnd People predict politics encoding demo")

if __name__ == "__main__":
  main()
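A note on the loss function: Net.forward() applies log_softmax() to the output layer and train() uses NLLLoss(). Together these compute cross-entropy loss, so an equivalent design would drop the log_softmax() call and use `T.nn.CrossEntropyLoss()`, which applies log-softmax internally to raw logits. The pure-Python sketch below checks that equivalence on made-up logits; the function names are my own, not PyTorch's. Because log-softmax preserves the order of values within a row, accuracy_quick() gives the same result either way.

```python
import math

def log_softmax(logits):
    # Numerically stable: subtract the max before exponentiating
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs_batch, targets):
    # Mean negated log-probability of each item's correct class
    # (mean reduction, like NLLLoss's default)
    return -sum(lp[t] for lp, t in zip(log_probs_batch, targets)) / len(targets)

def cross_entropy(logits_batch, targets):
    # Cross entropy on raw logits: -log(softmax(z)[t]), averaged over the batch
    total = 0.0
    for z, t in zip(logits_batch, targets):
        probs = [math.exp(v) for v in z]
        total += -math.log(probs[t] / sum(probs))
    return total / len(targets)

logits = [[1.0, 2.0, 0.5], [0.3, -1.0, 2.2]]  # two items, three classes
targets = [1, 2]
a = nll_loss([log_softmax(z) for z in logits], targets)
b = cross_entropy(logits, targets)
print(abs(a - b) < 1e-12)  # True
```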