Hyperparameter Tuning Using Evolutionary Optimization

All neural systems have hyperparameter values that must be determined by trial and error. Architecture hyperparameters include things like the number of hidden layers and the hidden node activation function. Training hyperparameters include things like learning rate and batch size. The system random number seed is a kind of combined architecture-training hyperparameter because the seed determines the initial weight values and the order of processing during training.

In many scenarios, manually tuning hyperparameter values is feasible. But for large, complex neural systems, it’s sometimes necessary to programmatically find good hyperparameter values. Random search is simple but inefficient. In grid search, you set up roughly 4 to 8 possible values for each hyperparameter, for example learn_rates = [0.001, 0.01, 0.05, 0.10] and bat_sizes = [4, 8, 16, 32], and then try all possible combinations. However, in some scenarios there are just too many combinations. If you have 10 hyperparameters and each can be one of 8 possible values, then there are 8 * 8 * . . . * 8 = 8^10 = 1,073,741,824 combinations of values.
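The combinatorial explosion is easy to verify. A quick sketch using the two hypothetical hyperparameter lists above (the names learn_rates and bat_sizes are just illustrative):

```python
import itertools

learn_rates = [0.001, 0.01, 0.05, 0.10]
bat_sizes = [4, 8, 16, 32]

# grid search enumerates the Cartesian product of all values
combos = list(itertools.product(learn_rates, bat_sizes))
print(len(combos))  # 16 combinations for just two hyperparameters

# with 10 hyperparameters, each with 8 possible values:
print(8 ** 10)  # 1073741824
```

Even at one minute of training time per combination, evaluating a billion combinations is hopeless, which is the motivation for a guided search.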

For serious neural systems, I often use evolutionary optimization to search through the space of possible combinations of hyperparameter values. In very high-level pseudo-code:

create a population of solutions
loop max_generations times
  pick two parent solutions
  make child solution (crossover)
  mutate child
  use child to create, train, eval network
  replace weak solution with child
end-loop
return best solution found

I put together a demo. I set up a synthetic scenario with 10 hypothetical hyperparameters (random seed, number of hidden nodes, batch size, etc.) where each hyperparameter has 8 possible values. Each possible hyperparameter value is an index from 0 to 7 into an array of actual values. I created a dummy error function that is optimal for a solution of [0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
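Because the dummy error function simply sums the index values of a solution, the all-zeros solution has error 0.0. A minimal sketch of the idea:

```python
import numpy as np

def compute_error(soln):
  # dummy error: sum of the index values, so smaller
  # values everywhere means lower error; in a non-demo
  # scenario this would create, train, and evaluate a
  # neural network defined by the solution
  return float(np.sum(soln))

print(compute_error(np.array([0]*10)))            # 0.0 (optimal)
print(compute_error(np.array([1,0,1,1,0,0,0,0,2,0])))  # 5.0
```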

Evolutionary optimization is a meta-heuristic with a huge number of concrete implementations — every step in the pseudo-code above can be implemented in dozens of different ways.

For my demo, I used a population size of 6 possible solutions. To pick two parents, I pick one randomly from the best 3 solutions and the other parent randomly from the worst 3. To make a child solution, I use a fixed crossover point at the middle of the two parent solutions, at index [5], and use the left half of parent1 and the right half of parent2. For example:

parent1 = [2 4 6 7 0 1 3 5 7 7]
parent2 = [3 3 3 3 3 6 6 6 6 6]

child   = [2 4 6 7 0 6 6 6 6 6]

To mutate, I select a random hyperparameter and then randomly add 1 or subtract 1. To replace a weak solution with the child, I randomly select one of the 3 worst solutions in the population.
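The mutation step can be sketched compactly using modulo arithmetic, which is equivalent to the explicit wraparound checks in the demo code below (the seed value here is arbitrary):

```python
import numpy as np

rnd = np.random.RandomState(1)

def mutate(soln, n_hpv=8):
  # pick one random hyperparameter position, then add or
  # subtract 1, wrapping around so the value stays in [0, n_hpv-1]
  idx = rnd.randint(0, len(soln))
  delta = 1 if rnd.randint(2) == 1 else -1
  soln[idx] = (soln[idx] + delta) % n_hpv

s = np.array([2, 4, 6, 7, 0, 6, 6, 6, 6, 6])
mutate(s)  # exactly one cell changes by +1 or -1 (with wraparound)
```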

The demo code found a very good solution of [1 0 1 1 0 0 0 0 2 0] in just 60 iterations/generations of the evolutionary algorithm.

My demo program uses a nested class definition, which is a rare technique for me. The outer primary class is Tuner. It has a nested Individual class that holds a solution (array of values) and its corresponding error. I place these together so that an array of Individual objects (the population) can be sorted by error. I could have used parallel arrays but the nested class seemed to make sense to me.

Good fun.



The evolution of science fiction art has always interested me. Artist Paul Lehr (1930-1998) did many book and magazine covers in the 1960s and 1970s. Lehr had a style that bridged the early pulp styles of the 1930s and the photo-based styles of the 1980s. I haven’t read any of these three novels. Maybe someday.


Demo code:

# evolutionary_hyperparameter.py

import numpy as np

# n_hp = 10  10 hyperparameters
# n_hpv = 8  each hyperparam has 8 possible values
# n_pop = 6  population size


# -----------------------------------------------------------

class Tuner():

  # ---------------------------------------------------------
  class Individual():
    def __init__(self, n_hp, n_hpv, seed):
      # like [7,0,3,6,2,5,0,0,1,4] - 10 hyperparams in [0,7]
      self.rnd = np.random.RandomState(seed)
      self.soln = np.zeros(n_hp, dtype=np.int64)
      for i in range(n_hp):
        self.soln[i] = self.rnd.randint(0,n_hpv)
      self.err = 0.0

    def __lt__(self, other):  # sorting
      return self.err < other.err

    def __eq__(self, other):
      return self.err == other.err

    def show(self):
      print(self.soln, end="")
      print(" | ", end="")
      print(self.err)
  # ---------------------------------------------------------

  def __init__(self, n_hp, n_hpv, n_pop, seed):
    self.rnd = np.random.RandomState(seed)
    self.n_hp = n_hp
    self.n_hpv = n_hpv
    self.n_pop = n_pop

    self.pop = np.empty((n_pop), dtype=object)
    for i in range(n_pop):
      self.pop[i] = self.Individual(self.n_hp, self.n_hpv, i)
      self.pop[i].err = self.compute_error(self.pop[i].soln)
    self.sort()

    # best found at any point in time
    self.best_soln = np.copy(self.pop[0].soln)
    self.best_err = self.pop[0].err

  def sort(self):
    self.pop = sorted(self.pop)  # Individual sort

  def show(self):
    for i in range(self.n_pop):
      self.pop[i].show()  # calls Individual.show()
    print("-----")
    print(self.best_soln, end="")
    print(" | ", end="")
    print(self.best_err)
 
  def compute_error(self, soln): 
    # in non-demo, soln would be used to create 
    # and train a neural system
    err = 0.0
    for i in range(self.n_hp):  # each hyperparam
      err += soln[i]            # small val is low error
    return err 

  def pick_parents(self):
    # pick indices of two solns
    first = self.rnd.randint(0, self.n_pop // 2)  # good
    second = self.rnd.randint(self.n_pop // 2, self.n_pop) 
    while second == first:
      second = self.rnd.randint(self.n_pop // 2, self.n_pop)
    flip = self.rnd.randint(2)  # 0 or 1
    if flip == 0:
      return (first, second)
    else:
      return (second, first)

  def crossover(self, i, j):
    child_soln = np.zeros(self.n_hp, dtype=np.int64)
    for k in range(0, self.n_hp//2):  # left half
      child_soln[k] = self.pop[i].soln[k]
    for k in range(self.n_hp//2, self.n_hp):  # right half
      child_soln[k] = self.pop[j].soln[k]
    return child_soln

  def mutate(self, soln):
    # a soln is an array with 10 cells, each in [0,7]
    idx = self.rnd.randint(0, self.n_hp)
    flip = self.rnd.randint(2)  # 0 or 1
    if flip == 0:
      soln[idx] -= 1
      if soln[idx] == -1:
        soln[idx] = self.n_hpv-1  # largest
    else:
      soln[idx] += 1
      if soln[idx] == self.n_hpv: # too much
        soln[idx] = 0

  def search(self, max_gen):
    for gen in range(max_gen):
      # 1. make a child soln using crossover
      (i, j) = self.pick_parents()
      child_soln = self.crossover(i, j)

      # 2. mutate child
      self.mutate(child_soln)
      child_err = self.compute_error(child_soln)

      # 2b. if new child has already been evaluated, 
      #     then continue
      
      # 3. is child a new best soln?
      if child_err < self.best_err:
        print("New best soln found at gen " + str(gen))
        self.best_soln = np.copy(child_soln)
        self.best_err = child_err

      # 4. replace a weak soln with child
      idx = self.rnd.randint(self.n_pop // 2, self.n_pop)
      # print("replace idx = " + str(idx))
      for k in range(self.n_hp):
        self.pop[idx].soln[k] = child_soln[k]
      self.pop[idx].err = child_err

      # 5. sort solns from best to worst
      self.sort()
  
# -----------------------------------------------------------

# create a population of solns
# loop
#   pick two parent solns
#   make child
#   mutate child
#   evaluate child
#   replace weak soln with child
# end-loop

def main():
  print("\nBegin evolutionary hyperparameter tune/search ")
  print("Setting 10 hyperparams, each with 8 possible values ")
  print("Optimal param values all 0 ")

  n_hp = 10  # 10 hyperparameters like lrn_rate, etc.
  n_hpv = 8  # each hyperparam has 8 possible values
  n_pop = 6  # population size
  tuner = Tuner(n_hp, n_hpv, n_pop, seed=99)
  
  print("\nInitial population: ")
  tuner.show()

  print("\nBegin hyperparameter search \n")
  tuner.search(60)
  print("\nDone ")

  print("\nFinal population: ")
  tuner.show()

  print("\nEnd ")

if __name__ == "__main__":
  main()

Here’s a different version that uses class-level (static) g_n_hp, g_n_hpv, and g_rnd variables so that the nested Individual class can access them.

# evolutionary_hyperparameter_2.py

import numpy as np

# n_hp = 10  10 hyperparameters
# n_hpv = 8  each hyperparam has 8 possible values
# n_pop = 6  population size


# -----------------------------------------------------------

class Tuner():

  # static so that nested Individual can access
  g_rnd = None
  g_n_hp = 0
  g_n_hpv = 0
  
  def __init__(self, n_hp, n_hpv, n_pop, seed):
    Tuner.g_n_hp = n_hp
    Tuner.g_n_hpv = n_hpv
    Tuner.g_rnd = np.random.RandomState(seed)

    self.n_pop = n_pop
    self.pop = np.empty((n_pop), dtype=object)

    for i in range(n_pop):
      self.pop[i] = self.Individual()
      self.pop[i].err = self.compute_error(self.pop[i].soln)
    self.sort()

    # best found at any point in time
    self.best_soln = np.copy(self.pop[0].soln)
    self.best_err = self.pop[0].err

  def sort(self):
    self.pop = sorted(self.pop)  # Individual sort

  def show(self):
    for i in range(self.n_pop):
      self.pop[i].display()  # calls Individual.display()
    print("-----")
    print(self.best_soln, end="")
    print(" | ", end="")
    print(self.best_err)
 
  def compute_error(self, soln): 
    # in non-demo, soln would be used to create 
    # and train a neural system
    err = 0.0
    for i in range(len(soln)):  # each hyperparam
      err += soln[i]            # small val is low error
    return err 

  def pick_parents(self):
    # pick indices of two solns
    first = Tuner.g_rnd.randint(0, self.n_pop // 2)  # good
    second = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
    while second == first:
      second = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
    flip = Tuner.g_rnd.randint(2)  # 0 or 1
    if flip == 0:
      return (first, second)
    else:
      return (second, first)

  def crossover(self, i, j):
    child_soln = np.zeros(Tuner.g_n_hp, dtype=np.int64)
    for k in range(0, Tuner.g_n_hp//2):  # left half
      child_soln[k] = self.pop[i].soln[k]
    for k in range(Tuner.g_n_hp//2, Tuner.g_n_hp):  # right half
      child_soln[k] = self.pop[j].soln[k]
    return child_soln

  def mutate(self, soln):
    # a soln is an array with 10 cells, each in [0,7]
    idx = Tuner.g_rnd.randint(0, Tuner.g_n_hp)
    flip = Tuner.g_rnd.randint(2)  # 0 or 1
    if flip == 0:
      soln[idx] -= 1
      if soln[idx] == -1:
        soln[idx] = Tuner.g_n_hpv-1  # largest
    else:
      soln[idx] += 1
      if soln[idx] == Tuner.g_n_hpv: # too much
        soln[idx] = 0

  def search(self, max_gen):
    for gen in range(max_gen):
      # 1. make a child soln using crossover
      (i, j) = self.pick_parents()
      child_soln = self.crossover(i, j)

      # 2. mutate child
      self.mutate(child_soln)
      child_err = self.compute_error(child_soln)

      # 2b. if new child has already been evaluated, 
      #     then continue
      
      # 3. is child a new best soln?
      if child_err < self.best_err:
        print("New best soln found at gen " + str(gen))
        self.best_soln = np.copy(child_soln)
        self.best_err = child_err

      # 4. replace a weak soln with child
      idx = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
      # print("replace idx = " + str(idx))
      for k in range(Tuner.g_n_hp):
        self.pop[idx].soln[k] = child_soln[k]
      self.pop[idx].err = child_err

      # 5. sort solns from best to worst
      self.sort()

  # ---------------------------------------------------------
  class Individual():
    def __init__(self):
      # like [7,0,3,6,2,5,0,0,1,4] - 10 hyperparams in [0,7]
      self.soln = np.zeros(Tuner.g_n_hp, dtype=np.int64)
      for i in range(len(self.soln)):
        self.soln[i] = Tuner.g_rnd.randint(0,Tuner.g_n_hpv)
      self.err = 0.0

    def __lt__(self, other):  # sorting
      return self.err < other.err

    def __eq__(self, other):
      return self.err == other.err

    def display(self):
      print(self.soln, end="")
      print(" | ", end="")
      print(self.err)
  # ---------------------------------------------------------
  
# -----------------------------------------------------------

def main():
  print("\nBegin evolutionary hyperparameter tune/search ")
  print("Setting 10 hyperparams, each with 8 possible values ")
  print("Optimal param values all 0 ")

  n_hp = 10  # 10 hyperparameters like lrn_rate, etc.
  n_hpv = 8  # each hyperparam has 8 possible values
  n_pop = 6  # population size
  tuner = Tuner(n_hp, n_hpv, n_pop, seed=0)
  
  print("\nInitial population: ")
  tuner.show()

  print("\nBegin hyperparameter search \n")
  tuner.search(60)
  print("\nDone ")

  print("\nFinal population: ")
  tuner.show()

  print("\nEnd ")

if __name__ == "__main__":
  main()