All neural systems have hyperparameter values that must be determined by trial and error. Architecture hyperparameters include things like the number of hidden layers and the hidden node activation function. Training hyperparameters include things like learning rate and batch size. The system random number seed is a kind of combined architecture-training hyperparameter because the seed determines the initial weight values and the order of processing during training.
In many scenarios, manually tuning hyperparameter values is feasible. But for large, complex neural systems, it's sometimes necessary to programmatically find good hyperparameter values. Random search is simple but inefficient. In grid search, you set up about 4 to 8 possible values for each hyperparameter, for example learn_rates = [0.001, 0.01, 0.05, 0.10] and bat_sizes = [4, 8, 16, 32], and then try all possible combinations. However, in some scenarios there are just too many combinations. If you have 10 hyperparameters and each can take one of 8 possible values, then there are 8 * 8 * . . * 8 = 8^10 = 1,073,741,824 combinations of values.
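To make grid search concrete, here is a minimal sketch over the two example hyperparameters above. The train_and_eval() function is a hypothetical placeholder for whatever builds, trains, and scores a network:

import itertools

learn_rates = [0.001, 0.01, 0.05, 0.10]
bat_sizes = [4, 8, 16, 32]

def train_and_eval(lr, bs):
  # hypothetical placeholder: build, train, score a network
  return abs(lr - 0.01) + abs(bs - 16)  # dummy error

best_err = float("inf"); best_combo = None
for (lr, bs) in itertools.product(learn_rates, bat_sizes):
  err = train_and_eval(lr, bs)
  if err < best_err:
    best_err = err; best_combo = (lr, bs)
print(best_combo, best_err)  # examines all 16 combinations

With just two hyperparameters there are only 4 * 4 = 16 combinations, but the itertools.product() call makes it clear why the count explodes exponentially as hyperparameters are added.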
For serious neural systems, I often use evolutionary optimization to search through all possible combinations of hyperparameter values. In very high-level pseudo-code:
create a population of solutions
loop max_generations times
  pick two parent solutions
  make child solution (crossover)
  mutate child
  use child to create, train, eval network
  replace weak solution with child
end-loop
return best solution found
I put together a demo. I set up a synthetic scenario with 10 hypothetical hyperparameters (random seed, number of hidden nodes, batch size, etc.) where each hyperparameter has 8 possible values. Each possible hyperparameter value is an index from 0 to 7 into an array of actual values. I created a dummy error function that is optimal for a solution of [0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
Evolutionary optimization is a meta-heuristic with a huge number of concrete implementations — every step in the pseudo-code above can be implemented in dozens of different ways.
For my demo, I used a population size of 6 possible solutions. To pick two parents, I pick one randomly from the best 3 solutions and the other randomly from the worst 3. To make a child solution, I use a fixed crossover point at the middle of the solution, at index [5], and use the left half of parent1 and the right half of parent2. For example:
parent1 = [2 4 6 8 0 1 3 5 7 7]
parent2 = [3 3 3 3 3 6 6 6 6 6]
child   = [2 4 6 8 0 6 6 6 6 6]
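With numpy arrays, that fixed-point crossover is essentially a one-liner. A minimal sketch, using the parent values shown above:

import numpy as np
parent1 = np.array([2,4,6,8,0,1,3,5,7,7])
parent2 = np.array([3,3,3,3,3,6,6,6,6,6])
child = np.concatenate((parent1[:5], parent2[5:]))
print(child)  # [2 4 6 8 0 6 6 6 6 6]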
To mutate, I select a random hyperparameter and then randomly add 1 or subtract 1, wrapping around so the value stays in [0, 7]. To replace a weak solution with the child, I randomly select one of the 3 worst solutions in the population.
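The wrap-around mutation can also be written compactly with modular arithmetic. A minimal sketch, assuming the same setup as the demo (rnd is a numpy RandomState, n_hp = 10 hyperparameters, n_hpv = 8 values each):

import numpy as np
rnd = np.random.RandomState(1)  # assumed generator
n_hp = 10; n_hpv = 8
soln = np.array([2,4,6,8,0,6,6,6,6,6])
idx = rnd.randint(0, n_hp)               # which hyperparameter
delta = rnd.choice([-1, 1])              # add or subtract 1
soln[idx] = (soln[idx] + delta) % n_hpv  # wraps 0 -> 7 and 7 -> 0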
The demo code found a very good solution of [1 0 1 1 0 0 0 0 2 0] in just 60 iterations/generations of the evolutionary algorithm.
My demo program uses a nested class definition, which is a rare technique for me. The outer primary class is Tuner. It has a nested Individual class that holds a solution (an array of values) and its corresponding error. I place these together so that an array of Individual objects (the population) can be sorted by error. I could have used parallel arrays, but the nested class seemed to make sense to me.
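The __lt__ and __eq__ methods on Individual are what allow the built-in sorted() function to order the population. An alternative that avoids defining comparison methods is to sort with an explicit key. A minimal sketch, assuming pop is a Python list of Individual objects:

pop = sorted(pop, key=lambda ind: ind.err)  # lowest error first

Either approach works; the comparison-method version keeps the sort call itself clean, while the key version keeps the Individual class smaller.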
Good fun.
The evolution of science fiction art has always interested me. Artist Paul Lehr (1930-1998) did many book and magazine covers in the 1960s and 1970s. Lehr had a style that bridged the early pulp styles of the 1930s and the photo-based styles of the 1980s. I haven't read any of these three novels. Maybe someday.
Demo code.
# evolutionary_hyperparameter.py
import numpy as np
# n_hp  = 10   number of hyperparameters
# n_hpv = 8    each hyperparameter has 8 possible values
# n_pop = 6    population size
# -----------------------------------------------------------
class Tuner():
  # ---------------------------------------------------------
  class Individual():
    def __init__(self, n_hp, n_hpv, seed):
      # like [7,0,3,6,2,5,0,0,1,4] - 10 hyperparams in [0,7]
      self.rnd = np.random.RandomState(seed)
      self.soln = np.zeros(n_hp, dtype=np.int64)
      for i in range(n_hp):
        self.soln[i] = self.rnd.randint(0, n_hpv)
      self.err = 0.0

    def __lt__(self, other):  # for sorting by error
      return self.err < other.err

    def __eq__(self, other):
      return self.err == other.err

    def show(self):
      print(self.soln, end="")
      print(" | ", end="")
      print(self.err)
  # ---------------------------------------------------------

  def __init__(self, n_hp, n_hpv, n_pop, seed):
    self.rnd = np.random.RandomState(seed)
    self.n_hp = n_hp
    self.n_hpv = n_hpv
    self.n_pop = n_pop
    self.pop = np.empty((n_pop), dtype=object)
    for i in range(n_pop):
      self.pop[i] = self.Individual(self.n_hp, self.n_hpv, i)
      self.pop[i].err = self.compute_error(self.pop[i].soln)
    self.sort()
    # best found at any point in time
    self.best_soln = np.copy(self.pop[0].soln)
    self.best_err = self.pop[0].err

  def sort(self):
    self.pop = sorted(self.pop)  # uses Individual.__lt__

  def show(self):
    for i in range(self.n_pop):
      self.pop[i].show()  # calls Individual.show()
    print("-----")
    print(self.best_soln, end="")
    print(" | ", end="")
    print(self.best_err)

  def compute_error(self, soln):
    # in a non-demo scenario, soln would be used to create
    # and train a neural system
    err = 0.0
    for i in range(self.n_hp):  # each hyperparam
      err += soln[i]  # small val is low error
    return err

  def pick_parents(self):
    # pick indices of two solns: one from the best half,
    # one from the worst half
    first = self.rnd.randint(0, self.n_pop // 2)  # good
    second = self.rnd.randint(self.n_pop // 2, self.n_pop)
    while second == first:
      second = self.rnd.randint(self.n_pop // 2, self.n_pop)
    flip = self.rnd.randint(2)  # 0 or 1
    if flip == 0:
      return (first, second)
    else:
      return (second, first)

  def crossover(self, i, j):
    child_soln = np.zeros(self.n_hp, dtype=np.int64)
    for k in range(0, self.n_hp // 2):  # left half from parent i
      child_soln[k] = self.pop[i].soln[k]
    for k in range(self.n_hp // 2, self.n_hp):  # right half from j
      child_soln[k] = self.pop[j].soln[k]
    return child_soln

  def mutate(self, soln):
    # a soln is an array with 10 cells, each in [0,7]
    idx = self.rnd.randint(0, self.n_hp)
    flip = self.rnd.randint(2)  # 0 or 1
    if flip == 0:
      soln[idx] -= 1
      if soln[idx] == -1:
        soln[idx] = self.n_hpv - 1  # wrap to largest
    else:
      soln[idx] += 1
      if soln[idx] == self.n_hpv:  # too large
        soln[idx] = 0  # wrap to smallest

  def search(self, max_gen):
    for gen in range(max_gen):
      # 1. make a child soln using crossover
      (i, j) = self.pick_parents()
      child_soln = self.crossover(i, j)
      # 2. mutate child
      self.mutate(child_soln)
      child_err = self.compute_error(child_soln)
      # 2b. (possible enhancement: if child has already
      #     been evaluated, then continue)
      # 3. is child a new best soln?
      if child_err < self.best_err:
        print("New best soln found at gen " + str(gen))
        self.best_soln = np.copy(child_soln)
        self.best_err = child_err
      # 4. replace a weak (worst half) soln with child
      idx = self.rnd.randint(self.n_pop // 2, self.n_pop)
      # print("replace idx = " + str(idx))
      for k in range(self.n_hp):
        self.pop[idx].soln[k] = child_soln[k]
      self.pop[idx].err = child_err
      # 5. sort solns from best to worst
      self.sort()
# -----------------------------------------------------------
# create a population of solns
# loop
# pick two parent solns
# make child
# mutate child
# evaluate child
# replace weak soln with child
# end-loop
def main():
  print("\nBegin evolutionary hyperparameter tune/search ")
  print("Setting 10 hyperparams, each with 8 possible values ")
  print("Optimal param values all 0 ")

  n_hp = 10   # 10 hyperparameters like lrn_rate, etc.
  n_hpv = 8   # each hyperparam has 8 possible values
  n_pop = 6   # population size

  tuner = Tuner(n_hp, n_hpv, n_pop, seed=99)
  print("\nInitial population: ")
  tuner.show()

  print("\nBegin hyperparameter search \n")
  tuner.search(60)
  print("\nDone ")

  print("\nFinal population: ")
  tuner.show()
  print("\nEnd ")

if __name__ == "__main__":
  main()
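In a non-demo scenario, compute_error() would map the index solution back to actual hyperparameter values and then build, train, and evaluate a network. A minimal sketch of that decoding step, where the lookup arrays and the train_and_eval() function are hypothetical placeholders:

import numpy as np

# hypothetical lookup tables that map index values [0,7] to actual values
lrn_rates = [0.001, 0.005, 0.01, 0.02, 0.05, 0.08, 0.10, 0.20]
bat_sizes = [2, 4, 6, 8, 10, 16, 24, 32]
# ... one array of 8 candidate values for each of the 10 hyperparams

def train_and_eval(lrn_rate, bat_size):
  # hypothetical placeholder: build, train, and score a network,
  # returning its error on a validation set
  return lrn_rate * bat_size  # dummy value for illustration

def compute_error(soln):
  lrn_rate = lrn_rates[soln[0]]  # decode index to actual value
  bat_size = bat_sizes[soln[1]]
  # ... decode the remaining hyperparameters the same way
  return train_and_eval(lrn_rate, bat_size)

print(compute_error(np.array([3, 5])))  # 0.02 * 16 = 0.32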
Here's a different version that uses static class-level g_n_hp, g_n_hpv, and g_rnd members, so that the nested Individual class doesn't need constructor parameters.
# evolutionary_hyperparameter_2.py
import numpy as np
# n_hp  = 10   number of hyperparameters
# n_hpv = 8    each hyperparameter has 8 possible values
# n_pop = 6    population size
# -----------------------------------------------------------
class Tuner():
  # static members so that nested Individual can access them
  g_rnd = None
  g_n_hp = 0
  g_n_hpv = 0

  def __init__(self, n_hp, n_hpv, n_pop, seed):
    Tuner.g_n_hp = n_hp
    Tuner.g_n_hpv = n_hpv
    Tuner.g_rnd = np.random.RandomState(seed)
    self.n_pop = n_pop
    self.pop = np.empty((n_pop), dtype=object)
    for i in range(n_pop):
      self.pop[i] = self.Individual()
      self.pop[i].err = self.compute_error(self.pop[i].soln)
    self.sort()
    # best found at any point in time
    self.best_soln = np.copy(self.pop[0].soln)
    self.best_err = self.pop[0].err

  def sort(self):
    self.pop = sorted(self.pop)  # uses Individual.__lt__

  def show(self):
    for i in range(self.n_pop):
      self.pop[i].display()  # calls Individual.display()
    print("-----")
    print(self.best_soln, end="")
    print(" | ", end="")
    print(self.best_err)

  def compute_error(self, soln):
    # in a non-demo scenario, soln would be used to create
    # and train a neural system
    err = 0.0
    for i in range(len(soln)):  # each hyperparam
      err += soln[i]  # small val is low error
    return err

  def pick_parents(self):
    # pick indices of two solns: one from the best half,
    # one from the worst half
    first = Tuner.g_rnd.randint(0, self.n_pop // 2)  # good
    second = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
    while second == first:
      second = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
    flip = Tuner.g_rnd.randint(2)  # 0 or 1
    if flip == 0:
      return (first, second)
    else:
      return (second, first)

  def crossover(self, i, j):
    child_soln = np.zeros(Tuner.g_n_hp, dtype=np.int64)
    for k in range(0, Tuner.g_n_hp // 2):  # left half from parent i
      child_soln[k] = self.pop[i].soln[k]
    for k in range(Tuner.g_n_hp // 2, Tuner.g_n_hp):  # right half from j
      child_soln[k] = self.pop[j].soln[k]
    return child_soln

  def mutate(self, soln):
    # a soln is an array with 10 cells, each in [0,7]
    idx = Tuner.g_rnd.randint(0, Tuner.g_n_hp)
    flip = Tuner.g_rnd.randint(2)  # 0 or 1
    if flip == 0:
      soln[idx] -= 1
      if soln[idx] == -1:
        soln[idx] = Tuner.g_n_hpv - 1  # wrap to largest
    else:
      soln[idx] += 1
      if soln[idx] == Tuner.g_n_hpv:  # too large
        soln[idx] = 0  # wrap to smallest

  def search(self, max_gen):
    for gen in range(max_gen):
      # 1. make a child soln using crossover
      (i, j) = self.pick_parents()
      child_soln = self.crossover(i, j)
      # 2. mutate child
      self.mutate(child_soln)
      child_err = self.compute_error(child_soln)
      # 2b. (possible enhancement: if child has already
      #     been evaluated, then continue)
      # 3. is child a new best soln?
      if child_err < self.best_err:
        print("New best soln found at gen " + str(gen))
        self.best_soln = np.copy(child_soln)
        self.best_err = child_err
      # 4. replace a weak (worst half) soln with child
      idx = Tuner.g_rnd.randint(self.n_pop // 2, self.n_pop)
      # print("replace idx = " + str(idx))
      for k in range(Tuner.g_n_hp):
        self.pop[idx].soln[k] = child_soln[k]
      self.pop[idx].err = child_err
      # 5. sort solns from best to worst
      self.sort()
  # ---------------------------------------------------------
  class Individual():
    def __init__(self):
      # like [7,0,3,6,2,5,0,0,1,4] - 10 hyperparams in [0,7]
      self.soln = np.zeros(Tuner.g_n_hp, dtype=np.int64)
      for i in range(len(self.soln)):
        self.soln[i] = Tuner.g_rnd.randint(0, Tuner.g_n_hpv)
      self.err = 0.0

    def __lt__(self, other):  # for sorting by error
      return self.err < other.err

    def __eq__(self, other):
      return self.err == other.err

    def display(self):
      print(self.soln, end="")
      print(" | ", end="")
      print(self.err)
  # ---------------------------------------------------------
# -----------------------------------------------------------
def main():
  print("\nBegin evolutionary hyperparameter tune/search ")
  print("Setting 10 hyperparams, each with 8 possible values ")
  print("Optimal param values all 0 ")

  n_hp = 10   # 10 hyperparameters like lrn_rate, etc.
  n_hpv = 8   # each hyperparam has 8 possible values
  n_pop = 6   # population size

  tuner = Tuner(n_hp, n_hpv, n_pop, seed=0)
  print("\nInitial population: ")
  tuner.show()

  print("\nBegin hyperparameter search \n")
  tuner.search(60)
  print("\nDone ")

  print("\nFinal population: ")
  tuner.show()
  print("\nEnd ")

if __name__ == "__main__":
  main()