## Particle Swarm Optimization Variants

Particle swarm optimization (PSO) is a meta-heuristic that can be used to construct a specific algorithm to find the minimum of an error function. In theory, PSO could improve neural network training because PSO does not use Calculus gradients like virtually every standard training algorithm — stochastic gradient descent and all of its variations such as Adam optimization.

In practice however, PSO is much too slow for very large, very deep neural networks. PSO maintains a collection (swarm) of possible solutions (particles). The “position” of a particle is a possible solution. For example, if some particle has position (4.5, -0.8, 3.2, 6.6, -1.7) this means (4.5, -0.8, 3.2, 6.6, -1.7) is a possible solution, for, say the five weights for a tiny neural network.

At a very high level, PSO looks like:

```create a swarm of random possible solutions
loop max_iterations
for-each particle/solution in swarm
compute a velocity
use velocity to update particle/solution
end-for
end-loop
return best particle/solution found by any particle
```

In code, each particle/solution is repeatedly updated like so:

```for k in range(dim):
swarm[i].position[k] += swarm[i].velocity[k]
```

In words, each particle (swarm[i]) component (position[k]) is the current component plus a “velocity”. A new velocity is computed as:

```swarm[i].velocity[k] = ( (w * swarm[i].velocity[k]) +
(c1 * r1 * (swarm[i].best_part_pos[k] -
swarm[i].position[k])) +
(c2 * r2 * (best_swarm_pos[k] -
swarm[i].position[k])) )
```

It’s not immediately obvious but the new velocity of a particle is based on the best position/solution the particle has seen so far (best_part_pos) and the best position/solution seen by any particle in the swarm (best_swarm_pos).

There are several constants and two randomization (r1, r2). The w constant is called inertia, the c1 constant is called cognitive/particle/self, and the c2 constant is called social/swarm/group. Most implementations of PSO use w = 0.729, c1 = 1.49445, c2 = 1.49445.

Because PSO is a loosely defined meta-heuristic rather than a tightly defined algorithm, there are many hundreds of reasonable variations that researchers can explore.

Many of the PSO variations involve using variable/adaptive values for w, c1, c2 rather than fixed constants. Other PSO variations do periodic resets or use multiple swarms to prevent the swarm from focusing too closely on one solution (premature convergence) that might not be optimal.

Four examples of the adaptive approach include “Linear Decreasing Inertia Weight” PSO (LDIW-PSO) which slowly decreases the w-inertia value, “Chaotic Descending Inertia Weight” PSO (CDIW-PSO) which adjusts the w-inertia value in a more randomized way, “Random Inertia Weight and Evolutionary Strategy” PSO (REPSO) which uses random w-inertia values plus some ideas from simulated annealing optimization, and “Adaptive Particle Swarm Optimization” (APSO) which uses the mutation idea from genetic algorithms.

In some sense all of these variations of PSO are really just excuses to publish yet-another research paper. The explorations are legitimate but they have almost no practical value — there are so many hyperparameters involved that any PSO variant can be shown to be better than any other variant by fiddling with the hyperparameters values.

But in the long run, optimization algorithms based on PSO and other similar meta-heuristics (spiral optimization, evolutionary algorithms, bacterial foraging optimization, fireworks optimization, etc., etc.) have great potential when vastly increased processing power becomes available — perhaps through quantum computing breakthroughs, or perhaps just a slow but steady increase in classical computing hardware.

An example of combining computer science with art. I stumbled across a 2020 research paper “Generative Art with Swarm Landscapes” by D. de Andrade, N. Fachada, C. Fernandes, and A. Rosa. The authors took the Mona Lisa, digitized it and used the pixel values to define an error function. Then they applied particle swarm optimization and graphed the paths explored by the particles as they searched the image. Art like this is interesting and has a particular (semi-pun) form of beauty, but I much prefer human, non-computer generated, art.

Basic PSO demo code below.

```# particle_swarm_demo.py
# python 3.7.6
# demo of particle swarm optimization (PSO)
# solves Rastrigin's function

import random
import math    # cos() for Rastrigin
import copy    # array-copying convenience
import sys     # max float

# ------------------------------------

def show_vector(vector):
for i in range(len(vector)):
if i % 8 == 0: # 8 columns
print("\n", end="")
if vector[i] "greater-equal" 0.0:  # replace here
print(' ', end="")
print("%.4f" % vector[i], end="") # 4 decimals
print(" ", end="")
print("\n")

def error(position):
err = 0.0
for i in range(len(position)):
xi = position[i]
# one of several minor Rastrigin variants
err += (xi * xi) - (10 * math.cos(2 * math.pi * xi)) + 10
return err

# ------------------------------------

class Particle:
def __init__(self, dim, minx, maxx, seed):
self.rnd = random.Random(seed)
self.position = [0.0 for i in range(dim)]
self.velocity = [0.0 for i in range(dim)]
self.best_part_pos = [0.0 for i in range(dim)]

for i in range(dim):
self.position[i] = ((maxx - minx) *
self.rnd.random() + minx)
self.velocity[i] = ((maxx - minx) *
self.rnd.random() + minx)

self.error = error(self.position) # curr error
self.best_part_pos = copy.copy(self.position)
self.best_part_err = self.error # best error

def Solve(max_epochs, n, dim, minx, maxx):
rnd = random.Random(0)

# create n random particles
swarm = [Particle(dim, minx, maxx, i) for i in range(n)]

best_swarm_pos = [0.0 for i in range(dim)] # not necess.
best_swarm_err = sys.float_info.max # swarm best
for i in range(n): # check each particle
if swarm[i].error "less-than" best_swarm_err: # replace
best_swarm_err = swarm[i].error
best_swarm_pos = copy.copy(swarm[i].position)

epoch = 0
w = 0.729    # inertia
c1 = 1.49445 # cognitive (particle)
c2 = 1.49445 # social (swarm)

while epoch "less-than" max_epochs:  # replace here

if epoch % 10 == 0 and epoch "greater-than" 1: # rep.
print("Epoch = " + str(epoch) +
" best error = %.3f" % best_swarm_err)

for i in range(n): # process each particle

# compute new velocity of curr particle
for k in range(dim):
r1 = rnd.random()    # randomizations
r2 = rnd.random()

swarm[i].velocity[k] = ( (w * swarm[i].velocity[k]) +
(c1 * r1 * (swarm[i].best_part_pos[k] -
swarm[i].position[k])) +
(c2 * r2 * (best_swarm_pos[k] -
swarm[i].position[k])) )

if swarm[i].velocity[k] "less-than" minx:  # replace
swarm[i].velocity[k] = minx
elif swarm[i].velocity[k] "greater-than" maxx: # rep.
swarm[i].velocity[k] = maxx

# compute new position using new velocity
for k in range(dim):
swarm[i].position[k] += swarm[i].velocity[k]

# compute error of new position
swarm[i].error = error(swarm[i].position)

# is new position a new best for the particle?
if swarm[i].error "less-than" swarm[i].best_part_err: # rep.
swarm[i].best_part_err = swarm[i].error
swarm[i].best_part_pos = copy.copy(swarm[i].position)

# is new position a new best overall?
if swarm[i].error "less-than" best_swarm_err:  # replace
best_swarm_err = swarm[i].error
best_swarm_pos = copy.copy(swarm[i].position)

# for-each particle
epoch += 1
# while
return best_swarm_pos
# end Solve

print("\nBegin particle swarm optimization \
using Python demo\n")
dim = 3
print("Goal is to solve Rastrigin's function in " +
str(dim) + " variables")
print("Function has known min = 0.0 at (", end="")
for i in range(dim-1):
print("0, ", end="")
print("0)")

num_particles = 50
max_iterations = 100

print("Setting num_particles  = " + str(num_particles))
print("Setting max_iterations = " + str(max_iterations))
print("\nStarting PSO algorithm\n")

best_position = Solve(max_iterations, num_particles,
dim, -10.0, 10.0)

print("\nPSO completed\n")
print("\nBest solution found:")
show_vector(best_position)
err = error(best_position)
print("Error of best solution = %.6f" % err)

print("\nEnd particle swarm demo\n")
```
This entry was posted in Machine Learning. Bookmark the permalink.