Particle swarm optimization (PSO) is a meta-heuristic that can be used to construct a specific algorithm to find the minimum of an error function. In theory, PSO could improve neural network training because, unlike virtually every standard training algorithm (stochastic gradient descent and all of its variations, such as Adam optimization), PSO does not use calculus gradients.
In practice, however, PSO is much too slow for very large, very deep neural networks. PSO maintains a collection (swarm) of possible solutions (particles). The "position" of a particle is a possible solution. For example, if some particle has position (4.5, -0.8, 3.2, 6.6, -1.7), this means (4.5, -0.8, 3.2, 6.6, -1.7) is a possible solution, for, say, the five weights of a tiny neural network.
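To make the position-as-solution idea concrete, here is a minimal sketch. The sphere function below is a hypothetical stand-in for a real error function such as neural network loss; the names are illustrative only.

```python
# minimal sketch: a particle's position is just a list of floats,
# and an error function maps a position to a single error value.
# sphere_error is a stand-in for a real error function.

def sphere_error(position):
    # sum of squared components; minimum is 0.0 at the origin
    return sum(x * x for x in position)

position = [4.5, -0.8, 3.2, 6.6, -1.7]  # one candidate solution
print("%.2f" % sphere_error(position))  # 77.58
```

A swarm is simply many such positions, each tracked along with a velocity and a personal-best position.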
At a very high level, PSO looks like:
create a swarm of random possible solutions
loop max_iterations
  for-each particle/solution in swarm
    compute a velocity
    use velocity to update particle/solution
  end-for
end-loop
return best solution found by any particle
In code, each particle/solution is repeatedly updated like so:
for k in range(dim):
  swarm[i].position[k] += swarm[i].velocity[k]
In words, each component (position[k]) of each particle (swarm[i]) is updated to the current component value plus a "velocity". A new velocity is computed as:
swarm[i].velocity[k] = (
  (w * swarm[i].velocity[k]) +
  (c1 * r1 * (swarm[i].best_part_pos[k] - swarm[i].position[k])) +
  (c2 * r2 * (best_swarm_pos[k] - swarm[i].position[k])) )
It’s not immediately obvious but the new velocity of a particle is based on the best position/solution the particle has seen so far (best_part_pos) and the best position/solution seen by any particle in the swarm (best_swarm_pos).
There are several constants and two randomization factors (r1 and r2), each a random value in [0, 1). The w constant is called inertia, the c1 constant is called cognitive/particle/self, and the c2 constant is called social/swarm/group. Most implementations of PSO use w = 0.729, c1 = 1.49445, c2 = 1.49445.
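A single worked velocity-component update may help. The current state values and the fixed r1, r2 below are made up for illustration; in a real run r1 and r2 are drawn fresh for each component.

```python
# one velocity-component update using the standard PSO constants
w, c1, c2 = 0.729, 1.49445, 1.49445
v = 0.50             # current velocity component
x = 3.00             # current position component
best_part = 2.60     # particle's best-seen position component
best_swarm = 2.10    # swarm's best-seen position component
r1, r2 = 0.40, 0.50  # normally random in [0, 1)

new_v = (w * v) + \
        (c1 * r1 * (best_part - x)) + \
        (c2 * r2 * (best_swarm - x))
print("%.4f" % new_v)  # -0.5471
```

Notice both attraction terms are negative here: the particle is pulled back toward its personal best and, more strongly, toward the swarm best.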
Because PSO is a loosely defined meta-heuristic rather than a tightly defined algorithm, there are many hundreds of reasonable variations that researchers can explore.
Many of the PSO variations involve using variable/adaptive values for w, c1, c2 rather than fixed constants. Other PSO variations do periodic resets or use multiple swarms to prevent the swarm from focusing too closely on one solution (premature convergence) that might not be optimal.
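One anti-stagnation idea mentioned above, periodic resets, can be sketched briefly. Everything here is hypothetical scaffolding (the reset_worst name, representing particles as dicts); it is not from any standard PSO library.

```python
import random

# hypothetical sketch: every so many epochs, re-initialize the k
# worst particles at random positions to fight premature convergence

def reset_worst(swarm, error, minx, maxx, k=2):
    # order particles worst-first by current error
    swarm.sort(key=lambda p: error(p["position"]), reverse=True)
    dim = len(swarm[0]["position"])
    for p in swarm[:k]:  # scatter the k worst particles
        p["position"] = [random.uniform(minx, maxx) for _ in range(dim)]
        p["velocity"] = [0.0] * dim

swarm = [{"position": [5.0, 5.0], "velocity": [0.1, 0.1]},
         {"position": [0.1, 0.1], "velocity": [0.1, 0.1]}]
reset_worst(swarm, lambda pos: sum(x * x for x in pos),
            -10.0, 10.0, k=1)
```

In a full implementation the reset would typically run every 20 or 50 epochs, and the reset particles' best-seen values would also be cleared.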
Four examples of the adaptive approach include “Linear Decreasing Inertia Weight” PSO (LDIW-PSO) which slowly decreases the w-inertia value, “Chaotic Descending Inertia Weight” PSO (CDIW-PSO) which adjusts the w-inertia value in a more randomized way, “Random Inertia Weight and Evolutionary Strategy” PSO (REPSO) which uses random w-inertia values plus some ideas from simulated annealing optimization, and “Adaptive Particle Swarm Optimization” (APSO) which uses the mutation idea from genetic algorithms.
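The LDIW idea is easy to express in code. The 0.9 and 0.4 endpoints below are values commonly seen in the literature, but they are an assumption here, not fixed by LDIW-PSO itself.

```python
# sketch of linear decreasing inertia weight: w falls linearly
# from w_max to w_min over the course of the run

def ldiw_inertia(epoch, max_epochs, w_max=0.9, w_min=0.4):
    return w_max - (w_max - w_min) * epoch / max_epochs

print(ldiw_inertia(0, 100))    # 0.9 -- big steps, exploration early
print(ldiw_inertia(100, 100))  # 0.4 -- small steps, exploitation late
```

The intent is that a large early inertia lets particles range widely, while a small late inertia lets the swarm settle into a good region.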
In some sense, all of these variations of PSO are really just excuses to publish yet another research paper. The explorations are legitimate but they have almost no practical value: there are so many hyperparameters involved that any PSO variant can be shown to be better than any other variant by fiddling with the hyperparameter values.
But in the long run, optimization algorithms based on PSO and other similar meta-heuristics (spiral optimization, evolutionary algorithms, bacterial foraging optimization, fireworks optimization, etc., etc.) have great potential when vastly increased processing power becomes available — perhaps through quantum computing breakthroughs, or perhaps just a slow but steady increase in classical computing hardware.
As an example of combining computer science with art, I stumbled across a 2020 research paper, "Generative Art with Swarm Landscapes" by D. de Andrade, N. Fachada, C. Fernandes, and A. Rosa. The authors took the Mona Lisa, digitized it, and used the pixel values to define an error function. Then they applied particle swarm optimization and graphed the paths explored by the particles as they searched the image. Art like this is interesting and has a particular (semi-pun) form of beauty, but I much prefer human, non-computer-generated art.
Basic PSO demo code below.
# particle_swarm_demo.py
# python 3.7.6
# demo of particle swarm optimization (PSO)
# solves Rastrigin's function

import random
import math  # cos() for Rastrigin
import copy  # array-copying convenience
import sys   # max float

# ------------------------------------

def show_vector(vector):
  for i in range(len(vector)):
    if i % 8 == 0:  # 8 columns
      print("\n", end="")
    if vector[i] >= 0.0:
      print(' ', end="")
    print("%.4f" % vector[i], end="")  # 4 decimals
    print(" ", end="")
  print("\n")

def error(position):
  err = 0.0
  for i in range(len(position)):
    xi = position[i]
    # one of several minor Rastrigin variants
    err += (xi * xi) - (10 * math.cos(2 * math.pi * xi)) + 10
  return err

# ------------------------------------

class Particle:
  def __init__(self, dim, minx, maxx, seed):
    self.rnd = random.Random(seed)
    self.position = [0.0 for i in range(dim)]
    self.velocity = [0.0 for i in range(dim)]
    self.best_part_pos = [0.0 for i in range(dim)]

    for i in range(dim):
      self.position[i] = ((maxx - minx) * self.rnd.random() + minx)
      self.velocity[i] = ((maxx - minx) * self.rnd.random() + minx)

    self.error = error(self.position)  # curr error
    self.best_part_pos = copy.copy(self.position)
    self.best_part_err = self.error  # best error

def Solve(max_epochs, n, dim, minx, maxx):
  rnd = random.Random(0)

  # create n random particles
  swarm = [Particle(dim, minx, maxx, i) for i in range(n)]

  best_swarm_pos = [0.0 for i in range(dim)]  # not necess.
  best_swarm_err = sys.float_info.max  # swarm best
  for i in range(n):  # check each particle
    if swarm[i].error < best_swarm_err:
      best_swarm_err = swarm[i].error
      best_swarm_pos = copy.copy(swarm[i].position)

  epoch = 0
  w = 0.729     # inertia
  c1 = 1.49445  # cognitive (particle)
  c2 = 1.49445  # social (swarm)

  while epoch < max_epochs:
    if epoch % 10 == 0 and epoch > 1:
      print("Epoch = " + str(epoch) +
        " best error = %.3f" % best_swarm_err)

    for i in range(n):  # process each particle
      # compute new velocity of curr particle
      for k in range(dim):
        r1 = rnd.random()  # randomizations
        r2 = rnd.random()

        swarm[i].velocity[k] = (
          (w * swarm[i].velocity[k]) +
          (c1 * r1 * (swarm[i].best_part_pos[k] - swarm[i].position[k])) +
          (c2 * r2 * (best_swarm_pos[k] - swarm[i].position[k])) )

        if swarm[i].velocity[k] < minx:
          swarm[i].velocity[k] = minx
        elif swarm[i].velocity[k] > maxx:
          swarm[i].velocity[k] = maxx

      # compute new position using new velocity
      for k in range(dim):
        swarm[i].position[k] += swarm[i].velocity[k]

      # compute error of new position
      swarm[i].error = error(swarm[i].position)

      # is new position a new best for the particle?
      if swarm[i].error < swarm[i].best_part_err:
        swarm[i].best_part_err = swarm[i].error
        swarm[i].best_part_pos = copy.copy(swarm[i].position)

      # is new position a new best overall?
      if swarm[i].error < best_swarm_err:
        best_swarm_err = swarm[i].error
        best_swarm_pos = copy.copy(swarm[i].position)
    # for-each particle
    epoch += 1
  # while
  return best_swarm_pos
# end Solve

print("\nBegin particle swarm optimization using Python demo\n")

dim = 3
print("Goal is to solve Rastrigin's function in " + str(dim) + " variables")
print("Function has known min = 0.0 at (", end="")
for i in range(dim-1):
  print("0, ", end="")
print("0)")

num_particles = 50
max_iterations = 100
print("Setting num_particles = " + str(num_particles))
print("Setting max_iterations = " + str(max_iterations))

print("\nStarting PSO algorithm\n")
best_position = Solve(max_iterations, num_particles, dim, -10.0, 10.0)
print("\nPSO completed\n")

print("\nBest solution found:")
show_vector(best_position)
err = error(best_position)
print("Error of best solution = %.6f" % err)

print("\nEnd particle swarm demo\n")