## Logistic Regression Classification with Multi-Swarm Optimization

I wrote an article titled “Logistic Regression Classification with Multi-Swarm Optimization” in the January 2015 issue of MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn890377.aspx.

I started my article by explaining what logistic regression classification is:

One of the most fundamental forms of machine learning is logistic regression (LR) classification. The goal of LR classification is to create a model that predicts a variable that can take on one of two possible values. For example, you might want to predict which of two candidates a voter will select (“Smith” = 0, “Jones” = 1) based on the voter’s age (x1), sex (x2) and annual income (x3).

If Y is the predicted value, an LR model for this problem would take the form:

z = b0 + b1(x1) + b2(x2) + b3(x3)
Y = 1.0 / (1.0 + e^-z)

Here, b0, b1, b2 and b3 are weights, which are just numeric values that must be determined. In words, you compute a value z that is the sum of input values times b-weights, plus a b0 constant, then feed the z value to the equation that uses math constant e. It turns out that Y will always be between 0 and 1. If Y is less than 0.5, you conclude the output is 0 and if Y is greater than 0.5 you conclude the output is 1.

For example, suppose a voter’s age is 32, sex is male (-1), and annual income in tens of thousands of dollars is 48.0. And suppose b0 = -9.0, b1 = 8.0, b2 = 5.0, and b3 = -5.0. Then z = -9.0 + (8.0)(32) + (5.0)(-1) + (-5.0)(48.0) = 2.0 and so Y = 1.0 / (1.0 + e^-2.0) = 0.88.
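The arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch rather than the article's implementation; the function name `lr_output` and the variable names are made up for illustration.

```python
import math

def lr_output(weights, x):
    """Compute the logistic regression output Y for one input vector.

    weights[0] is the b0 constant; weights[1:] pair up with the inputs in x.
    (Hypothetical helper, named here only for illustration.)
    """
    z = weights[0] + sum(b * xi for b, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

weights = [-9.0, 8.0, 5.0, -5.0]   # b0, b1, b2, b3 from the example
voter = [32, -1, 48.0]             # age, sex (male = -1), income

y = lr_output(weights, voter)
predicted = 1 if y > 0.5 else 0    # 1 = "Jones", 0 = "Smith"
print(f"Y = {y:.2f}, predicted class = {predicted}")  # Y = 0.88, predicted class = 1
```

Note that `math.exp` raises `OverflowError` for very large arguments, so a more robust version would guard against extremely negative z values.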

Because Y is greater than 0.5, you’d conclude the voter will pick candidate 1 (“Jones”). But where do the b-weight values come from? Training an LR classifier is the process of finding the values for the b-weights. The idea is to use training data that has known output values and then find the set of b values so that the difference between the computed output values and the known output values is minimized. This is a math problem that’s often called numerical optimization.
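To make "the difference between the computed output values and the known output values" concrete, here is a sketch of one possible error measure, mean squared error. The choice of mean squared error, the function names, and the tiny training set are all assumptions for illustration; the data is invented and is not from the article.

```python
import math

def lr_output(weights, x):
    """Logistic regression output for one input vector (illustrative helper)."""
    z = weights[0] + sum(b * xi for b, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def mean_squared_error(weights, train_data):
    """Average squared difference between computed and known outputs.

    Each row of train_data is (inputs, known_output), where known_output
    is 0 or 1. Smaller values mean the weights fit the data better.
    """
    total = 0.0
    for x, target in train_data:
        diff = lr_output(weights, x) - target
        total += diff * diff
    return total / len(train_data)

# Invented, pre-scaled training rows: ([age, sex, income], known output).
train_data = [
    ([0.5, -1, 0.2], 0),
    ([1.5, 1, -0.3], 1),
    ([-0.8, 1, 0.9], 0),
    ([1.1, -1, -1.2], 1),
]

# With all weights at zero every output is 0.5, so each squared error is 0.25.
print(mean_squared_error([0.0, 0.0, 0.0, 0.0], train_data))  # 0.25
```

Training then amounts to searching for the weights that make this function's return value as small as possible.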

Next, I explained what multi-swarm optimization is, and how it’s used in logistic regression classification:

There are about a dozen major optimization algorithms used in machine learning. For logistic regression classification training, two of the most common algorithms are called iterative Newton-Raphson and L-BFGS. In this article, I present a technique called multi-swarm optimization (MSO). MSO is a variation of particle swarm optimization (PSO). In MSO, a virtual particle has a position that corresponds to a set of b-weight values. A swarm is a collection of particles that move in a way inspired by group behavior such as the flocking of birds. MSO maintains several swarms that interact with each other, as opposed to PSO, which uses just one swarm.
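The idea of several interacting swarms can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the article's implementation: the function name `multi_swarm_minimize`, the constants `w`, `c1`, `c2`, `c3`, the search range, and the velocity update rule (each particle is pulled toward its own best position, its swarm's best position, and the best position found by any swarm) are all choices made for this sketch. A fuller implementation may add other mechanisms.

```python
import random

def multi_swarm_minimize(error_fn, dim, num_swarms=3, num_particles=5,
                         max_epochs=200, lo=-10.0, hi=10.0, seed=0):
    """Minimize error_fn over dim-dimensional positions using several swarms."""
    rng = random.Random(seed)
    # Assumed constants: inertia, then pull toward particle/swarm/global best.
    w, c1, c2, c3 = 0.729, 1.49445, 1.49445, 0.3645

    swarms, swarm_best = [], []          # swarm_best[s] = (error, position)
    global_best = (float("inf"), None)
    for _ in range(num_swarms):
        particles, best = [], (float("inf"), None)
        for _ in range(num_particles):
            pos = [rng.uniform(lo, hi) for _ in range(dim)]
            vel = [0.1 * rng.uniform(lo, hi) for _ in range(dim)]
            err = error_fn(pos)
            particles.append({"pos": pos, "vel": vel,
                              "best_pos": pos[:], "best_err": err})
            if err < best[0]:
                best = (err, pos[:])
        swarms.append(particles)
        swarm_best.append(best)
        if best[0] < global_best[0]:
            global_best = best

    for _ in range(max_epochs):
        for s, particles in enumerate(swarms):
            for p in particles:
                for d in range(dim):
                    r1, r2, r3 = rng.random(), rng.random(), rng.random()
                    p["vel"][d] = (w * p["vel"][d]
                                   + c1 * r1 * (p["best_pos"][d] - p["pos"][d])
                                   + c2 * r2 * (swarm_best[s][1][d] - p["pos"][d])
                                   + c3 * r3 * (global_best[1][d] - p["pos"][d]))
                    # Move, keeping the position inside the search range.
                    p["pos"][d] = min(hi, max(lo, p["pos"][d] + p["vel"][d]))
                err = error_fn(p["pos"])
                if err < p["best_err"]:
                    p["best_err"], p["best_pos"] = err, p["pos"][:]
                if err < swarm_best[s][0]:
                    swarm_best[s] = (err, p["pos"][:])
                if err < global_best[0]:
                    global_best = (err, p["pos"][:])
    return global_best[1], global_best[0]

# Usage: minimize a simple bowl-shaped function whose minimum is at the origin.
def sphere(x):
    return sum(v * v for v in x)

best_pos, best_err = multi_swarm_minimize(sphere, dim=2)
print(best_pos, best_err)
```

For logistic regression training, `error_fn` would instead take a candidate set of b-weights and return the error of the resulting model on the training data, so the best position found is the trained set of weights.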

In the remainder of the article, I presented a code implementation.