## Logistic Regression Using PyTorch with L-BFGS in Visual Studio Magazine

I wrote an article titled “Logistic Regression Using PyTorch with L-BFGS” in the June 2021 edition of the online Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/06/23/logistic-regression-pytorch.aspx.

Logistic regression is one of many machine learning techniques for binary classification — predicting one of two possible discrete values. In my article I give an end-to-end demo that predicts the sex (male = 0, female = 1) of a hospital patient based on age, county of residence, blood monocyte count and hospitalization history.

A logistic regression prediction equation looks somewhat like:

```
z = (w0 * age) + (w1 * county) +
    (w2 * monocyte) + (w3 * history) + b
p = 1 / (1 + exp(-z))
```

The p value will be between 0 and 1. A p value less than 0.5 indicates a prediction of class 0, and a p value greater than 0.5 indicates a prediction of class 1.
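A minimal plain-Python sketch of the prediction equation and the 0.5 threshold follows. The input values, weights, and bias are made up for illustration and are not taken from the article's demo, and the function names `predict_proba` and `predict_class` are hypothetical.

```python
import math

def predict_proba(x, weights, bias):
    """Return p = 1 / (1 + exp(-z)), where z = sum(w_i * x_i) + b."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, weights, bias):
    """Return 1 if p > 0.5, else 0 (a tie at exactly 0.5 goes to class 0 here)."""
    return 1 if predict_proba(x, weights, bias) > 0.5 else 0

# Hypothetical normalized inputs: age, county, monocyte, history
x = [0.36, 0.5, 0.41, 1.0]
weights = [1.2, -0.8, 2.5, 0.3]   # made-up w0..w3
bias = -1.0                       # made-up b

p = predict_proba(x, weights, bias)
label = predict_class(x, weights, bias)
print(f"p = {p:.4f}, predicted class = {label}")
```

How a tie at exactly p = 0.5 is handled is an arbitrary choice in this sketch.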

Left: A screenshot of a demo run from the article. Right: Data and a graph for a simple problem that I used to explain how logistic regression works.

Finding the values of the wi weights and the bias b is called training the model. The idea is to try different values of the weights and the bias to find the values that give the best results on training data that has known input values (age, etc.) and correct target values (sex = 0 or 1).
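To make "best results on training data" concrete, here is a small sketch that scores two candidate sets of weights against a tiny made-up training set. It assumes mean binary cross entropy as the error measure, which is one common choice and not necessarily the one used in the article. The data and the helper name `mean_bce` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_bce(data, weights, bias, eps=1e-12):
    """Mean binary cross entropy of the model over (inputs, target) pairs."""
    total = 0.0
    for x, y in data:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(data)

# Made-up training items: two inputs each, target is 0 or 1
train = [([0.2, 0.1], 0), ([0.3, 0.4], 0),
         ([0.7, 0.8], 1), ([0.9, 0.6], 1)]

loss_a = mean_bce(train, [0.0, 0.0], 0.0)    # every p is 0.5
loss_b = mean_bce(train, [4.0, 4.0], -4.0)   # a better candidate

print(f"candidate A loss = {loss_a:.4f}")
print(f"candidate B loss = {loss_b:.4f}")
```

A lower loss means the candidate weights and bias fit the known targets better. A training algorithm automates the search for candidates like these.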

There is no closed-form solution to find the best values of the weights and the bias, so there are at least a dozen major algorithms to estimate the weights and bias values. These optimization algorithms (in math terms you are minimizing error/loss) include stochastic gradient descent, iterated Newton-Raphson, Nelder-Mead (aka the amoeba method), particle swarm optimization, evolutionary optimization, and . . . the L-BFGS (“limited memory Broyden Fletcher Goldfarb Shanno”) algorithm.

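The L-BFGS algorithm uses the Calculus first derivative (gradient) and also estimates second derivative (Hessian) information from a short history of recent gradients, rather than computing and storing the full Hessian. In practice this requires all training data to be in memory, but it typically produces very fast training.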
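Here is a minimal sketch of what L-BFGS training can look like in PyTorch. It is not the code from the article. The synthetic data, the `Linear` plus `Sigmoid` model, the `BCELoss` error function, and the `strong_wolfe` line search setting are all assumptions made for illustration. The main difference from most other PyTorch optimizers is that `torch.optim.LBFGS` needs a closure function, because it may re-evaluate the loss several times per step.

```python
import torch

torch.manual_seed(0)

# Made-up data: 200 items, 4 predictors, noisy 0/1 targets
X = torch.rand(200, 4)
true_w = torch.tensor([[2.0], [-3.0], [1.5], [0.5]])
y = torch.bernoulli(torch.sigmoid(3.0 * (X @ true_w - 0.5)))

# z = w.x + b followed by p = 1 / (1 + exp(-z))
model = torch.nn.Sequential(torch.nn.Linear(4, 1), torch.nn.Sigmoid())
loss_fn = torch.nn.BCELoss()
opt = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                        line_search_fn="strong_wolfe")

def closure():
    # Called by the optimizer, possibly many times per step
    opt.zero_grad()
    loss = loss_fn(model(X), y)   # all data at once (full batch)
    loss.backward()
    return loss

initial_loss = loss_fn(model(X), y).item()
for _ in range(5):
    opt.step(closure)
final_loss = loss_fn(model(X), y).item()

print(f"loss before = {initial_loss:.4f}, after = {final_loss:.4f}")
```

Notice that the closure computes the loss over the entire `X` tensor, which is why the full-batch memory requirement shows up in practice.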

To summarize, there are many techniques to create a binary classification model that uses an equation with weights and biases. Each of these equation-based techniques can be trained using one of many optimization algorithms. My article explained one specific scenario: logistic regression with L-BFGS using the PyTorch code library.

Three advantages of using PyTorch logistic regression with L-BFGS optimization are:

1. The simplicity of logistic regression compared to techniques like support vector machines
2. The flexibility of PyTorch compared to rigid high level systems such as scikit-learn
3. The speed of L-BFGS compared to most forms of stochastic gradient descent

Three disadvantages of the technique presented in the article are:

1. The crudeness of logistic regression compared to much more powerful models such as deep neural binary classifiers
2. Longer development time compared to ready-to-use models like the scikit-learn LogisticRegression() class
3. The requirement that all training data fit into memory compared to techniques that allow batch processing

Binary classification is sort of the quintessential form of machine learning. Chess is quintessentially binary, with the black and white pieces, two players, and black and white squares on the chessboard. I grew up loving chess which probably has something to do with why I love computer science and machine learning. Here are four images from an Internet image search for “chess queen Halloween costume”. I don’t find these images particularly appealing, but I do think they’re interesting in a binary sort of way.
