Logistic Regression using Python

I wrote an article in the January 2018 issue of Visual Studio Magazine titled “Logistic Regression using Python. See https://visualstudiomagazine.com/articles/2018/01/04/logistic-regression.aspx.

The goal of a binary classification problem is to predict a class label, which can take one of two possible values, based on the values of two or more predictor variables (sometimes called features in machine language terminology). For example, you might want to predict the sex (male = 0, female = 1) of a person based on their age, annual income and height.

There are several quite different techniques you can use to tackle a binary classification problem. Logistic Regression is one of the simplest. The article shows how to implement logistic regression to solve a binary classification problem, using a program coded in Python, the current programming language of choice for machine learning.

There are many existing systems you can use to perform binary classification with logistic regression. However, when using an external library, your system may be difficult (for technical reasons) or impossible (for legal reasons) to integrate into an existing system. Coding logistic regression from scratch gives you full control over your code, and allows you to fully understand how your prediction system works.

The strength of logistic regression for binary classification is simplicity. The major disadvantage of logistic regression is that it only works well for data that is mostly linearly separable. That is, data where you can conceptually separate the two classes to predict with a straight line.

The major alternative to logistic regression for binary classification is to use a neural network. Neural networks can deal with complex data. Another alternative is to use a support vector machine (SVM). SVMs have fallen out of favor, at least among all my research and engineering colleagues. SVMs are more complex than neural networks and rarely perform as well as neural networks in practice.

This entry was posted in Machine Learning. Bookmark the permalink.