Argh! I have to post on this topic.
Strewn throughout the Internet is a graph that is supposed to explain what logistic regression is and how it works. I’ve seen this graph, and variations of it, for years and it has been blindly copied dozens of times. And it is so completely wrong.
Here are two common versions of the horrible graph I’m talking about:
The graphs are worse than meaningless. They’re actively misleading.
I created an example and two diagrams that correctly illustrate what logistic regression is. I set up 10 dummy items where the goal is to predict if a person is male (class 0) or female (class 1) based on just two predictor variables, x0 = Age and x1 = Income. I plotted the data on the top graph. This was possible only because there are just two predictor variables; with three or more I couldn't have made a 2D graph, even though logistic regression works for any number of predictor variables. There are two colors for the dots because logistic regression is a binary classifier technique.
The top graph is training data for a logistic regression problem. The bottom graph is logistic regression for the data.
Logistic regression is designed to handle data that is mostly linearly separable; the dummy data here is in fact completely linearly separable.
NOTE: When data is completely linearly separable, as here, there are two huge problems. First, there are an infinite number of solution weights and biases. Second, if you use some form of simple stochastic gradient descent, the weights and biases can grow towards plus or minus infinity. These two problems are very complex in theory. In practice there are easy ways to deal with data that is completely linearly separable, such as adding a small regularization penalty or stopping training early.
The bottom graph illustrates how logistic regression works. First you find a weight for each variable and a bias value. I used one of dozens of training techniques and got w0, the weight for age x0, equal to 13.5. I got w1, the weight for income x1, equal to -12.2. I got a bias value of 1.12.
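To make the training step concrete, here is a minimal stochastic gradient descent sketch. The data values are illustrative only, not the post's actual 10-item dummy dataset, and the small L2 penalty is one simple way (mentioned in the note above) to keep the weights from growing without bound on completely separable data:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.1, lam=0.01, epochs=1000, seed=0):
    rnd = random.Random(seed)
    w = [0.0, 0.0]   # weights for x0 (age) and x1 (income)
    b = 0.0          # bias
    for _ in range(epochs):
        rnd.shuffle(data)
        for (x0, x1, y) in data:
            p = sigmoid(w[0] * x0 + w[1] * x1 + b)
            err = p - y  # gradient of log loss with respect to z
            w[0] -= lr * (err * x0 + lam * w[0])  # L2 penalty shrinks weights
            w[1] -= lr * (err * x1 + lam * w[1])
            b    -= lr * err
    return w, b

# Illustrative normalized (age, income, class) items; class 0 = male, 1 = female.
data = [(0.9, 0.2, 1), (0.8, 0.3, 1), (0.7, 0.1, 1), (0.8, 0.2, 1), (0.9, 0.4, 1),
        (0.2, 0.8, 0), (0.3, 0.9, 0), (0.1, 0.7, 0), (0.2, 0.9, 0), (0.3, 0.8, 0)]
w, b = train(data)
acc = sum(1 for (x0, x1, y) in data
          if (sigmoid(w[0] * x0 + w[1] * x1 + b) >= 0.5) == (y == 1)) / len(data)
print(w, b, acc)
```

Any of the dozens of other training techniques (L-BFGS, iterated Newton-Raphson, and so on) would give different weight and bias values but the same prediction mechanism.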
For each data item, you compute a predicted class in two steps. First z = (w0 * x0) + (w1 * x1) + b. Then p = 1 / (1 + exp(-z)). If p is less than 0.5 the predicted class is 0 (male); otherwise the predicted class is 1 (female).
The equation for p is called the logistic sigmoid function. It is an "S"-shaped curve where z on the horizontal axis runs from minus infinity to plus infinity, and p on the vertical axis is always between 0 and 1. The logistic sigmoid function always looks exactly the same. The predicted p values for each data item will always lie exactly on the line of the graph of the function, as shown. Dots below 0.5 (the red dashed line) are class 0, dots above 0.5 are class 1.
So, the horrible graph you will see plastered everywhere on the Internet incorrectly superimposes the raw data items onto the graph of the logistic sigmoid function.
People who put the bad graph, or a version of it, on their blog sites clearly do not fully understand logistic regression.
Machine learning is not simple. Anyone with reasonably good math skill can learn ML but it requires a lot of study.
Shown below are three of the most famous graphs in history.
Top: Anscombe’s Quartet (1973) shows four datasets. All four datasets have identical linear regression coefficients, x and y means, x and y variance, and Pearson Correlation Coefficients. The point is that sometimes statistics by themselves aren’t enough to describe a dataset.
Center: Murray’s Bell Curve (1994) shows the IQ of two different groups. The point is that the difference in intelligence between groups is surprisingly large (about a full standard deviation) and there are many interpretations of what this intelligence gap means.
Bottom: Snow’s London Cholera map (1854) shows the sources of a cholera outbreak in London. It revealed that there were many deaths near a water pump on Broad Street, which suggested that cholera might be spread by contaminated water (as was later confirmed).