Graphing the Decision Boundary for a Logistic Regression Model

The basic form of Logistic Regression (LR) uses one or more numeric predictor variables to predict a binary value. For example, you might want to predict the sex of a person (male = 1, female = 0) based on age (X1) and income (X2).

The LR math model for two predictors is p = 1.0 / (1.0 + exp(-z)) where z = b0 + (b1)(X1) + (b2)(X2). For example, suppose age = 3.5 (normalized, so perhaps the actual age is 35 years) and income = 5.5 (again normalized, so perhaps the actual income is $55,000). If b0 = -0.10, b1 = -0.70, and b2 = 0.80, then z = (-0.10) + (-0.70)(3.5) + (0.80)(5.5) = 1.85 and p = 1.0 / (1.0 + exp(-1.85)) = 0.8641. Because p > 0.5, the prediction is 1 = male.
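Here's a minimal Python sketch of that computation. The function name and the hard-coded weights are just for illustration:

```python
import math

def predict(x1, x2, b0, b1, b2):
    # logistic regression: p = 1 / (1 + exp(-z)) where z = b0 + b1*x1 + b2*x2
    z = b0 + (b1 * x1) + (b2 * x2)
    return 1.0 / (1.0 + math.exp(-z))

p = predict(3.5, 5.5, -0.10, -0.70, 0.80)
print("p = %0.4f" % p)               # p = 0.8641
print("predicted class =", 1 if p > 0.5 else 0)  # 1 = male
```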

LR generalizes to any number of predictor variables. But for the special situation where there are just two predictor variables, you can graph both the data points and the so-called decision boundary line.

The decision boundary line has y-intercept -(b0 / b2) and slope -(b1 / b2). This comes from setting p = 0.5, which happens exactly when z = 0, and then solving b0 + (b1)(X1) + (b2)(X2) = 0 for X2 (which isn't entirely obvious). For the example data, the y-intercept = -(-0.10 / 0.80) = 0.125 and the slope = -(-0.70 / 0.80) = 0.875, which I've graphed below as the green dotted line. Because b2 is positive, any point above the line gives z > 0 and therefore p > 0.5, so it's predicted male = 1, and any point below the line is predicted female = 0.
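A short matplotlib sketch of the boundary line follows. The axis range and the single plotted data point are illustrative assumptions, not from a real dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

b0, b1, b2 = -0.10, -0.70, 0.80    # model weights from the example

# decision boundary: X2 = -(b0/b2) + -(b1/b2) * X1
y_int = -(b0 / b2)   # 0.125
slope = -(b1 / b2)   # 0.875

x1 = np.linspace(0.0, 8.0, 100)
x2 = y_int + slope * x1

plt.plot(x1, x2, "g:", label="decision boundary")          # green dotted line
plt.scatter([3.5], [5.5], c="b", label="example point (male = 1)")
plt.xlabel("age (X1, normalized)")
plt.ylabel("income (X2, normalized)")
plt.legend()
plt.show()
```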
