Logistic regression is a technique used to make a prediction when the thing to predict can take only one of two values. For example, suppose you want to predict a person’s political party (“red” or “blue”) based on their age and education level.

Here’s my micro-demo using the R language. First I read data stored in a text file:

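```r
# Read the comma-separated file; the first line holds the column names
mydf <- read.table("AgeEduParty.txt", sep=",", header=TRUE)
# List "red" first so it is the reference level (coded 0) and "blue" is coded 1
mydf$Party <- factor(mydf$Party, levels=c("red","blue"))
mydf
```

```
   Age Edu Party
1    1   4   red
2    5   8   red
3    3   7   red
4    2   5   red
5    6   7   red
6    3   2  blue
7    7   5  blue
8    4   5  blue
9    2   3  blue
10   4   7  blue
```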
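If you don't have AgeEduParty.txt handy, here is a minimal sketch that builds the same ten rows directly in code. It assumes the file contains exactly the values printed above, with columns named Age, Edu, and Party; the inline construction is my own stand-in for the file, not part of the original demo.

```r
# Hypothetical inline stand-in for AgeEduParty.txt (values copied from the printout above)
mydf <- data.frame(
  Age   = c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4),
  Edu   = c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7),
  Party = c(rep("red", 5), rep("blue", 5))
)
# Same factor setup as the file-based version: "red" is the first (reference) level
mydf$Party <- factor(mydf$Party, levels = c("red", "blue"))
str(mydf)
```

One small difference: read.table() would store Age and Edu as integers, while c(1, 5, ...) gives doubles. glm() treats both the same way.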

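Next I create the prediction model using the glm function. The summary output shown is trimmed to just the coefficient estimates:

```r
# family="binomial" gives logistic regression (logit link by default)
mymodel <- glm(Party ~ Age + Edu, data=mydf, family="binomial")
summary(mymodel)
```

```
Coefficients:
            Estimate
(Intercept)   3.5566
Age           0.9939
Edu          -1.3191
```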
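When the response is a factor, glm's binomial family treats the first level ("red") as 0 and every other level ("blue") as 1, so the model estimates the probability of "blue". A quick sketch to confirm this: refit with an explicit 0/1 column (IsBlue is a hypothetical name I made up for illustration) and check that the coefficients match.

```r
# Rebuild the demo data (values copied from the printed data frame)
mydf <- data.frame(
  Age   = c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4),
  Edu   = c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7),
  Party = c(rep("red", 5), rep("blue", 5))
)
mydf$Party <- factor(mydf$Party, levels = c("red", "blue"))

# Explicit 0/1 coding: 1 means "blue", 0 means "red"
mydf$IsBlue <- as.integer(mydf$Party == "blue")

m_factor  <- glm(Party  ~ Age + Edu, data = mydf, family = "binomial")
m_numeric <- glm(IsBlue ~ Age + Edu, data = mydf, family = "binomial")

# The two fits should give the same coefficients
print(rbind(factor = coef(m_factor), numeric = coef(m_numeric)))
```

If you had listed the levels as c("blue","red") instead, every coefficient would flip sign and the model would estimate the probability of "red".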

This tells me the prediction equation is p = 1 / (1 + e^-z), where z = 3.5566 + (0.9939)(Age) + (-1.3191)(Edu) and p is the estimated probability that the person is "blue" (the second factor level). I make a prediction for the first person, who has Age = 1 and Edu = 4:

```r
z <- 3.5566 + 0.9939*1 + (-1.3191*4)  # z = -0.7259
p <- 1 / (1 + exp(-z))
p
```

```
[1] 0.3260951
```
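Typing coefficients by hand is fine for a demo, but R can do the arithmetic for you. Here is a sketch using the standard predict() function with type = "response", which returns probabilities rather than log-odds. It checks that the built-in result agrees with the manual formula when both use the full-precision coefficients from coef(); the data frame newperson is a hypothetical name for the new input.

```r
# Rebuild the demo data and model (values copied from the printed data frame)
mydf <- data.frame(
  Age   = c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4),
  Edu   = c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7),
  Party = c(rep("red", 5), rep("blue", 5))
)
mydf$Party <- factor(mydf$Party, levels = c("red", "blue"))
mymodel <- glm(Party ~ Age + Edu, data = mydf, family = "binomial")

# Manual computation with full-precision coefficients
b <- coef(mymodel)
z <- b[["(Intercept)"]] + b[["Age"]] * 1 + b[["Edu"]] * 4
p_manual <- 1 / (1 + exp(-z))

# Built-in computation for the same person (Age = 1, Edu = 4)
newperson <- data.frame(Age = 1, Edu = 4)
p_builtin <- unname(predict(mymodel, newdata = newperson, type = "response"))

print(c(manual = p_manual, builtin = p_builtin))
```

The stats package also provides plogis(z), which computes the same logistic function 1 / (1 + exp(-z)).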

Here the probability 0.3261 is less than 0.5, so it's closer to 0, which corresponds to "red", and the prediction is "red". If the probability had been greater than 0.5, the prediction would have been "blue".
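The same 0.5 threshold rule can be applied to every person in the data to get a rough sense of how well the model fits. A minimal sketch, assuming the rounded coefficients reported by summary() above (the variable names are my own). Since 1 / (1 + e^-z) is greater than 0.5 exactly when z > 0, the rule could equally be written on z.

```r
# Coefficients as reported by summary(mymodel), rounded to 4 decimals
b0 <- 3.5566; b_age <- 0.9939; b_edu <- -1.3191

# The ten people from AgeEduParty.txt (values copied from the printout)
age   <- c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4)
edu   <- c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7)
party <- c(rep("red", 5), rep("blue", 5))

z <- b0 + b_age * age + b_edu * edu
p <- 1 / (1 + exp(-z))                        # estimated P(Party == "blue")
predicted <- ifelse(p > 0.5, "blue", "red")   # same as z > 0

print(data.frame(age, edu, party, p = round(p, 4), predicted))
accuracy <- mean(predicted == party)
accuracy
```

With these rounded coefficients, 8 of the 10 people are classified correctly (accuracy 0.8); persons 5 and 10 are the misses. Accuracy on the same data used to fit the model tends to be optimistic, so it is only a sanity check.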
