Logistic Regression using R – Briefly

Logistic regression is a technique used to make a prediction when the thing to predict can take only one of two values. For example, suppose you want to predict a person’s political party (“red” or “blue”) based on their age and education level.


Here’s my micro-demo using the R language. First I read data stored in a text file:

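# Read the comma-delimited file; the first line holds the column names
mydf <- read.table("AgeEduParty.txt",
  sep=",", header=TRUE)
# List "red" first so glm treats it as 0 and "blue" as 1
mydf$Party <- factor(mydf$Party,
  levels=c("red","blue"))
mydf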

   Age Edu Party
1    1   4   red
2    5   8   red
3    3   7   red
4    2   5   red
5    6   7   red
6    3   2  blue
7    7   5  blue
8    4   5  blue
9    2   3  blue
10   4   7  blue
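
If you don't have the text file handy, here is a minimal sketch that builds the same data frame inline. It assumes the file holds exactly the ten rows printed above; the values are copied from that output, and nothing else about the file is known.

```r
# Build the ten rows shown above directly, instead of reading AgeEduParty.txt.
# Assumption: the file contains only these rows.
mydf <- data.frame(
  Age   = c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4),
  Edu   = c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7),
  Party = rep(c("red", "blue"), each = 5)
)

# Same factor coding as the file-based version: "red" first, "blue" second.
mydf$Party <- factor(mydf$Party, levels = c("red", "blue"))

print(mydf)
```

Either way of creating mydf should work with the glm call that follows.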

Next I create the prediction model using the glm function:

mymodel <- glm(Party ~ Age + Edu, data=mydf,
  family="binomial")
summary(mymodel)

Coefficients:
            Estimate
(Intercept)   3.5566
Age           0.9939
Edu          -1.3191

This tells me the prediction equation is p = 1 / (1 + e^-z), where z = 3.5566 + (0.9939)(Age) + (-1.3191)(Edu). I make a prediction for the first person, who has Age = 1 and Edu = 4:

z <- 3.5566 + 0.9939*1 + (-1.3191*4)
p <- 1 / (1 + exp(-z))
p
[1] 0.3260951

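The model codes the first factor level, “red”, as 0 and the second, “blue”, as 1, so p is the probability that the party is “blue”. Here the result of 0.3261 is less than 0.5, so the prediction is “red”. If the result had been greater than 0.5, the prediction would have been “blue”.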
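
The hand calculation can also be left to R's predict function. The following is a rough sketch, not part of the original demo: it rebuilds the ten rows inline (assuming the file contains only those rows) so it runs on its own, and the names probs, predicted, results and accuracy are just illustrative.

```r
# Rebuild the ten rows shown earlier so this sketch runs without the file.
mydf <- data.frame(
  Age   = c(1, 5, 3, 2, 6, 3, 7, 4, 2, 4),
  Edu   = c(4, 8, 7, 5, 7, 2, 5, 5, 3, 7),
  Party = factor(rep(c("red", "blue"), each = 5),
                 levels = c("red", "blue"))
)
mymodel <- glm(Party ~ Age + Edu, data = mydf, family = "binomial")

# type = "response" returns p, the probability of the second level ("blue"),
# for every row used to fit the model.
probs <- predict(mymodel, type = "response")

# Apply the same 0.5 cutoff used in the hand calculation.
predicted <- ifelse(probs > 0.5, "blue", "red")

# Show actual and predicted party side by side.
results <- data.frame(mydf, Prob = round(probs, 4), Predicted = predicted)
print(results)

# Fraction of the ten training rows the model classifies correctly.
accuracy <- mean(predicted == as.character(mydf$Party))
print(accuracy)
```

The first entry of probs should agree with the 0.3261 computed by hand. Keep in mind that accuracy measured on the same rows used to fit the model tends to be optimistic.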
