## Logistic Regression using R – Briefly

Logistic regression is a technique used to make a prediction when the thing to predict can take only one of two values. For example, suppose you want to predict a person’s political party (“red” or “blue”) based on their age and education level.

Here’s my micro-demo using the R language. First I read data stored in a text file:

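```
mydf = read.table("AgeEduParty.txt",
  header=TRUE)  # assumed argument: first line of the file holds the column names
mydf$Party <- factor(mydf$Party,
  levels=c("red","blue"))  # first level "red" is coded 0, "blue" is coded 1
mydf

   Age Edu Party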
1    1   4   red
2    5   8   red
3    3   7   red
4    2   5   red
5    6   7   red
6    3   2  blue
7    7   5  blue
8    4   5  blue
9    2   3  blue
10   4   7  blue
```
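The contents of AgeEduParty.txt aren't shown here, so to follow along without the file, here's a minimal sketch that rebuilds the same ten rows inline. It assumes a whitespace-separated layout with a header row, which is a guess based on the printed data frame rather than the actual file:

```r
# Assumed layout: header row, then one whitespace-separated record per line.
txt <- "Age Edu Party
1 4 red
5 8 red
3 7 red
2 5 red
6 7 red
3 2 blue
7 5 blue
4 5 blue
2 3 blue
4 7 blue"

# read.table() accepts the data as a string through its text argument.
mydf <- read.table(text = txt, header = TRUE)

# Fix the level order so that "red" is the first level and "blue" the second.
mydf$Party <- factor(mydf$Party, levels = c("red", "blue"))
str(mydf)
```

Setting the level order explicitly matters because glm treats the first level as the 0 outcome and the second as the 1 outcome.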

Next I create the prediction model using the glm function:

```
mymodel = glm(Party ~ Age + Edu, data=mydf,
  family="binomial")
summary(mymodel)

Coefficients:

(Intercept)   3.5566
Age           0.9939
Edu          -1.3191
```

This tells me the prediction equation is p = 1 / (1 + e^-z), where z = 3.5566 + (0.9939)(Age) + (-1.3191)(Edu) and p is the probability of the second factor level, “blue”. I make a prediction for the first person, who has Age = 1 and Edu = 4:

```
z <- 3.5566 + 0.9939*1 + (-1.3191*4)
p <- 1 / (1 + exp(-z))
p
[1] 0.3260951
```

Here the probability result of 0.3261 is less than 0.5, so it’s closer to 0, which corresponds to “red” (the first factor level). If the probability result had been greater than 0.5, the prediction would have been “blue”.
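The compute-then-threshold step can be wrapped in a small function. This is a sketch only: predict_party and its cutoff argument are names made up for illustration, and the coefficients are copied by hand from the summary output rather than read from the fitted model:

```r
# Coefficients copied from the summary output above (rounded to 4 places).
b0    <- 3.5566
b_age <- 0.9939
b_edu <- -1.3191

# Hypothetical helper: probability of "blue" plus the predicted label.
predict_party <- function(age, edu, cutoff = 0.5) {
  z <- b0 + b_age * age + b_edu * edu
  p <- 1 / (1 + exp(-z))  # logistic function
  data.frame(Age = age, Edu = edu, p_blue = round(p, 4),
             Predicted = ifelse(p > cutoff, "blue", "red"))
}

# First "red" person and first "blue" person from the data.
res <- predict_party(age = c(1, 3), edu = c(4, 2))
print(res)
```

For the first person (Age = 1, Edu = 4) this reproduces the 0.3261 probability and a “red” prediction. The function is vectorized, so whole columns of ages and education levels can be passed in at once.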