I was chatting with a colleague recently and the topic of ROC (receiver operating characteristic) came up. ROC is one of those math topics that is related to many, many other math topics, and so it’s difficult to explain ROC succinctly. Here’s how I think about ROC, as briefly as possible.
In a nutshell, ROC is a graph that shows characteristics of a binary classifier. A binary classifier is some math model that predicts something that can take one of two possible values. For example, a binary classifier might predict the sex (0 = male, 1 = female) of a person based on their age, income, and weight.
Here’s a set of three ROC curves on the same graph from the Wikipedia article on ROC.
In a binary classifier, there are four possible outcomes:
1. you predict positive (1) and the result is positive
(“true positive” = TP)
2. you predict positive (1) but the result is negative
(“false positive” = FP)
3. you predict negative (0) and the result is negative
(“true negative” = TN)
4. you predict negative (0) but the result is positive
(“false negative” = FN)
Now for any prediction model there are different parameter(s) you can adjust. Suppose there’s just one parameter you can adjust (giving a specific example would take a few paragraphs, which is what I’m trying to avoid). Call the parameter “theta” (or anything) and suppose it’s value is 3.14. And now you make 100 predictions using theta and get:
1. count true positive = 40
2. count false positive = 30
3. count true negative = 20
4. count false negative = 10
The True Positive Rate (TPR) is:
TP / (TP + FN) = 40 / (40 + 10) = 40 / 50 = 0.80
The False Positive Rate (FPR) is:
FP / (FP + TN) = 30 / (30 + 20) = 30 / 50 = 0.60
So, for theta = 3.14, TPR = 0.80 and FPR = 0.60.
And now, suppose you repeat your experiment using several different values of theta. Each value of theta will give you a different pair of (TPR, FPR) values. And now, if you graph all these sets with the FPR on the x-axis and the TRP rate on the y-axis, and connect the dots, you get a ROC. Whew! Each part of the process is simple, but there are a lot of parts.
And, to extend the idea a bit further, suppose you have more than one prediction model. Each model’s adjustable parameter can be varied giving a different ROC curve.
The image on the Wikipedia article on ROC shows three prediction models. Each point on each of the three curves (the curves don’t show you the points) corresponds to a model parameter value. ROC curves that are “higher” are better.