Precision, Recall, Type I Error, Type II Error, True Positive and False Positive, and ROC Curves

The concepts of precision and recall, type I and type II errors, and true positives and false positives are very closely related. Precision and recall are terms often used in data categorization, where each data item is placed into one of several categories. Take, for example, the artificial scenario of watching 100 people walk by and categorizing each person as a U.S. citizen or a non-U.S. citizen (i.e., two categories). Precision is the number of people correctly categorized as U.S. citizens divided by the total number of people categorized as U.S. citizens. Suppose that, unknown to you, 74 of the 100 people are in fact U.S. citizens and 26 are not. Now suppose you categorize 70 of the 100 people as U.S. citizens, and 60 of those people actually are U.S. citizens but 10 are not. Additionally, you categorize 30 people as non-U.S. citizens, and 16 of those people are in fact non-U.S. citizens but 14 of them are U.S. citizens. Your precision is 60 / (60 + 10) = 0.857. Recall is the number of people correctly categorized as U.S. citizens divided by the total number of people who are in fact U.S. citizens = 60 / (60 + 14) = 0.811. By the way, the term precision here is not at all the same as the ordinary math meaning of precision when used in association with the term accuracy.

I find precision and recall terminology a bit unintuitive and I generally prefer to think of problems in terms of true positives and false positives. A true positive is a positive example (“is a U.S. citizen”) correctly identified as a positive. A false positive is a negative example (“is not a U.S. citizen”) incorrectly identified as a positive. Also, a true negative is a negative example (“is not a U.S. citizen”) correctly identified as a negative. And a false negative is a positive example incorrectly identified as a negative. Notice that the “true” and “false” here can be interpreted as “correct” and “incorrect” respectively and the “positive” and “negative” can be interpreted as “labeled as positive” and “labeled as negative” respectively.

The terms are easy to confuse. In the example, the count of true positives (people correctly labeled as U.S. citizens) is 60. The count of true negatives (correctly labeled as non-citizens) is 16. The count of false positives (non-citizens incorrectly labeled as U.S. citizens) is 10. The count of false negatives (U.S. citizens incorrectly labeled as non-citizens) is 14. Note that TP + TN + FP + FN adds up to 100. Using these counts:

precision = TP / (TP + FP) = 60 / (60 + 10) = 60 / 70 = 0.857

recall = TP / (TP + FN) = 60 / (60 + 14) = 60 / 74 = 0.811
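
To make the arithmetic concrete, here is a minimal Python sketch that plugs the counts from the example above into the two formulas (the variable names are just illustrative):

```python
# Counts from the U.S. citizen example above
tp = 60  # true positives: correctly labeled as U.S. citizens
tn = 16  # true negatives: correctly labeled as non-citizens
fp = 10  # false positives: non-citizens labeled as U.S. citizens
fn = 14  # false negatives: U.S. citizens labeled as non-citizens

precision = tp / (tp + fp)  # 60 / 70 = 0.857
recall = tp / (tp + fn)     # 60 / 74 = 0.811

print("precision = %0.3f" % precision)  # 0.857
print("recall    = %0.3f" % recall)     # 0.811
```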

In statistics there are type I errors and type II errors. Relative to true positive and false positive terminology, a type I error occurs when you reject the null hypothesis when it is actually true, which by convention corresponds to a false positive. A type II error occurs when you fail to reject (accept) the null hypothesis when it is actually false, which by convention corresponds to a false negative.
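
In code the mapping is just a relabeling of the same counts; continuing the sketch above:

```python
# Type I error  = rejecting a true null hypothesis  = false positive
# Type II error = accepting a false null hypothesis = false negative
type_I_errors = fp   # 10 in the example
type_II_errors = fn  # 14 in the example
```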

Adding to the confusion is the fact that precision and recall can also be defined somewhat differently in a slightly different context of information retrieval (such as retrieving Web pages based on a search query).

ROC, which stands for Receiver (or Relative) Operating Characteristic, is a plot, for a given predictor on a set of data, of the percentage (or probability) of true positives (on the y-axis) versus the percentage of false positives (on the x-axis). A single predictor at a fixed decision threshold gives one point on the plot; by varying the threshold (or by plotting several different predictors) you trace out an ROC curve. It turns out that points above the lower-left to upper-right diagonal indicate good predictors, with the upper-left corner of the plot marking perfect categorization.
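
Here is a minimal sketch, using made-up scores and labels (not the citizen example), of how the points of an ROC curve can be computed by sweeping a decision threshold:

```python
# Hypothetical predictor scores (higher = more likely positive) and true labels
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1,    1,    0,    1,    0,    0,    1,    0   ]  # 1 = positive

pos = sum(labels)          # number of actual positives
neg = len(labels) - pos    # number of actual negatives

# For each threshold, count TP and FP among items scored at or above it
for thresh in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 0)
    tpr = tp / pos   # true positive rate  (y-axis)
    fpr = fp / neg   # false positive rate (x-axis)
    print("threshold %0.2f : FPR = %0.2f, TPR = %0.2f" % (thresh, fpr, tpr))
```

Connecting the (FPR, TPR) points, plus the corner points (0, 0) and (1, 1), gives the ROC curve; a curve that bows toward the upper-left corner indicates a better predictor than the diagonal.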
