Precision and Recall in Information Retrieval

I was chatting with a colleague recently who works on Bing Search, and he mentioned precision and recall.

This reminded me of my days working on Bing Search v1.0 several years ago. In recent years I've been working with machine learning rather than information retrieval, and the terms precision and recall have different meanings in information retrieval and machine learning.

For IR, precision and recall are best explained by an example. Suppose you have 100 documents and a search query of "ford". Of the 100 documents, suppose 20 are relevant to the term "ford" and the other 80 are not relevant.

Now suppose your search algorithm returns 25 result documents, where 15 of those docs are in fact relevant (meaning you incorrectly missed 5 of the relevant docs) and the other 10 result docs are not relevant (meaning you correctly omitted 70 of the 80 irrelevant docs).


The precision is the fraction/percentage of retrieved docs that are relevant. The recall is the fraction/percentage of relevant docs that were retrieved.

Precision and recall are calculated as:

precision = 15 retrieved relevant / 25 total retrieved
          = 0.60

recall = 15 retrieved relevant / 20 total relevant
       = 0.75
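
Here is a minimal Python sketch of the same two calculations. The document ID sets are made up purely to reproduce the numbers in the example above; they are not from any real search system.

relevant = set(range(20))        # the 20 docs relevant to "ford" (assumed IDs 0-19)
retrieved = set(range(5, 30))    # 25 retrieved docs: 15 relevant, 10 not (assumed IDs 5-29)

retrieved_relevant = relevant & retrieved  # docs that are both retrieved and relevant

precision = len(retrieved_relevant) / len(retrieved)  # 15 / 25 = 0.60
recall = len(retrieved_relevant) / len(relevant)      # 15 / 20 = 0.75

print(f"precision = {precision:.2f}")  # 0.60
print(f"recall    = {recall:.2f}")     # 0.75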

In short, both precision and recall in information retrieval are measures of goodness that are tied to the notion of relevance. Precision and recall in machine learning binary classification are very different (although the underlying principles are similar).
