Machine Learning Scoring Rules

A scoring rule is a function that measures the accuracy of a set of predicted probabilities. For example, suppose you have a very oddly shaped die with four sides. You somehow predict the probabilities of the four sides:

(0.20, 0.10, 0.40, 0.30)

Then you actually roll the die several thousand times and find that the observed frequencies are:

(0.25, 0.15, 0.50, 0.10)

How good was your prediction? Instead of using a scoring rule, you could use an ordinary error measure. For example, you could use mean squared error:

MSE = (0.05^2 + 0.05^2 + 0.10^2 + 0.20^2) / 4
    = (0.0025 + 0.0025 + 0.0100 + 0.0400) / 4
    = 0.0550 / 4
    = 0.01375

Squared error values are never negative, and smaller values indicate a better prediction. A perfect prediction gives an error of 0.
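
The mean squared error calculation above can be sketched in a few lines of Python. This is just a minimal illustration, not a library routine; the function name mean_squared_error and the variable names are made up for this example.

```python
def mean_squared_error(predicted, actual):
    # Hypothetical helper for illustration: average of the squared
    # differences between predicted and observed probabilities.
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return total / len(predicted)

# Values from the four-sided die example.
predicted = [0.20, 0.10, 0.40, 0.30]
actual = [0.25, 0.15, 0.50, 0.10]

mse = mean_squared_error(predicted, actual)
print(f"MSE = {mse:.5f}")  # MSE = 0.01375
```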

But a more common approach is to use a scoring rule, which is sort of an indirect measure of error. Scoring rules are most often used in situations where exactly one of several possible outcomes occurs. For example, suppose you have a system that predicts the probabilities of who will win a political election between Adams, Baker, and Cogan:

p = (0.20, 0.50, 0.30)

Suppose the election is held and Cogan wins. The actual outcome can be written as a set of probabilities where the winner gets 1 and everyone else gets 0:

x = (0.00, 0.00, 1.00)

The Logarithmic Scoring Rule is calculated like so:

LSR = (0.0)(ln(0.20)) + (0.0)(ln(0.50)) + (1.0)(ln(0.30))
    = ln(0.30)
    = -1.20

Notice that the calculation can be simplified to just “take the ln of the probability associated with the actual outcome.”

Suppose your prediction that Cogan would win was better:

(0.10, 0.20, 0.70)

Now the LSR = ln(0.70) = -0.36, which is greater (less negative) than the score for the first prediction. In short, LSR values are never positive (a perfect prediction scores ln(1.0) = 0), and larger (less negative) values indicate a better prediction.
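
Here is a small Python sketch of the logarithmic scoring rule using the simplified form, where you just take the ln of the probability assigned to the outcome that actually happened. The function name log_score and the choice to return negative infinity when the predicted probability is 0 are assumptions made for this example.

```python
import math

def log_score(predicted, outcome_index):
    # Hypothetical helper for illustration: natural log of the
    # probability that was assigned to the outcome that occurred.
    p = predicted[outcome_index]
    if p <= 0.0:
        # ln(0) is undefined; treating it as -infinity is an assumption here.
        return float("-inf")
    return math.log(p)

# Candidates are (Adams, Baker, Cogan); Cogan (index 2) wins.
first = log_score([0.20, 0.50, 0.30], 2)
second = log_score([0.10, 0.20, 0.70], 2)

print(f"first prediction:  {first:.2f}")   # -1.20
print(f"second prediction: {second:.2f}")  # -0.36
```

The second score is larger (less negative), so the second prediction is the better one.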
