## What is Machine Learning Perplexity?

In machine learning, the term perplexity has three closely related meanings. Perplexity is a measure of how easy a probability distribution is to predict. Perplexity is a measure of how variable a prediction model is. And perplexity is a measure of prediction error. The third meaning is calculated slightly differently, but all three share the same fundamental idea.

Suppose you have a four-sided die (a tetrahedron). The die is fair, so all sides are equally likely: (0.25, 0.25, 0.25, 0.25). Perplexity is defined as 2 raised to the Shannon entropy of the distribution:

perplexity(p) = 2^H(p), where H(p) = -Σi pi * log2(pi)

and so its value here is 2^2.00 = 4.00. Now suppose you have a different die whose sides have probabilities (0.10, 0.40, 0.20, 0.30). This die has perplexity 3.5961, which is lower than 4.00 because it's easier to predict (namely, predict the side that has p = 0.40).
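The definition, 2 raised to the entropy of the distribution, can be verified with a few lines of Python (the distributions are the ones from the text):

```python
import math

def perplexity(probs):
    """Perplexity = 2 raised to the Shannon entropy (base-2 logs)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # fair four-sided die -> 4.0
print(perplexity([0.10, 0.40, 0.20, 0.30]))  # skewed die -> about 3.5961
print(perplexity([0.20, 0.50, 0.30]))        # three outcomes -> about 2.8001
```

The same function handles a distribution over any number of outcomes, which is why it also applies to the three-outcome prediction model below.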

Now suppose you have some neural network that predicts which of three outcomes will occur. The prediction probabilities are (0.20, 0.50, 0.30). Using the equation above, the perplexity is 2.8001. Models with lower perplexity have probability values that are more peaked (less uniform), and so the model is making "stronger predictions" in a sense.

Now suppose you are training a model and you want a measure of error. You have three data items:

```
prediction          targets
------------------------------------------------
0.10  0.20  0.70    0  0  1    (correct)
0.10  0.70  0.20    0  1  0    (correct)
0.30  0.40  0.30    1  0  0    (wrong)
```

The average cross entropy error is 0.2775. Using the ideas of perplexity, the average perplexity is 2.2675. In both cases, higher values mean more error.
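A sketch of the cross entropy calculation for these three items. Two assumptions here: base-10 logarithms (which match the 0.2775 figure; natural logs, more common in ML libraries, give about 0.639), and per-item perplexity taken as 10 raised to that item's cross entropy, i.e. 1 / p(correct class). The averaging convention for perplexity varies, so the averaged perplexity below is that convention's result, not necessarily the one behind the figure in the text:

```python
import math

preds = [[0.10, 0.20, 0.70],
         [0.10, 0.70, 0.20],
         [0.30, 0.40, 0.30]]
targets = [[0, 0, 1],
           [0, 1, 0],
           [1, 0, 0]]

# With one-hot targets, cross entropy per item reduces to -log p(correct class).
# Base-10 logs are an assumption here; natural logs are also common.
ce_items = [-sum(t * math.log10(p) for p, t in zip(pred, targ))
            for pred, targ in zip(preds, targets)]
avg_ce = sum(ce_items) / len(ce_items)

# Assumed convention: per-item perplexity = 10 ** cross_entropy = 1 / p(correct).
ppl_items = [10.0 ** ce for ce in ce_items]
avg_ppl = sum(ppl_items) / len(ppl_items)

print(round(avg_ce, 4))   # about 0.2776
print(round(avg_ppl, 4))  # about 2.0635
```

Either way, the direction of the signal is the same: as the model assigns more probability to the correct class, both cross entropy and perplexity fall.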