Neural Network Cross Entropy Error using Python

I wrote an article in the July 2017 issue of Visual Studio Magazine titled “Neural Network Cross Entropy Error using Python”.

For beginners to neural networks, cross entropy error (also called “log loss”) can be very confusing. Cross entropy error is actually quite simple. But as with many topics in mathematics, there are many ways to look at CE error, and so there are many different explanations. At first these explanations seem very different, but they are in fact mathematically equivalent. It just takes a lot of time to understand how they relate to each other.

In the early days of neural networks, the nearly universal technique for comparing computed output values to desired target values (from training data) was mean squared error (MS error). For example, suppose that for a given set of input values and the current NN weight values, the computed output values are (0.20, 0.70, 0.10). If the target values are (0, 1, 0), then the squared error is (0.20 – 0)^2 + (0.70 – 1)^2 + (0.10 – 0)^2 = 0.04 + 0.09 + 0.01 = 0.14. If you computed the squared error for every training item and then took the average, you’d have the mean squared error.
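Here is a minimal Python sketch of the calculation above. The function names are my own (hypothetical). Following the description in the text, each item's squared error is summed over the output nodes and then averaged over the training items. Some libraries also divide by the number of output nodes, so treat the exact normalization as an assumption.

```python
def squared_error(computed, target):
    """Sum of squared differences between computed outputs and targets for one item."""
    return sum((c - t) ** 2 for c, t in zip(computed, target))

def mean_squared_error(all_computed, all_targets):
    """Average of the per-item squared errors over all training items."""
    total = sum(squared_error(c, t) for c, t in zip(all_computed, all_targets))
    return total / len(all_computed)

computed = [0.20, 0.70, 0.10]
target = [0, 1, 0]
print(round(squared_error(computed, target), 4))  # 0.14
```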

Cross entropy error for the same data would be -[ln(0.20)*0 + ln(0.70)*1 + ln(0.10)*0] = -(0 + (-0.36) + 0) = 0.36. Notice that for neural network classification, the target values have exactly one 1-value and all the rest are 0-values. So every term except one drops out of the calculation, and CE error reduces to the negative log of the computed output at the position of the 1.
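The same calculation can be written as a short Python sketch. It assumes one-hot target vectors and computed outputs that are strictly positive wherever the target is 1, as softmax outputs are. The function names are hypothetical, and averaging over items simply mirrors the MS error version rather than coming from the article.

```python
import math

def cross_entropy_error(computed, target):
    """Cross entropy error for one item: -sum(t * ln(c)).

    Terms whose target is 0 contribute nothing, so they are skipped.
    math.log raises ValueError if a needed output is exactly 0.
    """
    return -sum(t * math.log(c) for c, t in zip(computed, target) if t != 0)

def mean_cross_entropy_error(all_computed, all_targets):
    """Average of the per-item cross entropy errors over all training items."""
    total = sum(cross_entropy_error(c, t) for c, t in zip(all_computed, all_targets))
    return total / len(all_computed)

computed = [0.20, 0.70, 0.10]
target = [0, 1, 0]
print(round(cross_entropy_error(computed, target), 4))  # 0.3567
```

Only the target-1 term survives, so for a one-hot target the per-item error is the same as `-math.log(computed[target.index(1)])`.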

In my article I explain how to use cross entropy error with neural network back-propagation training. As it turns out, using cross entropy error usually leads to better results than mean squared error (the explanation of why is too long for this blog post), and so CE error is now the default error measurement for neural network classification.
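The article's own code is not reproduced here. Instead, here is a minimal sketch of why CE error fits back-propagation so neatly. It assumes a softmax output layer, which is the usual pairing with CE error for classification. Under that assumption, the gradient of CE error with respect to each output node's pre-activation sum `z` reduces to (computed output - target). The helper names and the `z` values are hypothetical. The finite-difference check is included only to confirm the formula numerically.

```python
import math

def softmax(z):
    """Convert raw output-node sums to probabilities (shifted by max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy_error(computed, target):
    """Per-item CE error, -sum(t * ln(c)), skipping zero-target terms."""
    return -sum(t * math.log(c) for c, t in zip(computed, target) if t != 0)

def output_gradient_ce(z, target):
    """dE/dz for softmax outputs with cross entropy error: simply (y - t)."""
    y = softmax(z)
    return [yi - ti for yi, ti in zip(y, target)]

def numeric_gradient(z, target, h=1e-6):
    """Central-difference estimate of dE/dz, used only to check the formula."""
    grads = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += h
        z_minus = list(z)
        z_minus[i] -= h
        e_plus = cross_entropy_error(softmax(z_plus), target)
        e_minus = cross_entropy_error(softmax(z_minus), target)
        grads.append((e_plus - e_minus) / (2 * h))
    return grads

z = [1.0, 2.5, 0.4]   # hypothetical output-node pre-activation sums
target = [0, 1, 0]
print(output_gradient_ce(z, target))
print(numeric_gradient(z, target))
```

In back-propagation, this (y - t) value acts as the output-node signal. The gradient for a hidden-to-output weight is that signal times the hidden node's output value. With mean squared error, the output signal instead picks up extra softmax-derivative terms, so it does not reduce to this simple difference.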

The moral is that, if you’re a beginner to NNs, the amount of detail can seem overwhelming at first. But there are only a finite number of things that are essential. Understanding cross entropy error is one of those essential topics.

This entry was posted in Machine Learning.