Neural Network Cross Entropy Error

I wrote an article titled “Neural Network Cross Entropy Error” that appears in the April 2014 issue of Visual Studio Magazine. In the article I explain what cross entropy error is and show how to use it.


When training a neural network, you need a measure of error that compares computed output values (for a given set of input values, weights, and bias values) with the known target output values in the training data. The most common measure of error is mean squared error, which is the average of the squared differences between computed and target values.

However, there is some research evidence suggesting that an alternative measure of error, called cross entropy error, may be superior. Suppose a neural network has three output nodes, the target output values are (0.0, 1.0, 0.0), and the computed output values are (0.1, 0.6, 0.3). The squared error for this one item is (0.0 - 0.1)^2 + (1.0 - 0.6)^2 + (0.0 - 0.3)^2 = 0.01 + 0.16 + 0.09 = 0.26.
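
To make the arithmetic concrete, here is a minimal Python sketch (not code from the article) that reproduces the squared error calculation above:

# minimal sketch: sum of squared errors for the example above
targets  = [0.0, 1.0, 0.0]
computed = [0.1, 0.6, 0.3]

squared_error = sum((t - o) ** 2 for t, o in zip(targets, computed))
print(squared_error)  # 0.26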

Nothing unusual there. The cross entropy error for the same data is -1 * ((ln(0.1) * 0.0) + (ln(0.6) * 1.0) + (ln(0.3) * 0.0)) = -1 * ln(0.6) = -1 * -0.511 = 0.511. Notice that in neural network classification, where all but one of the target values are 0.0, cross entropy error really only takes into account one computed output value: the one associated with the single output node whose target value is 1.0.
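
A similar sketch, again just illustrative Python rather than the article's code, reproduces the cross entropy calculation:

import math

# minimal sketch: cross entropy error for the same example
targets  = [0.0, 1.0, 0.0]
computed = [0.1, 0.6, 0.3]

cross_entropy = -sum(t * math.log(o) for t, o in zip(targets, computed))
print(cross_entropy)  # about 0.511, i.e. -ln(0.6)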

There is a weird connection between cross entropy error and the back-propagation training algorithm. Under the covers, the original back-propagation algorithm assumed the use of mean squared error. That assumption leads to an equation used to update weights during training. But if you assume the use of cross entropy error instead, several terms in the resulting update equation cancel each other out, leaving a very simple update equation.
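
The article's equations aren't reproduced here, but the standard result is that with mean squared error and a logistic sigmoid output node, the output-node gradient signal is (computed - target) * computed * (1 - computed), while with cross entropy error the sigmoid-derivative factor cancels, leaving just (computed - target). A rough Python sketch of the two signals, under those assumptions:

# rough sketch, not the article's code: output-node gradient signals for a
# sigmoid output node, showing the cancellation described above
def mse_output_signal(o, t):
    # mean squared error: the chain rule keeps the sigmoid derivative o * (1 - o)
    return (o - t) * o * (1 - o)

def cross_entropy_output_signal(o, t):
    # cross entropy error: the o * (1 - o) terms cancel, leaving o - t
    return o - t

print(mse_output_signal(0.6, 1.0))            # -0.096
print(cross_entropy_output_signal(0.6, 1.0))  # -0.4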
