For most of the software developers I know, there’s tremendous interest in machine learning and neural networks. So I’ve been giving a series of one-hour talks aimed at developers and designed to get them up to speed quickly.
A few days ago I talked about error and accuracy. My first two talks explained how the neural network input-output process works, and then how the back-propagation training algorithm works. In both of those talks, I assumed the underlying Error function that compares computed output values, such as (0.20, 0.70, 0.10), with correct target values, such as (0, 1, 0), is the simple squared error function. For the data just mentioned, squared error would be (0.20 – 0)^2 + (0.70 – 1)^2 + (0.10 – 0)^2 = 0.04 + 0.09 + 0.01 = 0.14.
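The squared error arithmetic above is easy to sketch in a few lines of Python (the function name is mine, not from the talk):

```python
# Squared error between computed outputs and one-hot targets:
# sum of (output - target)^2 over all output nodes.
def squared_error(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

outputs = [0.20, 0.70, 0.10]
targets = [0, 1, 0]
print(round(squared_error(outputs, targets), 2))  # 0.14
```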
The next logical step in a developer’s understanding of neural networks is understanding cross entropy error. Cross entropy error isn’t as obvious as squared error. For the data above, cross entropy error is -1 * [log(0.20)*0 + log(0.70)*1 + log(0.10)*0] = -log(0.70) = 0.3567, where log is the natural logarithm. Because the target values are 0 or 1, only the term for the correct class survives.
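Here is the same cross entropy calculation as a small Python sketch (again, the function name is mine). Note that with one-hot targets the sum collapses to a single term:

```python
import math

# Cross entropy error with natural log. For one-hot targets,
# only the log(output) term for the correct class contributes.
def cross_entropy_error(outputs, targets):
    return -sum(t * math.log(o) for o, t in zip(outputs, targets))

outputs = [0.20, 0.70, 0.10]
targets = [0, 1, 0]
print(round(cross_entropy_error(outputs, targets), 4))  # 0.3567
```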
As it turns out, using cross entropy error rather than squared error is often (but not always) better because you tend to get more accurate predictions. The reasons why this is so are rather subtle.
Briefly, when using squared error, to update weights during training, there is a term (1 – output)*(output). Because “output” is a probability between 0 and 1, the term is always between 0 and 0.25, reaching its maximum of 0.25 when output = 0.5. For example, if output = 0.60 then the term is 0.60 * 0.40 = 0.24. Multiplying by this small number shrinks the weight updates, which makes training a bit slower. But if you use cross entropy error, the term cancels out of the update, so training is faster. Sort of. I’ve skipped over a ton of important details.
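To make the comparison concrete, here is a minimal sketch of the per-node gradient signal in both cases, assuming a sigmoid-style output node (function names are mine; this glosses over the same details the paragraph above does):

```python
# Gradient signal at one output node, for a single training item.
def grad_signal_squared(output, target):
    # Squared error: the (output)*(1 - output) derivative term
    # is at most 0.25, which shrinks the weight update.
    return (output - target) * output * (1 - output)

def grad_signal_cross_entropy(output, target):
    # Cross entropy: the derivative term cancels, leaving
    # just the raw difference between output and target.
    return output - target

o, t = 0.60, 1.0
print(round(grad_signal_squared(o, t), 3))        # -0.096
print(round(grad_signal_cross_entropy(o, t), 3))  # -0.4
```

The cross entropy signal (-0.4) is roughly four times larger than the squared error signal (-0.096) for the same wrong prediction, which is the intuition behind the faster training.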
Now accuracy is just the percentage of correct predictions. Ultimately, prediction accuracy is the metric you’re interested in, but during neural network training, accuracy is too crude a measure to guide weight updates.
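A quick Python sketch of accuracy, using hypothetical toy data: a prediction counts as correct when the largest computed output lines up with the 1 in the one-hot target.

```python
# Accuracy = fraction of items where the index of the largest
# output matches the index of the 1 in the one-hot target.
def accuracy(all_outputs, all_targets):
    correct = sum(
        1 for outputs, targets in zip(all_outputs, all_targets)
        if outputs.index(max(outputs)) == targets.index(max(targets))
    )
    return correct / len(all_outputs)

outs = [[0.20, 0.70, 0.10], [0.60, 0.30, 0.10]]
tgts = [[0, 1, 0], [0, 0, 1]]
print(accuracy(outs, tgts))  # 0.5
```

Notice why this is too crude for training: outputs of (0.34, 0.33, 0.33) and (0.98, 0.01, 0.01) both count as exactly one correct prediction, so accuracy gives the training algorithm no signal about how close the outputs are to the targets.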
The moral of the story is that anyone can become an expert on neural networks. But there are a lot of details that need to be learned one at a time. I estimate that in 16 one-hour talks, a developer can become an expert.