I wrote an article in the July 2014 issue of Visual Studio Magazine titled “Neural Network Weight Decay and Restriction”. See http://visualstudiomagazine.com/articles/2014/07/01/weight-decay-and-restriction.aspx.

You can think of a neural network as a complicated math function that accepts one or more numeric input values and generates one or more numeric output values. The output values are determined in part by a set of values called the network’s weights and biases. Training a neural network is the process of finding good values for those weights and biases.
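To make the "math function" view concrete, here is a minimal sketch of how a small network turns inputs into outputs. It assumes a single hidden layer with tanh activation and linear (un-squashed) output nodes; the function and parameter names (compute_outputs, ih_weights, h_biases, ho_weights, o_biases) are hypothetical and not taken from the article's code.

```python
import math


def compute_outputs(inputs, ih_weights, h_biases, ho_weights, o_biases):
    """Forward pass of a hypothetical single-hidden-layer network.

    ih_weights[i][j] connects input i to hidden node j.
    ho_weights[j][k] connects hidden node j to output node k.
    Hidden nodes use tanh; output nodes are left linear for simplicity.
    """
    # Each hidden node: bias plus weighted sum of inputs, passed through tanh.
    hidden = []
    for j in range(len(h_biases)):
        total = h_biases[j]
        for i, x in enumerate(inputs):
            total += x * ih_weights[i][j]
        hidden.append(math.tanh(total))

    # Each output node: bias plus weighted sum of hidden values.
    outputs = []
    for k in range(len(o_biases)):
        total = o_biases[k]
        for j, h in enumerate(hidden):
            total += h * ho_weights[j][k]
        outputs.append(total)
    return outputs


if __name__ == "__main__":
    # 2 inputs, 2 hidden nodes, 1 output; the weight values are arbitrary.
    print(compute_outputs([1.0, 2.0],
                          [[0.1, -0.2], [0.3, 0.4]],
                          [0.05, -0.05],
                          [[0.5], [-0.6]],
                          [0.1]))
```

The same inputs produce different outputs whenever the weights or biases change, which is why training amounts to a search over those values.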

Training is accomplished by using a set of training data that has known output values. Training finds a set of weight and bias values such that, when the network is presented with the training data input values, the computed output values are very close to the known output values in the training data.
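The idea of "finding weights so computed outputs are close to known outputs" can be illustrated with a deliberately tiny model. This sketch assumes a single weight w and bias b (computed output = w * x + b) rather than a full network, measures closeness with mean squared error, and uses plain gradient descent; real networks are typically trained with back-propagation or other algorithms, and the learning rate and epoch count here are illustrative assumptions.

```python
def mean_squared_error(w, b, data):
    # Average squared gap between computed outputs and known outputs.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate=0.05, epochs=2000):
    """Find w and b that minimize mean squared error via gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step each value a little in the direction that reduces error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Training data with known outputs, generated from y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

if __name__ == "__main__":
    w, b = train(training_data)
    print(w, b, mean_squared_error(w, b, training_data))
```

On this noise-free data the search should recover values very close to w = 2 and b = 1.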

A major problem when training a neural network is a phenomenon called over-fitting. Over-fitting means you’ve found weight and bias values that work perfectly, or almost perfectly, with the training data, but when you present the trained network with new data, its prediction accuracy is very poor.

There are several strategies you can use to try to deal with over-fitting. One is called weight decay: during training, each weight and bias value is reduced in magnitude by a small amount, pulling it toward zero. The idea is that if there is no counter effect from the training data, useless weights will fade away. Another strategy is weight restriction: during training, weight and bias values are not allowed to become too large in either the positive or the negative direction.
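Both strategies can be expressed as a small change to an ordinary gradient update. The sketch below is an assumption about one common way to do it, not necessarily how the article's code does it: weight decay is applied multiplicatively, shrinking each value by the fraction decay after the gradient step (other formulations instead add a penalty term to the error), and weight restriction clamps each value into [-limit, +limit]. The function names and the decay and limit values are hypothetical.

```python
def update_with_decay(values, gradients, learning_rate, decay):
    """Gradient step, then shrink each weight or bias toward zero.

    Multiplying by (1 - decay) reduces magnitude without flipping sign,
    so a value with no opposing gradient slowly fades toward zero.
    """
    return [(v - learning_rate * g) * (1.0 - decay)
            for v, g in zip(values, gradients)]


def update_with_restriction(values, gradients, learning_rate, limit):
    """Gradient step, then clamp each weight or bias into [-limit, +limit]."""
    return [max(-limit, min(limit, v - learning_rate * g))
            for v, g in zip(values, gradients)]


if __name__ == "__main__":
    # With zero gradients (no counter effect from data), decay shrinks values.
    values = [5.0, -5.0]
    for _ in range(100):
        values = update_with_decay(values, [0.0, 0.0], 0.1, 0.01)
    print(values)

    # Restriction keeps updated values within an assumed range of +/-10.
    print(update_with_restriction([9.5, -9.5, 0.0], [-10.0, 10.0, 1.0], 0.1, 10.0))
```

In practice, the decay rate and the restriction limit are free parameters that usually have to be tuned by experiment.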