I spent a few hours taking a fresh look at a technique called neural network dropout. Dropout is a relatively simple technique (in principle anyway) that is used when training a neural network, and is intended to prevent model overfitting. Dropout was introduced in 2012 by Geoffrey Hinton and his colleagues. Dropout is most useful when working with deep NNs, but I created a demo with a single-hidden-layer NN.
Overfitting occurs when you train a neural network too well — the trained model gives you near-perfect results on the data you used to create the model, but gives you poor accuracy when presented with new, previously unseen data.
Dropout sounds a bit wacky at first. As each training item is processed, a random 50% of the hidden processing nodes are selected to be dropped — meaning you process the training item as if the dropped nodes aren’t there at all.
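To make that concrete, here's a minimal sketch of the idea in Python/NumPy. The tanh hidden activation and the function names are my assumptions for illustration, not the demo's actual code:

```python
import numpy as np

def dropout_mask(n_hidden, rng):
    # Pick a random 50% of hidden node indices to drop for one training item.
    drop = rng.choice(n_hidden, size=n_hidden // 2, replace=False)
    mask = np.ones(n_hidden)
    mask[drop] = 0.0  # dropped nodes act as if they aren't there at all
    return mask

def forward_with_dropout(x, W_ih, W_ho, rng):
    # Hidden layer with tanh activation (tanh is my assumption here).
    h = np.tanh(x @ W_ih)
    h = h * dropout_mask(h.shape[0], rng)  # zero out the dropped nodes
    return h @ W_ho                        # output layer; no dropout on output nodes
```

A fresh random half of the hidden nodes is selected for each training item, so every item is effectively processed by a different half-sized sub-network.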
When finished, the hidden-to-output weight values are divided by 2. The idea is that during training only half of the hidden nodes contribute to the output values, so at test time, when all hidden nodes are active, the output signals would be roughly twice as large as those seen during training; halving the weight magnitudes compensates.
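The compensation step is a one-liner. Here's a hedged sketch, written in terms of a generic keep-probability so the divide-by-2 falls out as the special case of 50% dropout (the function name is mine, not from the demo):

```python
import numpy as np

def scale_after_dropout(W_ho, keep_prob=0.5):
    # After training with dropout, each hidden node was active only
    # keep_prob of the time, so the full network's output signals would be
    # roughly 1/keep_prob too large. Scaling the hidden-to-output weights
    # by keep_prob (dividing by 2 when keep_prob = 0.5) compensates.
    return W_ho * keep_prob
```

Many modern libraries instead use "inverted" dropout, scaling the surviving activations up by 1/keep_prob during training so no post-training adjustment is needed; the two approaches are equivalent in expectation.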
Dropping nodes like this leads to a more robust trained model. In essence, dropout samples random half-sized sub-networks of the parent network and then, in effect, averages those sub-networks together. It's similar to what's called an ensemble technique.
Although the basic idea of dropout is very simple, there are many, many subtleties — things that, as usual, aren’t apparent until you code from scratch. In my demo, I created a set of 200 synthetic training items and 40 synthetic test items.
Without dropout, my neural network got 98.00% accuracy on the training items but only 70.00% accuracy on the test items. It looks like the model has been overfitted.
With dropout, my neural network got 90.50% accuracy on the training data but 72.50% accuracy on the test data — a small but significant improvement. Notice that I gave the dropout training an extra 200 training iterations — training with dropout typically converges slower than training without dropout.
Anyway, neural network dropout training is an interesting technique, but one that is difficult to describe concisely because it has connections to many other NN concepts including back-propagation, sparsity, saturation, regularization, norm constraints, and ensemble techniques just to name a few.