Whenever I read about a technology, no matter how clear the explanation is, I never feel that I fully understand the topic unless I can code a demo program. This is probably both a strength and a weakness of mine.
I was thinking about several neural network techniques for limiting over-fitting, such as L1 regularization, L2 regularization, weight restriction, and dropout. Even though I am reasonably familiar with all of these ideas, I thought I’d take a close look at dropout.
Training a neural network is the process of finding good values for the network weights and biases. With dropout, as each training item is presented to the network, a random 50% of the hidden processing nodes are virtually dropped: you pretend the dropped nodes aren’t there while that item is processed.
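A minimal sketch of that masking step, in Python, might look something like the following. This is not the demo's code: the function names `make_drop_mask` and `hidden_outputs`, the tanh activation, and the choice to drop each node independently with probability 0.5 (rather than dropping exactly half of the nodes) are all assumptions made for illustration.

```python
import math
import random


def make_drop_mask(num_hidden, drop_prob=0.5, rng=random):
    """Return one 0/1 flag per hidden node; 0 means 'dropped for this item'.

    Assumption: each node is dropped independently with probability
    drop_prob, so the number of dropped nodes varies from item to item.
    """
    return [0 if rng.random() < drop_prob else 1 for _ in range(num_hidden)]


def hidden_outputs(inputs, weights, biases, mask):
    """Compute hidden-node outputs, treating masked nodes as absent.

    weights[i][j] is the weight from input i to hidden node j.
    tanh is an assumed activation function, chosen only for the sketch.
    """
    outputs = []
    for j, keep in enumerate(mask):
        if not keep:
            # A dropped node outputs 0.0, so it contributes nothing downstream.
            outputs.append(0.0)
            continue
        total = biases[j] + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(math.tanh(total))
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)                 # seeded only to make runs repeatable
    mask = make_drop_mask(4, rng=rng)      # a fresh mask per training item
    w = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
    b = [0.0, 0.0, 0.0, 0.0]
    print(mask, hidden_outputs([1.0, 2.0], w, b, mask))
```

The key point is that a new mask is generated for every training item, so a different random subset of hidden nodes takes part in each update. The sketch covers only the training-time forward pass; a full implementation also has to skip dropped nodes when updating weights, and has to compensate when all nodes are switched back on for prediction.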
It’s not obvious, but using dropout helps prevent over-fitting. Over-fitting occurs when a network predicts very well on the training data but predicts poorly on new test data that wasn’t used during training.
As usual, even with rather simple ideas, there are many details you have to address when coding an actual implementation. I spent a few hours one rainy Seattle weekend exploring neural network dropout. I now believe I have a very solid understanding of how to implement dropout, and a decent understanding of the theoretical aspects of dropout.
In my demo, a neural net trained without dropout over-fit some synthetic data: the accuracy on the training data was 94.00%, but the accuracy on the test data was only 67.50%. When the net was trained using dropout, the accuracy on the test data improved to 72.50%.