There are pros and cons to working at a huge company. One of the very best things about working at Microsoft is the research talks that happen every day on “resnet”. I gave a resnet talk recently on the topic of neural network dropout.
I spent quite a bit of time reviewing fundamental neural network concepts: the input-output mechanism, and the back-propagation training algorithm. Then I discussed the dropout technique where, as each training item is presented, a random 50% of the hidden nodes are selected and dropped as if they weren’t there.
This technique in effect samples sub-networks and then averages them together. The main idea is very simple, but as always with neural networks, there are many subtle details.
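The per-item mechanism can be sketched in plain Python. The network shape, function name, and weight layout below are illustrative assumptions on my part, not from any particular implementation:

```python
import math
import random

def forward_with_dropout(x, w_ih, w_ho, rnd, drop_prob=0.5):
    # hypothetical fully connected network: x is the input vector,
    # w_ih is the input-to-hidden weight matrix, w_ho is hidden-to-output
    n_hidden = len(w_ho)
    # compute hidden node values (tanh activation, a common choice)
    hidden = []
    for j in range(n_hidden):
        s = sum(x[i] * w_ih[i][j] for i in range(len(x)))
        hidden.append(math.tanh(s))
    # dropout: zero each hidden node with probability drop_prob,
    # as if that node weren't there for this training item
    for j in range(n_hidden):
        if rnd.random() < drop_prob:
            hidden[j] = 0.0
    # compute output node values from the surviving hidden nodes
    n_out = len(w_ho[0])
    out = []
    for k in range(n_out):
        s = sum(hidden[j] * w_ho[j][k] for j in range(n_hidden))
        out.append(s)
    return out

rnd = random.Random(0)  # make reproducible
x = [1.0, 2.0, 3.0]
w_ih = [[0.1] * 4 for _ in range(3)]  # 3-4-2 demo weights
w_ho = [[0.1] * 2 for _ in range(4)]
print(forward_with_dropout(x, w_ih, w_ho, rnd))
```

On each call a different random subset of hidden nodes is zeroed out, which is the sub-network sampling described above.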
Also, when I gave my presentation, I tried to add peripheral information about the history and development of the technique, and a bit about the psychology that’s associated with machine learning research.
I gave the audience a few challenges. When the nodes to drop are selected, they’re always (in every example I’ve ever found anyway) selected randomly:
for-each hidden node
  generate a random probability p between 0 and 1
  if p < 0.50, make curr node a drop node
end for-each
But this approach doesn’t guarantee that exactly half of the hidden nodes will be selected — if you have four hidden nodes you might get 0, 1, 2, 3, or 4 drop nodes. So the challenge was to write selection code that guarantees exactly half of the nodes are selected.
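A quick simulation (my own illustration, not part of the talk) makes the point concrete: with four hidden nodes and a 0.50 drop probability, every drop count from 0 through 4 shows up.

```python
import random
from collections import Counter

rnd = random.Random(0)  # make reproducible
counts = Counter()
for _ in range(10_000):
    # probabilistic selection over 4 hidden nodes
    n_drop = sum(1 for _ in range(4) if rnd.random() < 0.50)
    counts[n_drop] += 1

print(sorted(counts.items()))  # all drop counts 0..4 occur
```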
If using the Python language, one way to do this would be to use the random.sample() function. For example:
# sample.py
import random

print("\nBegin \n")
random.seed(0)  # make reproducible
indices = list(range(0,10))
print(indices)  # [0, 1, ... 9]
selected = random.sample(indices, 5)
print(selected)  # 5 random indices
print("\nEnd \n")
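Wrapped as a small helper for the dropout setting, the same idea picks exactly half of the hidden node indices each time a training item is presented. The function name and signature here are mine, not standard:

```python
import random

def choose_drop_nodes(n_hidden, rnd):
    # select exactly half of the hidden node indices as drop nodes
    return set(rnd.sample(range(n_hidden), n_hidden // 2))

rnd = random.Random(0)  # make reproducible
drops = choose_drop_nodes(4, rnd)
print(drops)  # exactly 2 of the indices 0..3
```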
I pointed out that, to the best of my knowledge, nobody has investigated and published an analysis of whether the two selection approaches give essentially the same results on neural network prediction accuracy.