Neural Network Library Dropout Layers

Until quite recently, neural network libraries like TensorFlow and CNTK didn’t exist, so if you wanted to create a neural network, you had to write raw code in C/C++, C#, Java, or a similar language.

In those days, implementing neural network dropout took four steps: writing code to tag the nodes to be dropped on each training iteration, editing the code that computes output so that it skips the dropped nodes, editing the back-propagation training code to match, and finally modifying the trained weights to account for the fact that dropout was used during training.
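The from-scratch steps can be sketched in plain Python. This is a minimal illustration and not the code I actually used: the function names, the single tanh hidden layer, and the tiny made-up weights are all assumptions, and the back-propagation edits are left out to keep it short.

```python
import math
import random

def make_drop_mask(n_hidden, rate, rng):
    """Tag each hidden node as dropped (True) with probability `rate`."""
    return [rng.random() < rate for _ in range(n_hidden)]

def forward(x, w_ih, w_ho, drop_mask=None):
    """Compute outputs of a one-hidden-layer net, skipping dropped hidden nodes."""
    n_hidden = len(w_ih[0])
    hidden = []
    for j in range(n_hidden):
        if drop_mask is not None and drop_mask[j]:
            hidden.append(0.0)  # a dropped node contributes nothing
            continue
        total = sum(x[i] * w_ih[i][j] for i in range(len(x)))
        hidden.append(math.tanh(total))
    n_out = len(w_ho[0])
    return [sum(hidden[j] * w_ho[j][k] for j in range(n_hidden))
            for k in range(n_out)]

def scale_for_inference(w_ho, rate):
    """After training, shrink hidden-to-output weights by the keep probability."""
    keep = 1.0 - rate
    return [[w * keep for w in row] for row in w_ho]

# Hypothetical 4-6-3 network with made-up weights.
rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
w_ih = [[0.01 * (i + j) for j in range(6)] for i in range(4)]
w_ho = [[0.1 * (j - k) for k in range(3)] for j in range(6)]

mask = make_drop_mask(6, 0.5, rng)                 # a new mask every iteration
train_out = forward(x, w_ih, w_ho, mask)           # training-time output
final_w_ho = scale_for_inference(w_ho, 0.5)        # one-time fix after training
test_out = forward(x, w_ih, final_w_ho)            # all nodes used afterwards
```

The scaling step is there because, after training, all hidden nodes are active at once, so their combined signal would otherwise be larger than anything the output nodes saw during training.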

The approach I just described was a bit tricky, but not quite as difficult as the description may sound. But still, in the old days (like 2-3 years ago), almost everything about writing neural network code was non-trivial.

So, my point is, I really, really understand dropout because I’ve read the source research papers, and I’ve implemented dropout from scratch many times.

Then in 2015 and 2016, along came TensorFlow and Keras and CNTK and other libraries. The approach used by these libraries is quite simple. Instead of creating a custom network, you place a so-called dropout layer into the network. On each training iteration, the dropout layer sets a randomly selected fraction of the values coming into it to 0.0, which effectively drops the associated nodes in the layer just before the dropout layer.

Library code could resemble:

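from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()            # Keras-style sketch
model.add(Input(shape=(4,)))    # input: 4 values
model.add(Dense(6))             # hidden
model.add(Dropout(rate=0.5))    # apply to hidden
model.add(Dense(3))             # output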
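What a dropout layer like the one above does on each pass can be approximated in plain Python. The `DropoutLayer` class here is a hypothetical stand-in I made up for illustration, not any library's actual implementation. One assumption worth flagging: it scales the surviving values up by 1 / (1 - rate) during training (often called inverted dropout, and what the Keras documentation describes), so no weight adjustment is needed after training.

```python
import random

class DropoutLayer:
    """Hypothetical stand-in for a library dropout layer (not a real API)."""

    def __init__(self, rate, seed=None):
        if not 0.0 <= rate < 1.0:
            raise ValueError("rate must be in [0, 1)")
        self.rate = rate
        self.rng = random.Random(seed)

    def __call__(self, values, training):
        if not training or self.rate == 0.0:
            return list(values)  # pass-through when not training
        keep = 1.0 - self.rate
        # Zero each incoming value with probability `rate`; scale the
        # survivors by 1/keep so the expected total signal is unchanged.
        return [0.0 if self.rng.random() < self.rate else v / keep
                for v in values]

# Made-up outputs of six hidden nodes.
hidden = [0.2, -0.5, 0.9, 0.1, -0.3, 0.7]
layer = DropoutLayer(rate=0.5, seed=1)

train_values = layer(hidden, training=True)    # some values zeroed, rest doubled
test_values = layer(hidden, training=False)    # unchanged
```

Because the layer only ever touches the values passed into it, the nodes that get dropped are the ones in whatever layer comes immediately before it.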

The only way I could fully understand this mechanism was to sketch out a few pictures. Notice that if you place a dropout layer immediately after the input layer, you are dropping input values, which is sometimes called jittering (although jittering can also mean adding noise to input values). If you place a dropout layer after the output layer, you’re dropping output values, which doesn’t make sense in any scenario I’ve ever seen.
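The two meanings of jittering are easy to confuse, so here is a small sketch of the difference. Both function names are made up for illustration, and the Gaussian noise in the second function is just one assumed choice of noise. The sketch also skips any rescaling of the surviving inputs.

```python
import random

def drop_inputs(x, rate, rng):
    """Dropout right after the input layer: zero each input with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def jitter_inputs(x, sigma, rng):
    """The other meaning of jittering: add small Gaussian noise to every input."""
    return [v + rng.gauss(0.0, sigma) for v in x]

rng = random.Random(0)
x = [0.5, 1.5, 2.5, 3.5]            # made-up input values

dropped = drop_inputs(x, 0.5, rng)   # some inputs replaced by 0.0
noisy = jitter_inputs(x, 0.1, rng)   # every input nudged slightly
```

In the first case some inputs vanish entirely on a given training iteration; in the second, every input is still present but slightly perturbed.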

I don’t think there’s a moral to this story. But an analogy might be something like this: In the 1920s and 1930s, everyone who drove a car probably had to have pretty good knowledge of how cars worked, so that they could fix the cars when they broke. But as time went on, understanding things like how to adjust the ignition timing became less and less important. Maybe that’s true of deep neural networks.

But it’s still good to know how things work.

To the best of my knowledge, dropout was introduced in a 2012 research paper and then described in much more detail in a 2014 follow-up paper. Dropout became widely known in late 2015. There are a couple of very deep research papers about the mathematics behind dropout (and how it averages virtual sub-networks). The best explanation for me is in a paper at:

HAMR (Harvard Ambulatory Micro Robot) is about the same size as a roach. Even though all life is sacred in some sense, I do not like roaches. Ugh. Hate ’em.


One Response to Neural Network Library Dropout Layers

  1. PGT-ART says:

    Well, there are still areas where deep knowledge about cars is required; not all cars are in series production. The concept of a neuron layer is pure math that works well enough. But there is also a group of people who tinker with NNs and don’t like to use black boxes, and some who try to emulate real neuron behaviour, some of whom don’t even use grid layers.

    Just keep up the good work, telling people how things work, and never take anything for granted.
    Only tinkerers improve the world around us. Now that Google is developing neural nets for Defence, I sure hope people understand what they do (see the movie WarGames).
