Deep Neural Networks

A few years ago, ordinary feed-forward neural networks (FFNs) were mysterious and rare. Now, they’re quite commonplace. Over the past year or so I’ve seen a lot of interest in more exotic forms of neural nets, including deep neural networks (DNNs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs).

(Note: the vocabulary in this field varies quite a bit, so my terminology won't necessarily agree with other things you read.)

A basic deep neural network is one with two or more hidden layers of processing nodes (a simple FFN has just one hidden layer). In the image below, there are 2 input nodes and 3 output nodes, with 3 hidden layers of 4, 2, and 2 nodes, respectively.


The input is (1.0, 2.0) and the output is (0.3268, 0.3332, 0.3398). Each blue arrow connecting a pair of nodes represents a numeric constant called a weight. Each purple arrow pointing into a hidden node or an output node represents a special kind of weight called a bias. I've labeled a few of the weights and biases for a demo DNN. (I didn't put all the values in the diagram because that would make it very hard to interpret.)
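One way to make the 2-4-2-2-3 architecture concrete is to list the weight and bias shapes each pair of adjacent layers needs. This is a minimal sketch that assumes a fully connected network (every node feeds every node in the next layer, and every hidden and output node has one bias); the helper names `layer_shapes` and `count_parameters` are hypothetical.

```python
def layer_shapes(sizes):
    """Return (weight_shape, bias_shape) for each pair of adjacent layers.

    Assumes a fully connected network: every node in layer k feeds
    every node in layer k + 1, and every non-input node has one bias.
    """
    shapes = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        shapes.append(((n_in, n_out), (n_out,)))
    return shapes


def count_parameters(sizes):
    """Total number of weights plus biases in the network."""
    total = 0
    for (n_in, n_out), (n_bias,) in layer_shapes(sizes):
        total += n_in * n_out + n_bias
    return total


# 2 inputs, hidden layers of 4, 2 and 2 nodes, 3 outputs
sizes = [2, 4, 2, 2, 3]
for w_shape, b_shape in layer_shapes(sizes):
    print("weights", w_shape, "biases", b_shape)
print("total parameters:", count_parameters(sizes))  # 37
```

Even this small network has 37 weights and biases, which is why labeling them all would clutter the diagram.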

The topmost node in the first hidden layer has value 0.3627, calculated as:

hnode = tanh( (1.0)(0.01) + (2.0)(0.05) + 0.27 )
      = 0.3627

In words, sum the products of input values times weights, add the bias, then take the tanh (hyperbolic tangent) of the sum.
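The same calculation can be written as a small function. This is a sketch for illustration; the name `hidden_node` is hypothetical, and the inputs, weights, and bias are the labeled values from the diagram.

```python
import math


def hidden_node(inputs, weights, bias):
    """Value of one hidden node: tanh(sum of input * weight products + bias)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(total)


# Topmost node in the first hidden layer, using the values from the diagram
value = hidden_node([1.0, 2.0], [0.01, 0.05], 0.27)
print(f"{value:.4f}")  # 0.3627
```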

The tanh() function is called the activation function. Other common activation functions include the logistic sigmoid and the rectified linear unit (ReLU).
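For comparison, here is a minimal sketch of the two alternative activation functions, applied to the same pre-activation sum (0.38) used for the hidden node above. The function names are hypothetical; `math.tanh` is Python's built-in tanh.

```python
import math


def logistic_sigmoid(x):
    """Squashes x into (0, 1).

    Split into two branches so math.exp never receives a large
    positive argument (which would raise OverflowError).
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Rectified linear: passes positive values through, clamps negatives to 0."""
    return max(0.0, x)


# tanh squashes into (-1, 1); compare all three on the same sum
z = 0.38
for name, f in [("tanh", math.tanh), ("logistic", logistic_sigmoid), ("relu", relu)]:
    print(name, round(f(z), 4))
```

Because the choice of activation changes every hidden node value, a network's weights are only meaningful together with the activation function they were trained with.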

The output nodes are calculated a bit differently. Each sum of products (plus bias) is computed, but no activation function is applied. The preliminary results in the diagram are (0.5628, 0.5822, 0.6017). Then the softmax() function is applied to give the final output values. Softmax makes the output values positive and makes them sum to 1.0, so they can be interpreted as probabilities.
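Softmax exponentiates each preliminary value and divides by the sum of the exponentials. Below is a minimal sketch (the function name `softmax` is our own). Because the preliminary values in the diagram are rounded to four decimals, the results can differ from the diagram's outputs in the fourth decimal place.

```python
import math


def softmax(values):
    """Exponentiate each value and divide by the total so outputs sum to 1.0.

    Subtracting the max first does not change the result, but it keeps
    math.exp from overflowing when the inputs are large.
    """
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


preliminary = [0.5628, 0.5822, 0.6017]  # pre-softmax output sums from the diagram
probs = softmax(preliminary)
print([round(p, 4) for p in probs])
print(sum(probs))
```

Softmax preserves order: the largest preliminary value always gets the largest probability.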

I’ve implemented 2-hidden-layer NNs several times. But implementing a true DNN is quite an engineering challenge and usually takes me several days.
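A general forward pass that handles any number of hidden layers can be sketched as below. This is not the author's implementation. It covers only the forward computation described above (tanh on hidden layers, softmax on the output) and does not train the network. The function names are hypothetical, and the weights are small random values because the diagram doesn't show them all.

```python
import math
import random


def forward(x, weights, biases):
    """Forward pass through a fully connected net with any number of hidden layers.

    weights[k][i][j] is the weight from node i in layer k to node j in layer k+1;
    biases[k][j] is the bias on node j in layer k+1.
    Hidden layers use tanh; the output layer uses softmax.
    """
    layer = list(x)
    last = len(weights) - 1
    for k, (w, b) in enumerate(zip(weights, biases)):
        # Sum of products plus bias for every node in the next layer
        sums = [sum(layer[i] * w[i][j] for i in range(len(layer))) + b[j]
                for j in range(len(b))]
        if k < last:
            layer = [math.tanh(s) for s in sums]
        else:
            m = max(sums)
            exps = [math.exp(s - m) for s in sums]
            total = sum(exps)
            layer = [e / total for e in exps]
    return layer


def init_params(sizes, seed=0):
    """Hypothetical small random weights and biases (not the diagram's values)."""
    rng = random.Random(seed)
    weights, biases = [], []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights.append([[rng.uniform(-0.1, 0.1) for _ in range(n_out)]
                        for _ in range(n_in)])
        biases.append([rng.uniform(-0.1, 0.1) for _ in range(n_out)])
    return weights, biases


sizes = [2, 4, 2, 2, 3]  # same architecture as the diagram
weights, biases = init_params(sizes)
output = forward([1.0, 2.0], weights, biases)
print([round(v, 4) for v in output])  # three probabilities summing to 1.0
```

Adding or removing hidden layers only changes the `sizes` list; the loop in `forward` handles any depth.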


This entry was posted in Machine Learning.