## Deep Neural Networks – Architecture and IO

The only way I can fully understand many technologies is to actually code them. Deep neural networks (DNNs) are an example. So I’ve been writing DNN code from scratch in C# (although I could have used Python, Java, or another general-purpose language).

The first step is to decide on an underlying architecture, which in this context means designing the data structures to hold the input nodes, hidden nodes, output nodes, all the weights, and all the biases. Because DNNs are complex, there were many alternatives, but I eventually decided on this:

```
public double[] iNodes;  // input nodes
public double[][] hNodes;  // hidden nodes, one array per hidden layer
public double[] oNodes;  // output nodes

public double[][] ihWeights;  // input to first hidden layer
public double[][][] hhWeights;  // hidden to hidden, one matrix per pair of adjacent hidden layers
public double[][] hoWeights;  // last hidden layer to output

public double[][] hBiases;  // hidden node biases, one array per hidden layer
public double[] oBiases;  // output node biases
```

The most interesting design decision was how to handle the weights. I decided on three separate data structures. The ihWeights[][] is an array-of-arrays style matrix that holds weights connecting the input nodes to the first hidden layer. The hoWeights[][] is an array-of-arrays style matrix that holds weights connecting the last hidden layer to the output nodes.

The hhWeights[][][] is an array of matrices where the first index references the “from” layer, so hhWeights[0] holds the weights connecting the first hidden layer to the second. This is quite tricky. For complex data structures, I always need a diagram in order to code, and DNNs are an extreme example of complex data structures.
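To make the shapes concrete, here is a minimal sketch of how these arrays might be allocated for a given architecture. The class name DeepNetSketch, the constructor signature, and the MakeMatrix helper are hypothetical, and the sketch assumes every weight matrix is indexed [from][to] and that there is at least one hidden layer.

```csharp
using System;

// Hypothetical container class; only the field names come from the design above.
public class DeepNetSketch
{
    public double[] iNodes;
    public double[][] hNodes;
    public double[] oNodes;
    public double[][] ihWeights;
    public double[][][] hhWeights;
    public double[][] hoWeights;
    public double[][] hBiases;
    public double[] oBiases;

    // numHidden holds the node count of each hidden layer, e.g. { 4, 2, 2 }.
    public DeepNetSketch(int numInput, int[] numHidden, int numOutput)
    {
        int nLayers = numHidden.Length;  // assumed to be at least 1
        iNodes = new double[numInput];
        oNodes = new double[numOutput];

        // One node array and one bias array per hidden layer.
        hNodes = new double[nLayers][];
        hBiases = new double[nLayers][];
        for (int h = 0; h < nLayers; ++h)
        {
            hNodes[h] = new double[numHidden[h]];
            hBiases[h] = new double[numHidden[h]];
        }

        // Assumed orientation: rows are "from" nodes, columns are "to" nodes.
        ihWeights = MakeMatrix(numInput, numHidden[0]);

        // hhWeights[h] connects hidden layer h to hidden layer h+1,
        // so there is one fewer matrix than there are hidden layers.
        hhWeights = new double[nLayers - 1][][];
        for (int h = 0; h < nLayers - 1; ++h)
            hhWeights[h] = MakeMatrix(numHidden[h], numHidden[h + 1]);

        hoWeights = MakeMatrix(numHidden[nLayers - 1], numOutput);
        oBiases = new double[numOutput];
    }

    // Allocates an array-of-arrays style matrix filled with zeros.
    private static double[][] MakeMatrix(int rows, int cols)
    {
        double[][] result = new double[rows][];
        for (int r = 0; r < rows; ++r)
            result[r] = new double[cols];
        return result;
    }
}
```

With three hidden layers, hhWeights holds two matrices: hhWeights[0] for the first-to-second hidden layer and hhWeights[1] for the second-to-third.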

Anyway, after setting up the data structures, the next step was to write a SetWeights() method so that I could set the weights and biases to some known values. A sub-step was to write a function that calculates the number of weights and biases needed. For example, a 2-(4,2,2)-3 DNN has two inputs, three hidden layers with 4, 2, and 2 nodes respectively, and three output nodes. The total number of weights and biases is:

2*4 + 4*2 + 2*2 + 2*3 + (4+2+2) + 3

which is 37.
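The same count can be written as a small function. This is a sketch rather than my actual code: the names WeightCounter and NumWeights and the parameter list are illustrative, and it assumes at least one hidden layer.

```csharp
// Hypothetical helper for counting the weights and biases of a fully connected DNN.
public static class WeightCounter
{
    // numHidden holds the node count of each hidden layer, e.g. { 4, 2, 2 }.
    public static int NumWeights(int numInput, int[] numHidden, int numOutput)
    {
        // Input to first hidden layer.
        int ihWts = numInput * numHidden[0];

        // Each pair of adjacent hidden layers.
        int hhWts = 0;
        for (int h = 0; h < numHidden.Length - 1; ++h)
            hhWts += numHidden[h] * numHidden[h + 1];

        // Last hidden layer to output.
        int hoWts = numHidden[numHidden.Length - 1] * numOutput;

        // One bias per hidden node, plus one bias per output node.
        int hiddenBiases = 0;
        foreach (int n in numHidden)
            hiddenBiases += n;

        return ihWts + hhWts + hoWts + hiddenBiases + numOutput;
    }
}
```

For the 2-(4,2,2)-3 network, NumWeights(2, new int[] { 4, 2, 2 }, 3) works out to 8 + 12 + 6 + 8 + 3 = 37, matching the hand calculation.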

With those pieces of the puzzle in place, I could write a ComputeOutputs() method that accepts some input values and uses them with the fixed weights and biases to compute output values, assuming a particular hidden layer activation function (I used tanh) and output layer activation function (softmax).
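A feed-forward pass of this kind could look roughly like the sketch below. It is written as a standalone static function so it can be read on its own; the class name ForwardPassSketch and the parameter list are hypothetical, and it assumes every weight matrix is indexed [from][to]. The softmax subtracts the largest sum before exponentiating, a common trick to avoid overflow that the description above doesn't specify.

```csharp
using System;

// Hypothetical standalone version of a feed-forward pass; not the original method.
public static class ForwardPassSketch
{
    public static double[] ComputeOutputs(double[] xValues,
        double[][] ihWeights, double[][][] hhWeights, double[][] hoWeights,
        double[][] hBiases, double[] oBiases)
    {
        int nLayers = hBiases.Length;  // number of hidden layers
        double[] prev = xValues;       // activations feeding the current layer

        for (int h = 0; h < nLayers; ++h)
        {
            // The first hidden layer reads from the inputs; later ones from the previous hidden layer.
            double[][] w = (h == 0) ? ihWeights : hhWeights[h - 1];
            double[] curr = new double[hBiases[h].Length];
            for (int j = 0; j < curr.Length; ++j)
            {
                double sum = hBiases[h][j];
                for (int i = 0; i < prev.Length; ++i)
                    sum += prev[i] * w[i][j];  // assumed [from][to] indexing
                curr[j] = Math.Tanh(sum);      // hidden layer activation
            }
            prev = curr;
        }

        // Pre-activation sums for the output nodes.
        double[] oSums = new double[oBiases.Length];
        for (int k = 0; k < oSums.Length; ++k)
        {
            oSums[k] = oBiases[k];
            for (int j = 0; j < prev.Length; ++j)
                oSums[k] += prev[j] * hoWeights[j][k];
        }
        return Softmax(oSums);
    }

    // Softmax with the max subtracted for numerical stability; results sum to 1.
    public static double[] Softmax(double[] sums)
    {
        double max = sums[0];
        foreach (double s in sums)
            if (s > max) max = s;

        double total = 0.0;
        double[] result = new double[sums.Length];
        for (int k = 0; k < sums.Length; ++k)
        {
            result[k] = Math.Exp(sums[k] - max);
            total += result[k];
        }
        for (int k = 0; k < sums.Length; ++k)
            result[k] /= total;
        return result;
    }
}
```

One easy sanity check: with every weight and bias set to zero, all output sums are zero, so softmax returns equal values for each output node.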

I ran my code and then manually verified all the calculations were correct. A lot of work, but a lot of fun. The next step will be to implement back-propagation training.