Over the past few weeks I’ve been coding a deep neural network (DNN) from scratch, using the C# language. My most recent milestone was getting a back-propagation training method up and running.
I’ve coded back-prop for single-hidden-layer (i.e., non-deep) neural networks many times, so I wasn’t expecting too much trouble when coding back-prop for a DNN. But it was a lot trickier than I thought it’d be. There wasn’t any one thing that stumped me; there was just a huge number of details to get right.
Anyway, when implementing back-prop, there are a few design choices to make. Conceptually, choosing between mean squared error and cross-entropy error is a big decision, but in terms of implementation, the error function isn’t a big deal: it mainly changes how the output-layer error signal is computed.
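To illustrate the two error functions mentioned above, here’s a minimal sketch (my own toy code, not the article’s implementation; the method names and the sample target/output values are made up) that computes both errors for a single training item:

```csharp
using System;

class ErrorDemo
{
  // mean squared error: average of squared differences
  static double MeanSquaredError(double[] targets, double[] outputs)
  {
    double sum = 0.0;
    for (int i = 0; i < targets.Length; ++i)
      sum += (targets[i] - outputs[i]) * (targets[i] - outputs[i]);
    return sum / targets.Length;
  }

  // cross-entropy error: -sum of target * ln(output)
  static double CrossEntropyError(double[] targets, double[] outputs)
  {
    double sum = 0.0;
    for (int i = 0; i < targets.Length; ++i)
      sum += targets[i] * Math.Log(outputs[i]);
    return -sum;
  }

  static void Main()
  {
    double[] t = { 0.0, 1.0, 0.0 };   // one-hot target
    double[] o = { 0.1, 0.7, 0.2 };   // softmax-style outputs
    Console.WriteLine(MeanSquaredError(t, o));   // 0.0466...
    Console.WriteLine(CrossEntropyError(t, o));  // -ln(0.7) = 0.3566...
  }
}
```

Notice that with a one-hot target, cross-entropy only “sees” the output for the correct class, while mean squared error penalizes every output.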
I coded my back-prop routine using “stochastic” training (updating the weights after each individual training item), as opposed to batch or mini-batch training. Conceptually that’s not a big deal, but implementation-wise, batch and mini-batch are quite a bit more work.
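The defining feature of stochastic training is that the weight update happens inside the per-item loop, not after scanning the whole dataset. Here’s a runnable toy example of that pattern (again, my own sketch, not the DNN code): a single linear neuron y = w*x + b fit to y = 2x + 1, with w and b updated immediately after each item.

```csharp
using System;

class StochasticDemo
{
  static void Main()
  {
    double[] xs = { 0.0, 1.0, 2.0, 3.0 };
    double[] ys = { 1.0, 3.0, 5.0, 7.0 };   // generated by y = 2x + 1
    double w = 0.0, b = 0.0, lr = 0.05;

    for (int epoch = 0; epoch < 2000; ++epoch)
    {
      for (int i = 0; i < xs.Length; ++i)   // one training item at a time
      {
        double pred = w * xs[i] + b;
        double err = pred - ys[i];          // gradient signal for this item
        w -= lr * err * xs[i];              // update immediately --
        b -= lr * err;                      // this is what makes it "stochastic"
      }
    }
    Console.WriteLine($"w = {w:F3}, b = {b:F3}");  // close to 2 and 1
  }
}
```

In batch training, the inner loop would instead accumulate the gradients over all items, and the `w -=` / `b -=` updates would move outside the loop, which is where the extra bookkeeping comes from.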
Another implementation option is whether or not to use momentum. I used momentum, because you can always set the momentum factor to 0.0 to effectively turn it off.
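The standard momentum update blends each weight’s previous delta into its current delta. Here’s a tiny runnable illustration of the idea on a one-variable problem (my own example; the DNN version does the same thing per weight, storing a previous-delta for each): minimizing f(w) = w² whose gradient is 2w.

```csharp
using System;

class MomentumDemo
{
  static void Main()
  {
    double w = 5.0;            // starting point
    double lr = 0.1;           // learning rate
    double momentum = 0.9;     // set to 0.0 to recover plain gradient descent
    double prevDelta = 0.0;    // previous update, one per weight

    for (int step = 0; step < 200; ++step)
    {
      double grad = 2.0 * w;                            // gradient of w^2
      double delta = lr * grad + momentum * prevDelta;  // blend in last delta
      w -= delta;
      prevDelta = delta;
    }
    Console.WriteLine(w);  // close to 0.0, the minimum
  }
}
```

The only extra cost of supporting momentum is one stored previous-delta per weight, which is why leaving it in and defaulting the factor to 0.0 is cheap insurance.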
Anyway, it was very good fun. Next up, I think I’ll implement batch and mini-batch training. Also, I’ll need to carefully go over the code because anything this complex almost certainly has a few (hopefully minor) bugs.