I’ve been getting my butt kicked, technically speaking, for the past couple of days while exploring the training of deep neural networks. When I use standard online training with back-propagation, my code seems to work fairly well:
My code creates 2,000 dummy items. Each item has four inputs and three outputs and looks like (4.5, -3.2, 1.6, -2.0, 0, 0, 1). The generator uses a 4-(10,10,10)-3 deep NN (four inputs, three hidden layers of ten nodes each, and three outputs). Therefore, the generator has (4 * 10) + (10 * 10) + (10 * 10) + (10 * 3) = 270 weights, plus (10 + 10 + 10) + 3 = 33 biases, for a total of 303 weights and biases that must be determined.
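As a sanity check on that arithmetic, here’s a tiny sketch (just an illustration in Python, not my actual generator code) that counts the parameters of a fully connected 4-(10,10,10)-3 network:

```python
# Hypothetical sketch: count the weights and biases of a
# fully connected 4-(10,10,10)-3 network.
layers = [4, 10, 10, 10, 3]  # input, three hidden layers, output

weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 40 + 100 + 100 + 30 = 270
biases = sum(layers[1:])                                  # 10 + 10 + 10 + 3 = 33

print(weights + biases)  # 303
```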
One of the points of my investigation is to explore the vanishing gradient phenomenon. In the image above I display one gradient value every 200 training epochs, and you can see that, as expected, it quickly goes to zero (to four decimal places).
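To get some intuition for why the gradient dies out, here’s a little illustrative sketch (made-up numbers, not my training code): it pushes a gradient signal backward through three tanh hidden layers of ten nodes each. At every layer the signal gets multiplied by the weight matrix and by the tanh derivative, which is never greater than 1, and that repeated multiplication is where the shrinking comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: back-propagate a gradient signal through
# three tanh hidden layers of ten nodes each. At each layer the signal
# is multiplied by the weights and by the tanh derivative (<= 1),
# so its norm tends to shrink layer by layer.
signal = np.ones(3)  # gradient arriving at the three output nodes
for n_in, n_out in [(10, 3), (10, 10), (10, 10)]:
    W = rng.normal(0.0, 0.1, (n_in, n_out))  # small random weights
    pre_act = rng.normal(0.0, 1.0, n_in)     # made-up pre-activation values
    d_tanh = 1.0 - np.tanh(pre_act) ** 2     # derivative of tanh
    signal = (W @ signal) * d_tanh           # chain rule, one layer back
    print("gradient norm: %.4f" % np.linalg.norm(signal))
```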
So, just for fun I thought I’d see what the effect of using batch training would be:
What the heck?! The NN just doesn’t learn at all. Now, I know that online training generally works better than full-batch training on problems like this, but this result is extreme. I suspect I may have a bug in my batch-training code. But tracking down a problem in code like this could easily take days, so I’m going to have to put it aside for now. Grrr.
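For what it’s worth, here’s the structural difference between the two schemes, sketched on a toy linear least-squares model rather than my actual network (the data and names here are hypothetical). Online training updates the weights after every single item; batch training accumulates the gradient over the whole data set and applies one update per epoch. One classic bug in batch code, for example, is forgetting to divide the accumulated gradient by the number of items, which silently scales the effective learning rate by the data set size:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))     # toy inputs (hypothetical data)
Y = X @ rng.normal(size=(4, 3))   # toy targets from a random linear map
lr = 0.01

# Online (stochastic) training: update the weights after every item.
W = np.zeros((4, 3))
for epoch in range(50):
    for x, y in zip(X, Y):
        grad = np.outer(x, x @ W - y)  # gradient for this one item
        W -= lr * grad

# Batch training: accumulate over all items, one update per epoch.
# Note the division by len(X): dropping it is a classic bug that
# multiplies the effective learning rate by the data set size.
Wb = np.zeros((4, 3))
for epoch in range(50):
    grad = X.T @ (X @ Wb - Y) / len(X)  # mean gradient over the batch
    Wb -= lr * grad

print("online error: %.4f" % np.mean((X @ W - Y) ** 2))
print("batch  error: %.4f" % np.mean((X @ Wb - Y) ** 2))
```

Also worth noting: with the same learning rate, the batch version takes only one weight update per epoch instead of one per item, so even a bug-free batch implementation can look like it isn’t learning over the same number of epochs.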