I wrote an article titled “Neural Network Back-Propagation using Python” in the June 2017 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2017/06/01/back-propagation.aspx
I strongly believe that when working with machine learning, even if you’re using a tool such as Weka or a library such as TensorFlow, it’s important to understand what is going on behind the scenes. And for me, the best way to understand an ML topic is by coding the topic from scratch.
Additionally, coding a ML system from scratch gives you complete control over the system, and allows you to customize the code and to experiment. Neural network back-propagation is an example of a topic that requires code for complete understanding (for me anyway).
I coded a demo program in Python plus the NumPy numeric add-on package. Why? Because Python plus NumPy has become the de facto standard interface for leading deep learning libraries, notably Google TensorFlow and Microsoft CNTK. So a side benefit of my article’s demo code is that you gain useful Python skills.
On the one hand, the ideas behind neural network back-propagation are not overwhelmingly difficult (even though they’re by no means easy). On the other hand, when you code back-propagation, a ton of important details are revealed.
The back-prop weight update code depends on the underlying error function assumption. If you assume mean squared error, then there are several equivalent forms. One is “squared computed output minus target” and another is “squared target minus computed output”. Both forms give the same error value, but lead to different update code.
Suppose, for a given training data item, the target vector is (0, 1, 0) and the computed outputs are (0.20, 0.70, 0.10). Using “squared target minus output” the error is (0 – 0.20)^2 + (1 – 0.70)^2 + (0 – 0.10)^2 = 0.04 + 0.09 + 0.01 = 0.14. Using “squared output minus target” the error is (0.20 – 0)^2 + (0.70 – 1)^2 + (0.10 – 0)^2 = 0.14 again.
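The arithmetic above is easy to verify with a few lines of NumPy (a quick sketch, not the article’s demo code):

```python
import numpy as np

target = np.array([0.0, 1.0, 0.0])    # target vector for one training item
output = np.array([0.20, 0.70, 0.10]) # computed outputs

# "squared target minus output" form
err_tmo = np.sum((target - output) ** 2)

# "squared output minus target" form
err_omt = np.sum((output - target) ** 2)

print(err_tmo)  # 0.14 (within floating-point rounding)
print(err_omt)  # 0.14 -- the two forms always agree on the error value
```

Because squaring discards the sign, the two forms can never disagree on the error value itself; the difference only surfaces in the update code.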
But the back-prop update code depends on the calculus derivative of the error function. Here the target is a constant but the computed output is a variable. The net result is that one form of error leads you to add a weight delta, and the other form of error leads you to subtract a weight delta.
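A minimal sketch of the sign flip (my own illustration, with a made-up learning rate; real back-prop code would propagate these signals through the weight matrices):

```python
import numpy as np

target = np.array([0.0, 1.0, 0.0])
output = np.array([0.20, 0.70, 0.10])
lr = 0.05  # illustrative learning rate

# Form "output minus target": dE/d_output = 2 * (output - target).
# Code in this style computes a gradient and SUBTRACTS the scaled gradient:
grad = 2.0 * (output - target)
update_sub = -lr * grad

# Form "target minus output": dE/d_output = -2 * (target - output).
# Code in this style typically defines delta = (target - output)
# and ADDS the scaled delta:
delta = target - output
update_add = lr * 2.0 * delta

# The two conventions produce the identical weight change;
# only the add-versus-subtract bookkeeping differs.
print(update_sub)
print(update_add)
```

In practice the constant factor of 2 is often absorbed into the learning rate, which is another detail that only becomes obvious when you read actual code.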
The moral is that to completely understand neural network back-propagation, it’s a good idea to look at an actual code implementation.