One of many techniques for explaining why a neural network produced a particular prediction is called integrated gradients. The idea is difficult to understand if you're not familiar with it, so I'll try to give an explanation that's as informal as possible.
Suppose your problem scenario is to predict the letter defined by a 3×3 image. There are just 5 different letters: 'C', 'H', 'L', 'T', 'Z'. Each of the 9 pixels is a grayscale value from 0 to 16. You have created a 9-(10-8)-5 deep neural network. One of your images is correctly classified as a 'T', but you want to know why your model made that prediction.
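To make the scenario concrete, here is a minimal sketch of what a 9-(10-8)-5 network looks like, using NumPy with randomly initialized (untrained, hypothetical) weights rather than a real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a 9-(10-8)-5 network: 9 input pixels,
# hidden layers of 10 and 8 nodes, 5 outputs (one per letter).
W1, b1 = rng.normal(size=(9, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 5)), np.zeros(5)

def forward(x):
    h1 = np.tanh(x @ W1 + b1)           # first hidden layer
    h2 = np.tanh(h1 @ W2 + b2)          # second hidden layer
    logits = h2 @ W3 + b3
    e = np.exp(logits - logits.max())   # softmax over the 5 letters
    return e / e.sum()

image = rng.uniform(0, 16, size=9)      # a 3x3 image flattened to 9 values
probs = forward(image)                  # 5 pseudo-probabilities
```

A real implementation would of course use a trained model from a framework such as PyTorch or TensorFlow; this sketch just pins down the shapes.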
The integrated gradients technique computes a value between 0 and 1 for each pixel in the image being analyzed. The computed values are based on calculus gradients, which are an indirect measure of how sensitive the model's output is to each input.
You first set up a baseline image, typically all 0s or all 1s. Then you programmatically create several intermediate images that get closer and closer to the image to analyze. The integrated gradients technique can work with any kind of data, but image analysis is the most common scenario.
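Creating the intermediate images is just linear interpolation between the baseline and the image to analyze. A sketch, using a made-up 9-pixel image:

```python
import numpy as np

def interpolate(baseline, image, steps=4):
    # alpha = 0 gives the baseline; alpha = 1 gives the image to analyze
    alphas = np.linspace(0.0, 1.0, steps)
    return [baseline + a * (image - baseline) for a in alphas]

baseline = np.zeros(9)                                   # all-0s baseline
image = np.array([16, 16, 16, 0, 16, 0, 0, 16, 0],       # a crude 'T' shape
                 dtype=np.float64)
images = interpolate(baseline, image, steps=4)           # 4 images total
```

With `steps=4` the list runs from the baseline itself to the full image, which matches the 4 images used in the example below.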
Next, you run each of the images through the trained model and compute the gradient associated with each input pixel. This part is tricky. Because your neural network is 9-(10-8)-5, each input node/pixel is connected to the 10 first-hidden-layer nodes by 10 weights (each hidden node also has a bias). But you only want one gradient per input pixel, so you take the average of the 10 gradients for each input pixel.
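In practice, frameworks like PyTorch and TensorFlow hand you the gradient of the output with respect to each input pixel in a single automatic-differentiation call. To keep this sketch dependency-free, here is a finite-difference approximation instead, using a hypothetical stand-in model (a single linear layer plus softmax) in place of the real trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(9, 5))     # hypothetical stand-in for the trained model

def predict(x):
    logits = x @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def input_gradients(x, class_idx, eps=1e-5):
    # one gradient per input pixel:
    # d(output for the predicted class) / d(pixel), by central differences
    grads = np.zeros(9)
    for i in range(9):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[i] = (predict(xp)[class_idx] - predict(xm)[class_idx]) / (2 * eps)
    return grads

image = rng.uniform(0, 16, size=9)
g = input_gradients(image, class_idx=3)   # gradients w.r.t. the 'T' output
```

You would call `input_gradients` once per interpolated image, giving each pixel one gradient value per image.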
At this point each of the 9 pixels has 4 average gradients (one for each of the 4 images you have). For each pixel, you want the area under the curve you'd get if you graphed its 4 gradients. For example, suppose that for the pixel at (0,0) in the image to analyze, the 4 associated gradients are 0.2, 0.3, 0.6, 0.8. If you graphed the gradients, the result would look like:
Integrating means computing the exact area under the curve. This is not feasible in practice, so you use a technique called a Riemann sum to estimate the integral. You don't really need to make a graph; you just estimate the integral as if there were a graph.
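For the four gradients above, the Riemann sum is just each gradient value times the width of its interval. Since the four images are evenly spaced between the baseline and the image to analyze, each interval has width 1/4:

```python
# Riemann sum estimate of the area under the gradient curve for one pixel.
grads = [0.2, 0.3, 0.6, 0.8]    # the 4 gradients for the pixel at (0, 0)
width = 1.0 / len(grads)        # intervals are evenly spaced on [0, 1]
riemann_sum = sum(g * width for g in grads)
print(riemann_sum)              # 0.475
```

Notice this is just the average of the gradients, because the interval widths are all equal.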
After you estimate the integrals for each of the 9 pixels, you normalize them so that each is between 0.0 and 1.0. The normalized integrated gradient values are measures of how important each pixel was when making the prediction: a large value means that pixel was relatively more important than a pixel with a smaller value.
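One common way to do the normalization is min-max scaling. A sketch, with hypothetical integral estimates for the 9 pixels:

```python
import numpy as np

# hypothetical integral estimates, one per pixel
integrals = np.array([0.12, 0.47, 0.05, 0.33, 0.90,
                      0.28, 0.07, 0.51, 0.19])

# min-max normalize so every value falls in [0.0, 1.0]
normalized = (integrals - integrals.min()) / (integrals.max() - integrals.min())
```

After normalization, the pixel with the largest integral gets 1.0 and the smallest gets 0.0; the rest fall in between in proportion.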
The TensorFlow documentation has a worked example. The image to analyze was a fireboat. The integrated gradients technique shows that it was the water spray from the boat that most influenced the model's classification. This might lead you to investigate how well the model would do when presented with an image of a fireboat that isn't spraying water.
The integrated gradients technique is useful when a model makes an incorrect prediction too. The image below was incorrectly classified as a vacuum. When the integrated gradients analysis was applied, it showed that a narrow railing next to a wide staircase railing, which does in fact resemble an upright vacuum cleaner, was responsible for the misclassification.
The integrated gradients technique seems a bit overly complex, but the research paper explains the mathematical motivations. See "Axiomatic Attribution for Deep Networks" (2017) by M. Sundararajan, A. Taly, and Q. Yan.
Interpreting art images is probably more difficult than interpreting neural network image classifiers. Here are three mixed media portraits from an Internet image search for “enigmatic portrait”. I have no idea what any of these portraits mean but they look nice to me. Left: by artist Aiden Kringen. Center: by artist Arnaud Bauville. Right: by artist Mstislav Pavlov.