I wrote an article titled “Defending Machine Learning Image Classification Models from Attacks” in the January 2021 edition of the Pure AI web site. See https://pureai.com/articles/2021/01/05/defending-model-attacks.aspx.
In the article, I describe an interesting research paper, “Denoised Smoothing: A Provable Defense for Pretrained Classifiers,” that presents a simple but effective way to defend a machine learning image classification model against adversarial attacks. Image classification models can be attacked by cleverly modifying a source image. The modified image appears unchanged to the human eye, but the image classifier will grotesquely misclassify it. The classic example comes from a 2014 research paper where researchers modified a photo of a school bus and tricked an image classifier into thinking the photo was an ostrich.
The technique presented in the Denoised Smoothing paper works as follows:
The source image is fed into the system and several copies of it are made. Each copy has random noise added by perturbing its pixel values. The noisy copies are then denoised using a special type of neural network autoencoder. The denoised copies are fed to a standard image classification model, and a consensus (majority vote) classification is reached.
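To make the pipeline concrete, here is a minimal Python sketch, assuming PyTorch and assuming that denoiser and classifier are hypothetical pretrained callables that accept a batch of image tensors. The names and parameter values (100 copies, sigma = 0.25) are illustrative, not taken from the paper:

import torch

def smoothed_classify(image, denoiser, classifier,
                      num_classes, num_copies=100, sigma=0.25):
    # image: tensor of shape (channels, height, width), pixel values in [0, 1]
    votes = torch.zeros(num_classes, dtype=torch.int64)
    with torch.no_grad():
        for _ in range(num_copies):
            noisy = image + sigma * torch.randn_like(image)  # add Gaussian noise
            denoised = denoiser(noisy.unsqueeze(0))          # denoising autoencoder
            logits = classifier(denoised)                    # standard pretrained classifier
            pred = logits.argmax(dim=1).item()               # predicted class for this copy
            votes[pred] += 1
    return votes.argmax().item()                             # consensus (majority vote) class

Notice that the pretrained classifier is never retrained or modified; only the denoiser in front of it is custom, which is what makes the defense practical for existing pretrained models.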
The noise that is deliberately added to the source image swamps any adversarial perturbation that was added by a malicious attacker.
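To get a rough sense of the scale involved, using values that are typical in the adversarial-robustness literature (not numbers from the paper itself): an attacker's perturbation is often limited to about 8/255 per pixel, while a common smoothing noise level is sigma = 0.25.

epsilon = 8 / 255       # typical per-pixel attack budget, about 0.031
sigma = 0.25            # common Gaussian smoothing noise level
print(sigma / epsilon)  # roughly 8: the deliberate noise dwarfs the adversarial perturbation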
Denoised smoothing is an interesting and practical technique for defending image classification models from attack.