A Quick Look at the Adversarial Robustness Toolbox (ART)

The Adversarial Robustness Toolbox (ART) is a Python library for machine learning security. ART provides tools to explore four types of attack: evasion, poisoning, extraction, and inference. ART supports the TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, and GPy libraries.

I decided to install ART and run one of the documentation examples. Bottom line: I was quite impressed, but the library has a steep learning curve and is very large because it supports so many machine learning libraries.

Installing ART wasn’t too difficult. I followed the instructions at the project repository at github.com/Trusted-AI/adversarial-robustness-toolbox. I first tried the command:

pip install adversarial-robustness-toolbox

But installation failed because pip couldn’t uninstall the existing llvmlite package in order to update it. I then tried:

pip install adversarial-robustness-toolbox ^
  --ignore-installed llvmlite

and the ART library seemed to install correctly (but with a few of the inevitable error messages).

I went to the Examples directory on the GitHub site and copy-pasted the PyTorch Fast Gradient Sign Method (FGSM) attack example code. Somewhat amazingly, the example worked on my first attempt.

The top row has FGSM adversarial examples generated with epsilon = 0.15. The bottom row uses epsilon = 0.2. With larger epsilon, the trained model is fooled more often, but the adversarial images are a bit easier for the human eye to detect. The caption above each image shows the true digit value and the digit the trained model was tricked into predicting.

FGSM takes images and slightly modifies their pixel values in a way that’s designed to make the trained model predict the wrong class. The epsilon value controls how much each pixel can change. A larger epsilon changes the images more, so the model makes more errors, but the changes are also more likely to be visible to the human eye.
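To make the idea concrete, here is a minimal NumPy sketch of the FGSM update. It is not ART's implementation and not the code from the ART example. It assumes a toy linear softmax classifier in place of the MNIST network so that the gradient can be written by hand, and the names fgsm, accuracy, W, and b are made up for illustration. The key line moves each input value by epsilon in the direction of the sign of the loss gradient, then clips back to the valid [0, 1] pixel range.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fgsm(x, y, W, b, eps):
    """FGSM for a toy linear softmax model (an assumption, not a CNN)."""
    probs = softmax(x @ W + b)
    onehot = np.eye(W.shape[1])[y]
    # Gradient of the cross-entropy loss with respect to the input x.
    grad_x = (probs - onehot) @ W.T
    # Step every input value by eps in the direction that increases the loss.
    x_adv = x + eps * np.sign(grad_x)
    # Keep the result inside the valid pixel range.
    return np.clip(x_adv, 0.0, 1.0)

def accuracy(x, y, W, b):
    return float(np.mean(np.argmax(x @ W + b, axis=1) == y))

# Hypothetical toy data: 200 "images" with 16 pixel values, 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3))
b = np.zeros(3)
x = rng.uniform(0.0, 1.0, size=(200, 16))
y = np.argmax(x @ W + b, axis=1)  # labels the toy model gets right by construction

for eps in (0.0, 0.15, 0.2):
    x_adv = fgsm(x, y, W, b, eps)
    print(f"eps = {eps:.2f}  accuracy = {accuracy(x_adv, y, W, b):.4f}")
```

Because no pixel moves by more than epsilon, the adversarial image stays close to the original, which is why small epsilon values are hard to spot by eye.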

For the demo, the trained MNIST model scored 98.13% accuracy on the 10,000 test images. With an epsilon of 0.2, the adversarial images successfully fooled the trained model, and its accuracy on those images was only 31.17%.

Fascinating stuff.

Three more or less random images from an Internet search for “gradient photography”. I like to look at photographic images but I am the worst photo-taker ever.