Testing a Transformer Based Autoencoder Anomaly Detection System

For the past several months, I’ve been poking away at an idea to perform unsupervised anomaly detection using a system based on a deep neural Transformer Architecture (TA). The idea is to start with a dataset, and construct a TA encoder that creates a latent representation of the dataset. Then a neural decoder is applied to reconstruct each source data item. After reconstruction, the source data items are compared with their reconstructions. Data items that have the largest reconstruction error are tagged as anomalies.

Put another way, a TA based system is similar to a standard autoencoder reconstruction error anomaly detection system, except that it uses a Transformer encoder instead of an encoder based on fully connected linear layers.
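To make the idea concrete, here is a minimal PyTorch sketch of a Transformer encoder paired with a plain neural decoder, followed by ranking items by reconstruction error. This is not the demo program from the screenshots. The class name, layer sizes, the choice to treat each of the 64 pixels as one token, and the random stand-in data are all my assumptions for illustration.

```python
import torch
import torch.nn as nn

class TransformerAutoencoder(nn.Module):
    # Hypothetical architecture: each of the 64 pixels is treated as one token.
    def __init__(self, n_pixels=64, d_model=16, latent_dim=8):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # pixel value -> d_model vector
        self.pos = nn.Parameter(torch.zeros(n_pixels, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=2, dim_feedforward=32, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_latent = nn.Linear(n_pixels * d_model, latent_dim)
        # Plain fully connected decoder that rebuilds the 64 pixel values.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.Tanh(),
            nn.Linear(32, n_pixels), nn.Sigmoid())

    def forward(self, x):  # x: (batch, 64), values assumed scaled to [0, 1]
        z = self.embed(x.unsqueeze(-1)) + self.pos  # (batch, 64, d_model)
        z = self.encoder(z)                         # (batch, 64, d_model)
        z = self.to_latent(z.flatten(1))            # (batch, latent_dim)
        return self.decoder(z)                      # (batch, 64)

def reconstruction_errors(model, data):
    # Mean squared error between each item and its reconstruction.
    model.eval()
    with torch.no_grad():
        recon = model(data)
    return ((recon - data) ** 2).mean(dim=1)

torch.manual_seed(0)
data = torch.rand(100, 64)  # stand-in for 100 normalized digit images
model = TransformerAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for epoch in range(5):  # a few epochs only, to keep the sketch fast
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(data), data)
    loss.backward()
    opt.step()

errors = reconstruction_errors(model, data)
worst = torch.argsort(errors, descending=True)[:3]  # most anomalous first
print(worst.tolist())
```

The items whose indices appear first in `worst` are the ones the model reconstructs least well, so they are the anomaly candidates. A real system would train much longer and on real data.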



This screenshot shows a demo run of generating FGSM data items. Can a TA anomaly detection system find the evil data?



This screenshot shows the Transformer based anomaly detection system in action. I fetched an evil FGSM data item and placed it in position [0] in a dataset with 100 benign items. The TA detection system found the evil item.


After many hours of experimentation, I got a TA based anomaly detection system working. My colleagues were not completely impressed. They wanted evidence that the anomaly detection system actually detects anomalies. Fair enough.

So I implemented a fast gradient sign method (FGSM) attack system to generate anomalous data. FGSM data is crafted so that it looks very similar to benign data but is misclassified by a neural classification system. I ran the FGSM program and fetched the first data item produced. I salted a normal 100-item dataset with the evil data item and then ran the TA anomaly detection system. The TA system correctly found the evil FGSM data item. I was happy.
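For readers who haven't seen FGSM, the core step is short: compute the gradient of the classifier's loss with respect to the input, then nudge every pixel by a small amount in the direction of the gradient's sign. The sketch below is a generic version, not the program from the screenshot. The function name `fgsm_attack`, the epsilon value, the untrained stand-in classifier, and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

def fgsm_attack(classifier, x, y, epsilon=0.2):
    # x: (batch, 64) pixels assumed scaled to [0, 1]; y: (batch,) true labels.
    x_in = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(classifier(x_in), y)
    # Gradient of the loss with respect to the input pixels only.
    (grad,) = torch.autograd.grad(loss, x_in)
    # Step each pixel by epsilon in the direction that increases the loss.
    x_adv = x_in.detach() + epsilon * grad.sign()
    return x_adv.clamp(0.0, 1.0)  # keep pixels in the valid range

torch.manual_seed(1)
# Stand-in classifier: 64 inputs, 10 digit classes (untrained, hypothetical).
classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.rand(4, 64)            # stand-in for four normalized digit images
y = torch.tensor([3, 1, 4, 1])   # made-up labels for the stand-in images

x_adv = fgsm_attack(classifier, x, y, epsilon=0.2)
print((x_adv - x).abs().max().item())  # never exceeds epsilon
```

Because no pixel moves by more than epsilon, the evil item stays close to the original. Whether it actually fools the classifier depends on the trained model and the epsilon chosen.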

I used the UCI Digits dataset for my experiments. Each data item is an 8 by 8 image of a handwritten digit. Each of the 64 pixels is a grayscale value between 0 and 16.
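As a small illustration of the data layout, here is a sketch that parses one data item. It assumes each line holds 64 comma-separated pixel values followed by the digit label, and it divides by 16 to scale pixels to [0, 1]. The scaling step and the function name `parse_digit_line` are my assumptions, and the sample line is made up.

```python
def parse_digit_line(line):
    """Split one comma-separated line into 64 normalized pixels and a label."""
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 65:
        raise ValueError("expected 64 pixel values plus 1 label")
    pixels, label = values[:64], values[64]
    if any(p < 0 or p > 16 for p in pixels):
        raise ValueError("pixel values must be between 0 and 16")
    # Scale each grayscale value from [0, 16] to [0.0, 1.0].
    return [p / 16.0 for p in pixels], label

# Made-up item: top half blank, bottom half full intensity, labeled 7.
sample = ",".join(["0"] * 32 + ["16"] * 32 + ["7"])
pixels, label = parse_digit_line(sample)
print(len(pixels), label)
```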

The Transformer based anomaly detection system experiment was a lot of work but very interesting. Apart from producing a potentially useful system, the experiment taught me a lot.



Anomalous facial features can be unattractive or attractive. Here are three actresses with facial anomalies. Left: Sophia Loren (b. 1934) has a cleft chin, a fairly common anomaly (~5% of women). Center: Emma Watson (b. 1990) has freckles, a recessive trait anomaly (~5% of people have this trait). Right: Jane Seymour (b. 1951) has heterochromia (differently colored eyes), a rare anomaly (less than 1% of the population).


