A PyTorch Autoencoder for Anomaly Detection

I try to write at least one PyTorch program every day. PyTorch is complicated and the only way I can learn new techniques, and avoid losing some of my existing PyTorch knowledge, is to write programs.

One morning I decided to implement an autoencoder. I consider autoencoders to be one of the four basic types of neural networks that all data scientists should know. (The other three are the binary classifier, the multi-class classifier, and the regression model.) An autoencoder learns to predict its own input. Autoencoders can be used for 1.) “dimensionality reduction”, which is sort of like data compression, 2.) anomaly detection, 3.) denoising data, or 4.) converting mixed-type data into purely numeric data so it can be processed by numeric-only algorithms such as k-means clustering.

For anomaly detection, the basic idea is to train an autoencoder to predict its own input values, then use the trained model to find the item(s) that have the largest reconstruction error. For example, suppose you have employee data like (sex, age, income) where a male, 32-year-old employee who makes $55,000.00 is normalized and encoded as (-1, 0.32, 0.55). If you feed this input to the trained autoencoder, it should spit back a result very close to the same three input values. Suppose you get back (-0.90, 0.40, 0.60). Then the squared error for that item is 0.0100 + 0.0064 + 0.0025 = 0.0189.
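As a quick sanity check, you can compute that reconstruction error directly, using the same values as in the example:

import torch as T

x = T.tensor([-1.00, 0.32, 0.55])  # actual (normalized) input
y = T.tensor([-0.90, 0.40, 0.60])  # autoencoder output
err = T.sum((x - y) * (x - y)).item()  # squared reconstruction error
print("%0.4f" % err)  # 0.0189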

If you analyze every data item and find the one with the largest reconstruction error, it is likely that the item you found is anomalous in some way, compared to the other items.

Even though autoencoders are probably the simplest form of the four basic neural network types, there are still several ways to go wrong. For instance, several of the autoencoder examples I saw on the Internet applied ReLU activation to the final decoder layer. Because ReLU returns only non-negative values, ReLU isn’t a good choice if any of the input values (and therefore desired output values) can be negative, such as encoding sex as -1 or +1.

Note: In cases where all input is scaled to between 0 and 1, you could apply sigmoid() activation on the output nodes. Or if all input is scaled to between -1 and +1, you could apply tanh() activation on the output nodes. I don’t know of any research on this topic and a few experiments I’ve performed have not had conclusive results.
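For example, for a dataset where every value is scaled to between 0 and 1 (which is not the case for the demo data below, where sex is encoded as -1 or +1), the forward() method could apply sigmoid() to the final decoder layer instead of leaving it un-activated. This is just a sketch of the idea, not something I claim improves results:

  def forward(self, x):
    z = T.tanh(self.enc1(x))
    z = T.tanh(self.enc2(z))
    z = T.tanh(self.dec1(z))
    z = T.sigmoid(self.dec2(z))  # only sensible if all inputs are in [0, 1]
    return z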

Autoencoders. Good fun.


You’d think that it would be difficult to go wrong when designing an album cover for saxophone music. But I think it’s fair to say that these three examples are anomalies of good cover design.


# employee_auto.py
# autoencoder reconstruction error
# PyTorch 1.6.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import numpy as np
import torch as T
device = T.device("cpu") 

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # sex  age   city    income   job
    # -1  0.27  0  1  0  0.7610  0  0  1
    # +1  0.19  0  0  1  0.6550  0  1  0
    # city: anaheim, boulder, concord
    # job: mgmt, supp, tech
    tmp_x = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,9), delimiter="\t", skiprows=0,
      dtype=np.float32)
    self.x_data = T.tensor(tmp_x, dtype=T.float32)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    sample = { 'predictors' : preds }
    return sample

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.enc1 = T.nn.Linear(9, 4)  # 9-4-2-4-9
    self.enc2 = T.nn.Linear(4, 2)

    self.dec1 = T.nn.Linear(2, 4)
    self.dec2 = T.nn.Linear(4, 9)

    T.nn.init.xavier_uniform_(self.enc1.weight)
    T.nn.init.zeros_(self.enc1.bias)
    T.nn.init.xavier_uniform_(self.enc2.weight)
    T.nn.init.zeros_(self.enc2.bias)
    T.nn.init.xavier_uniform_(self.dec1.weight)
    T.nn.init.zeros_(self.dec1.bias)
    T.nn.init.xavier_uniform_(self.dec2.weight)
    T.nn.init.zeros_(self.dec2.bias)

  def forward(self, x):
    z = T.tanh(self.enc1(x))
    z = T.tanh(self.enc2(z))
    z = T.tanh(self.dec1(z))
    z = self.dec2(z)  # no activation
    return z

# -----------------------------------------------------------

def analyze_error(model, ds):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  n_features = len(ds[0]['predictors'])

  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # should be same as X
    err = T.sum((X-Y)*(X-Y)).item()  # SSE all features
    err = err / n_features           # sort of norm'ed SSE 

    if err > largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y

  print("Largest error found: %0.4f" % largest_err)
  print("Worst actual X   = " + str(worst_x))
  print("Worst computed Y = " + str(worst_y))

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee autoencoder demo \n")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("Creating Employee Dataset ")

  train_file = ".\\Data\\employee_all.txt"
  train_ds = EmployeeDataset(train_file)  # all 240 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  net = Net().to(device)

  # 3. train autoencoder model
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.005

  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = Adam")
  print("max_epochs = %3d " % max_epochs)
  print("lrn_rate = %0.3f " % lrn_rate)

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch['predictors']  # inputs
      Y = batch['predictors']  # targets = inputs for an autoencoder

      optimizer.zero_grad()
      oupt = net(X)
      loss_obj = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_obj.item()  # accumulate
      loss_obj.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %0.4f" % (epoch, epoch_loss))
  print("Done ")

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error \n")
  net = net.eval()
  analyze_error(net, train_ds)

  print("\nEnd Employee autoencoder demo")

if __name__ == "__main__":
  main()
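A small variation I sometimes find handy is to have the analysis function also remember the index of the worst row, so you can look that row up in the source data file. A sketch of that version of the analyze_error() function:

def analyze_error_with_index(model, ds):
  largest_err = 0.0
  worst_idx = -1
  n_features = len(ds[0]['predictors'])

  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # should be close to X
    err = T.sum((X-Y)*(X-Y)).item() / n_features
    if err > largest_err:
      largest_err = err
      worst_idx = i

  print("Largest error = %0.4f at row %d" % (largest_err, worst_idx))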

5 Responses to A PyTorch Autoencoder for Anomaly Detection

  1. Thorsten Kleppe says:

    Autoencoders can be so impressive.
    The de-noise example blew my mind the first time:
    1. Take a picture twice, one to use as the target and one to which you add a lot of noise.
    2. Let the autoencoder train, watch what happens, and compare the original, the noisy image, and the autoencoder result (I did that with popcorn for a long time).

    Another crazy thing is to do the opposite of anomaly detection: take the item with the lowest reconstruction error and make that example the target for its class. Training this way produces beauties (or sometimes an anomaly), stunning!

    Did you ever try an autoencoder for Zoltar?

  2. gmiliotis says:

    Really useful post! Many thanks. Where can we find the data, please? I think they first appear in the “Regression Using PyTorch” post. How do you preprocess/normalise them? I mean, why is income divided by 100K, for example?

    Thanks!

    • I generated the data randomly but after running my demo I didn’t save the data. I made 240 rows of data. The gender is random 50% male, 50% female. The ages are random between 18 and 68. The city is random Anaheim, Boulder or Concord. The income is random between 23,000 and 89,000. The job is random mgmt, supp, or tech. I preprocessed the data manually in an Excel spreadsheet. Ages were divided by 100 and incomes were divided by 100,000 so that all numeric values are between 0.0 and 1.0. This makes it so that during training large values like incomes don’t overwhelm small values like age.
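    Roughly, the generation and normalization amount to something like this (a sketch only — I generated and preprocessed by hand, so this won't reproduce the original 240 rows):

    import numpy as np

    rnd = np.random.RandomState(1)
    cities = [(1,0,0), (0,1,0), (0,0,1)]  # anaheim, boulder, concord one-hot
    jobs = [(1,0,0), (0,1,0), (0,0,1)]    # mgmt, supp, tech one-hot

    with open("employee_all.txt", "w") as f:
      for _ in range(240):
        sex = rnd.choice([-1, 1])                      # male = -1, female = +1
        age = rnd.randint(18, 69) / 100.0              # ages 18-68, divided by 100
        city = cities[rnd.randint(0, 3)]
        income = rnd.randint(23000, 89001) / 100000.0  # incomes divided by 100,000
        job = jobs[rnd.randint(0, 3)]
        vals = [sex, age] + list(city) + [income] + list(job)
        f.write("\t".join(str(v) for v in vals) + "\n")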
