MNIST Image Classification Using Keras 2.8 on Windows 11

One of my standard neural network examples is image classification on the MNIST dataset. The full MNIST (Modified National Institute of Standards and Technology) dataset has 60,000 images for training and 10,000 images for testing. Each image is a 28 x 28 (784 pixels) grayscale handwritten digit from ‘0’ to ‘9’. Each pixel value is an integer from 0 (white) to 255 (black).

I fetched the raw MNIST data from http://yann.lecun.com/exdb/mnist/. The data is stored in four .gz (gnu-zipped) files: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz. I used the 7-Zip utility program to extract the four files. The data is stored in a proprietary binary format so I wrote a helper program to convert the binary data to text files. See https://jamesmccaffrey.wordpress.com/2022/02/25/preparing-mnist-image-data-text-files-in-visual-studio-magazine/.
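
For reference, here is a minimal sketch of the kind of binary-to-text conversion involved. It assumes the standard IDX layout (magic number, item counts, then raw bytes); the file names and the 1,000-item slice are just illustrative, not the exact helper program from the linked post.

# idx_to_text_sketch.py
# illustrative conversion of unzipped MNIST IDX files to tab-delim text
import numpy as np

def load_idx_images(fn):
  with open(fn, "rb") as f:
    f.read(4)                                 # magic number
    n = int.from_bytes(f.read(4), "big")      # number of images
    rows = int.from_bytes(f.read(4), "big")   # 28
    cols = int.from_bytes(f.read(4), "big")   # 28
    pixels = np.frombuffer(f.read(), dtype=np.uint8)
  return pixels.reshape(n, rows * cols)       # one image per row

def load_idx_labels(fn):
  with open(fn, "rb") as f:
    f.read(8)                                 # magic number and count
    return np.frombuffer(f.read(), dtype=np.uint8)

images = load_idx_images("train-images-idx3-ubyte")
labels = load_idx_labels("train-labels-idx1-ubyte")
with open("mnist_train_1000.txt", "w") as f:
  for i in range(1000):                       # first 1,000 items only
    line = "\t".join(str(p) for p in images[i])
    f.write(line + "\t" + str(labels[i]) + "\n")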

I used a 1,000-item subset of the training data, and a 100-item subset of the test data. After conversion to text, the data looks like:

0 0 0 . . 84 185 159 . . 5
0 0 0 . . 133 254 87 . . 9
0 0 0 . . 164 79 202 . . 7
. . .

Each line is one image. The first 784 values on each line are the pixel values. The last value on each line is the target digit, ‘0’ to ‘9’.

I designed a convolutional neural network that has two convolution layers, three linear layers, two pooling layers, and two dropout layers. The architecture was adapted from an example I found buried in the PyTorch documentation.

I used relu() activation on the convolution layers and the first two Dense layers, and softmax() activation on the final Dense layer, paired with categorical_crossentropy loss for training. I allowed the default Keras initialization: glorot_uniform() for weights and zeros() for biases.

For training, I used stochastic gradient descent optimization with a fixed learning rate of 0.05 and a batch size of 20.

The demo achieved 97.00% accuracy on the test data: not bad but it’s possible to do better by fiddling with the hyperparameters.

To use the trained model, just for fun, I created a fake image that sort of resembles a reversed ‘4’.

Note: I did the same problem using PyTorch 1.10.0 — see https://jamesmccaffrey.wordpress.com/2022/05/26/mnist-image-classification-using-pytorch-1-10-on-windows-11/.



The Disney animated movie “Atlantis: The Lost Empire” (2001) was a box office flop. I had high hopes but even though the movie had excellent animation and art, it missed the mark on plot, dialog, and pacing. The animators created a complete Atlantean alphabet with digits — very nice. I would have loved for the movie to be a success but there’s always a chance for a new version or reboot.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols. My lame blog editor chokes on symbols.

# mnist_tfk.py
# MNIST using CNN and raw text data
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K
import matplotlib.pyplot as plt

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n   # print loss & acc every n epochs

  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')
      curr_acc = logs.get('accuracy') * 100
      print("epoch = %4d  |  loss = %0.6f  |  acc = %0.2f%%" % \
(epoch, curr_loss, curr_acc))

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin MNIST using Keras CNN and raw text data ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # 784 tab-delim pixel values (0-255) then label (0-9)
  print("\nLoading 1000-train 100-test data from text file ")
  train_file = ".\\Data\\mnist_train_1000.txt" 
  all_train_xy = np.loadtxt(train_file, usecols=range(785),
      delimiter="\t", comments="#", dtype=np.float32)
  train_x = all_train_xy[:, 0:784]  # all rows, cols [0,783]
  train_x /= 255.0
  train_x = train_x.reshape(1_000, 28, 28, 1)

  train_y = all_train_xy[:, 784]
  train_y = K.utils.to_categorical(train_y, 10)

  test_file = ".\\Data\\mnist_test_100.txt" 
  all_test_xy = np.loadtxt(test_file, usecols=range(785),
      delimiter="\t", comments="#", dtype=np.float32)
  test_x = all_test_xy[:, 0:784]  # all rows, cols [0,783]
  test_x /= 255.0
  test_x = test_x.reshape(100, 28, 28, 1)

  test_y = all_test_xy[:, 784]
  test_y = K.utils.to_categorical(test_y, 10)

# -----------------------------------------------------------

  # 2. define model
  print("\nCreating CNN network with 2 conv and 3 linear ")
  # g_init = K.initializers.glorot_uniform(seed=1)
  
  x = K.layers.Input(shape=(28,28,1))
  con1 = K.layers.Conv2D(filters=32, kernel_size=(5,5), 
    activation='relu', padding='valid')(x)
  mp1 = K.layers.MaxPooling2D(pool_size=(2,2))(con1)
  do1 = K.layers.Dropout(0.25)(mp1)

  con2 = K.layers.Conv2D(filters=64, kernel_size=(5,5), 
    activation='relu', padding='valid')(do1)
  mp2 = K.layers.MaxPooling2D(pool_size=(2,2))(con2)

  # neural network phase
  z = K.layers.Flatten()(mp2)
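  # shape trace (28x28x1 input, 'valid' padding): con1 = 24x24x32,
  # mp1 = 12x12x32, con2 = 8x8x64, mp2 = 4x4x64, flatten = 1024 values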
  fc1 = K.layers.Dense(units=512, activation='relu')(z)
  do2 = K.layers.Dropout(0.50)(fc1)
  fc2 = K.layers.Dense(units=256, activation='relu')(do2)
    
  fc3 = K.layers.Dense(units=10, activation='softmax')(fc2)
 
  model = K.models.Model(x, fc3)

  lrn_rate = 0.05
  opt = K.optimizers.SGD(learning_rate=lrn_rate)
  # opt = K.optimizers.Adam(learning_rate=0.05)
  model.compile(loss='categorical_crossentropy',
    optimizer=opt, metrics=['accuracy'])

# -----------------------------------------------------------
  
  # 3. train model
  bat_size = 20
  max_epochs = 25
  print("\nbat_size = %3d " % bat_size)
  print("loss = categorical_crossentropy ")
  print("optimizer = SGD")
  print("lrn_rate = %0.3f " % lrn_rate)
  print("max_epochs = %3d " % max_epochs)


  print("\nStarting training")
  my_logger = MyLogger(n=5)  # progress every 5 epochs
  model.fit(train_x, train_y, batch_size=bat_size,
    epochs=max_epochs, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nComputing model accuracy")
  eval = model.evaluate(train_x, train_y, verbose=0)
  acc_train = eval[1]
  print("Accuracy on training data = %0.4f" % acc_train)

  eval = model.evaluate(test_x, test_y, verbose=0)
  acc_test = eval[1]
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  # 5. use model
  print("\nMaking prediction for fake image: ")
  x = np.zeros(shape=(28,28), dtype=np.float32)
  for row in range(5,23):
    x[row][9] = 180  # vertical line
  for rc in range(9,19):
    x[rc][rc] = 250  # diagonal
  for col in range(5,15):  
    x[14][col] = 200  # horizontal
  x /= 255.0

  plt.tight_layout()
  plt.imshow(x, cmap=plt.get_cmap('gray_r'))
  plt.show()

  x = x.reshape(1, 28, 28, 1)
  pred_probs = model.predict(x)  # sum to 1
  print("\nPrediction probabilities: ")
  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  # np.set_printoptions(precision=4, suppress=True)
  print(pred_probs)
  
  digits = ['zero', 'one', 'two', 'three', 'four', 'five', 
    'six', 'seven', 'eight', 'nine' ]
  
  am = np.argmax(pred_probs)
  print("\nPredicted class is \'" + digits[am] + "\'")

# -----------------------------------------------------------

  # 6. save model
  print("\nSaving MNIST model to disk ")
  # mp = ".\\Models\\mnist_model.h5"
  # model.save(mp)

  print("\nEnd MNIST Keras CNN demo ")

if __name__ == "__main__":
  main()

MNIST Image Classification Using PyTorch 1.10 on Windows 11

One of my standard neural network examples is image classification on the MNIST dataset. The full MNIST (modified National Institute of Standards and Technology) dataset has 60,000 images for training and 10,000 images for testing. Each image is a 28 x 28 (784 pixels) grayscale handwritten digit from ‘0’ to ‘9’. Each pixel value is an integer from 0 (white) to 255 (black).

I fetched the raw MNIST data from http://yann.lecun.com/exdb/mnist/. The data is stored in four .gz (gnu-zipped) files: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz. I used the 7-Zip utility program to extract the four files. The data is stored in a proprietary binary format so I wrote a helper program to convert the binary data to text files. See https://jamesmccaffrey.wordpress.com/2022/02/25/preparing-mnist-image-data-text-files-in-visual-studio-magazine/.

I used a 1,000-item subset of the training data, and a 100-item subset of the test data. After conversion to text, the data looks like:

0 0 0 . . 84 185 159 . . 5
0 0 0 . . 133 254 87 . . 9
0 0 0 . . 164 79 202 . . 7
. . .

Each line is one image. The first 784 values on each line are the pixel values. The last value on each line is the target digit, ‘0’ to ‘9’.

I designed a convolutional neural network that has two convolution layers, three linear layers, two pooling layers, and two dropout layers. The architecture was adapted from an example I found buried in the PyTorch documentation.

I used ReLU() activation on all layers except for the final layer where I used no activation (combined with CrossEntropyLoss() for training which automatically adds log_softmax() activation). I used the default PyTorch weight and bias initialization. The documentation is not very clear about what this is, but for Linear and Conv2d layers the default is kaiming (He) uniform initialization for the weights and a small uniform distribution, not zeros, for the biases.

For training, I used stochastic gradient descent optimization with a fixed learning rate of 0.05 and a batch size of 20.

The demo achieved 97.00% accuracy on the test data: not bad but it’s possible to do better by fiddling with the hyperparameters.

To use the trained model, just for fun, I created a fake image that sort of resembles a mutated ‘4’ digit.



A mutated handwritten digit is one thing. A mutated plant in an old science fiction movie is another.

Left: “The Lost World” (1960) tells the story of explorers who find a hidden area on top of a nearly inaccessible plateau. There are several different types of plants that have mutated into carnivorous versions. The movie is loosely based on a novel by Arthur Conan Doyle, who is best known for his Sherlock Holmes stories.

Center: “Matango” (1963) is a Japanese movie where survivors of a shipwreck end up on a deserted island. Unfortunately, eating mushrooms mutates people into mushroom-people.

Right: “The Day of the Triffids” (1962) tells a story where a worldwide meteor shower blinds most people on Earth and also delivers spores that mutate into huge Triffids — carnivorous plants that are slow moving but have a poisonous sting.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols. My lame blog editor chokes on symbols.

# mnist_cnn.py
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

# reads MNIST data from text file rather than using
# built-in black box Dataset from torchvision

import numpy as np
import matplotlib.pyplot as plt
import torch as T

device = T.device('cpu')

# -----------------------------------------------------------

class MNIST_Dataset(T.utils.data.Dataset):
  # 784 tab-delim pixel values (0-255) then label (0-9)
  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(785),
      delimiter="\t", comments="#", dtype=np.float32)

    tmp_x = all_xy[:, 0:784]  # all rows, cols [0,783]
    tmp_x /= 255.0
    tmp_x = tmp_x.reshape(-1, 1, 28, 28)  # bs, chnls, 28x28
    tmp_y = all_xy[:, 784]    # 1-D required

    self.x_data = \
      T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = \
      T.tensor(tmp_y, dtype=T.int64).to(device) 

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    lbl = self.y_data[idx] 
    pixels = self.x_data[idx] 
    return (pixels, lbl)

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()  # pre Python 3.3 syntax

    self.conv1 = T.nn.Conv2d(1, 32, 5)  # chnl-in, out, krnl
    self.conv2 = T.nn.Conv2d(32, 64, 5)
    self.fc1 = T.nn.Linear(1024, 512)   # [64*4*4, x]
    self.fc2 = T.nn.Linear(512, 256)
    self.fc3 = T.nn.Linear(256, 10)     # 10 classes
    self.pool1 = T.nn.MaxPool2d(2, 2)   # kernel, stride
    self.pool2 = T.nn.MaxPool2d(2, 2)
    self.drop1 = T.nn.Dropout(0.25)
    self.drop2 = T.nn.Dropout(0.50)
    # uses default weight and bias initialization
  
  def forward(self, x):
    # convolution phase         # x is [bs, 1, 28, 28]
    z = T.relu(self.conv1(x))   # Size([bs, 32, 24, 24])
    z = self.pool1(z)           # Size([bs, 32, 12, 12])
    z = self.drop1(z)
    z = T.relu(self.conv2(z))   # Size([bs, 64, 8, 8])
    z = self.pool2(z)           # Size([bs, 64, 4, 4])
   
    # neural network phase
    z = z.reshape(-1, 1024)     # Size([bs, 1024])
    z = T.relu(self.fc1(z))     # Size([bs, 512])
    z = self.drop2(z)
    z = T.relu(self.fc2(z))     # Size([bs, 256])
    z = self.fc3(z)             # Size([bs, 10]) 
    return z  # logits; CrossEntropyLoss() applies log_softmax()

# -----------------------------------------------------------

def accuracy(model, ds):
  ldr = T.utils.data.DataLoader(ds,
    batch_size=len(ds), shuffle=False)
  n_correct = 0
  for data in ldr:
    (pixels, labels) = data
    with T.no_grad():
      oupts = model(pixels)
    (_, predicteds) = T.max(oupts, 1)
    n_correct += (predicteds == labels).sum().item()

  acc = (n_correct * 1.0) / len(ds)
  return acc

# -----------------------------------------------------------

def main():
  # 0. setup
  print("\nBegin MNIST with PyTorch CNN demo ")
  np.random.seed(1)
  T.manual_seed(1)

  # 1. create Dataset
  print("\nCreating 1000-item train Dataset from text file ")
  train_file = ".\\Data\\mnist_train_1000.txt"
  train_ds = MNIST_Dataset(train_file)

  bat_size = 20
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating CNN network with 2 conv and 3 linear ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train model
  max_epochs = 25  # 100 gives better results
  ep_log_interval = 5
  lrn_rate = 0.05
  
  loss_func = T.nn.CrossEntropyLoss()  # does log-softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)
  # optimizer = T.optim.Adam(net.parameters(), lr=0.005)
  
  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = SGD")
  print("lrn_rate = %0.3f " % lrn_rate)
  print("max_epochs = %3d " % max_epochs)


  print("\nStarting training")
  net.train()  # set mode
  for epoch in range(0, max_epochs):
    ep_loss = 0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      (X, y) = batch  # X = pixels, y = target labels
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, y)  # a tensor
      ep_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute grads
      optimizer.step()     # update weights
    if epoch % ep_log_interval == 0:
      print("epoch = %4d   |  loss = %9.4f" % (epoch, ep_loss))
  print("Done ") 

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds)  # all at once
  print("Accuracy on training data = %0.4f" % acc_train)

  test_file = ".\\Data\\mnist_test_100.txt"
  test_ds = MNIST_Dataset(test_file)
  net.eval()
  acc_test = accuracy(net, test_ds)  # all at once
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  # 5. use model
  print("\nMaking prediction for fake image: ")
  x = np.zeros(shape=(28,28), dtype=np.float32)
  for row in range(5,23):
    x[row][9] = 180  # vertical line
  for rc in range(9,19):
    x[rc][rc] = 250  # diagonal
  for col in range(5,15):  
    x[14][col] = 200  # horizontal
  x /= 255.0

  plt.tight_layout()
  plt.imshow(x, cmap=plt.get_cmap('gray_r'))
  plt.show()

  x = x.reshape(1, 1, 28, 28)  # 1 image, 1 channel
  x = T.tensor(x, dtype=T.float32).to(device)
  with T.no_grad():
    oupt = net(x)  # 10 logits like [[-0.12, 1.03, . . ]]
  pred_probs = T.softmax(oupt, dim=1)
  print("\nPrediction probabilities: ")
  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  # np.set_printoptions(precision=4, suppress=True)
  print(pred_probs.numpy())

  digits = ['zero', 'one', 'two', 'three', 'four', 'five', 
    'six', 'seven', 'eight', 'nine' ]
  am = T.argmax(oupt) # 0 to 9
  print("\nPredicted class is \'" + digits[am] + "\'")

# -----------------------------------------------------------

  # 6. save model
  print("\nSaving trained model state")
  # fn = ".\\Models\\mnist_model.pt"
  # T.save(net.state_dict(), fn)  

  print("\nEnd MNIST PyTorch CNN demo ")

if __name__ == "__main__":
  main()

“Contrastive Loss Representation for Anomaly Detection Has Cybersecurity Implications” on the Pure AI Web Site

I contributed to an article titled “Contrastive Loss Representation for Anomaly Detection Has Cybersecurity Implications” in the May 2022 edition of the online Pure AI Web site. See https://pureai.com/articles/2022/05/03/anomaly-detection.aspx.

The article describes a type of neural network architecture called contrastive loss representation (CLR). CLR was originally designed for image data, but the article describes how the technique was adapted for use with log files, for cybersecurity purposes.

Briefly, contrastive loss representation for image data accepts an image (such as a 32 x 32 color image of a dog) and generates a numeric vector that is an abstract representation of the image (such as a numeric array of 500 values). The abstract representation vector can be used for so-called “downstream” tasks such as creating an image classifier, with only a very small number of images that are labeled with the correct class.

The article first describes CLR for image data. The CIFAR-10 (Canadian Institute for Advanced Research, 10 classes) dataset has 50,000 training images and 10,000 test images. Each image is 32 x 32 pixels. Because the images are color, each image has three channels (red, green, blue). Each pixel-channel value is an integer between 0 and 255. Each image is one of 10 classes: plane (class 0), car, bird, cat, deer, dog, frog, horse, ship, truck (class 9). Using all 50,000 training images it’s relatively easy to create an image classification system that achieves about 90 percent accuracy.

Suppose you want to create an image classifier for a new dataset of 32 x 32 images where each image is one of three classes: bicycle, cow and rabbit. You only have 100 labeled training images for each class. If you create an image classifier from scratch using the 300 training images, your classifier will certainly have poor accuracy because you just don’t have enough training data.

However, your (bicycle, cow, rabbit) image data is similar in some intuitive sense to the CIFAR-10 image data. If you could construct an internal representation of the CIFAR-10 data, there’s a good chance you could use that representation to jump-start an image classifier for your data and get good accuracy even though you have a very limited amount of training data.
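
A minimal sketch of that downstream idea, assuming a hypothetical trained Keras model named encoder that maps a 32 x 32 x 3 image to a 500-value representation vector. Only the small classification head is trained on the 300 labeled (bicycle, cow, rabbit) images; the encoder weights stay frozen.

# downstream_sketch.py
# illustrative only -- 'encoder' is a hypothetical pre-trained model
import numpy as np
import tensorflow as tf
from tensorflow import keras as K

encoder = K.models.load_model(".\\Models\\encoder_model")  # hypothetical path
encoder.trainable = False  # freeze the learned representation

inp = K.layers.Input(shape=(32,32,3))
rep = encoder(inp)         # 500-value abstract representation
out = K.layers.Dense(units=3, activation='softmax')(rep)  # 3 classes
clsfr = K.models.Model(inp, out)
clsfr.compile(loss='sparse_categorical_crossentropy',
  optimizer=K.optimizers.Adam(learning_rate=0.001),
  metrics=['accuracy'])

# train_x: (300, 32, 32, 3) images, train_y: (300,) labels 0-2
# clsfr.fit(train_x, train_y, batch_size=10, epochs=50)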

I am quoted in the article:

McCaffrey commented, “Applying contrastive loss representation to non-image data is a straightforward idea so I’m not surprised that the technique appears to work well.”

McCaffrey further observed, “Deep neural systems have made fantastic progress in many areas, notably natural language processing. But one area where these deep neural systems have not quite met expectations is in cybersecurity.”

McCaffrey also noted, “This research, and many other efforts, seem to be making good progress toward our ability to detect and defend against malicious attacks on computer systems.”



An Internet search for “contrastive” led to a high contrast portrait (left). This led to a “gel” portrait (center). And that led to a “jello hat” portrait. Thank you Internet, for endless entertainment.



Regression (Employee Income) Using Keras 2.8 on Windows 11

One of my standard neural network examples is to predict employee income from sex, age, city, and job-type. Predicting a single numeric value is usually called a regression problem. (Note: “logistic regression” predicts a single numeric probability value between 0.0 and 1.0 but then that value is immediately used as a binary classification result).

My data is synthetic and looks like:

 1   0.24   1 0 0   0.2950   0 0 1
-1   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
-1   0.36   1 0 0   0.4450   0 1 0
 1   0.27   0 1 0   0.2860   0 0 1
. . .

There are 200 training items and 40 test items.

The first value in column [0] is sex (M = -1, F = +1). Column [1] is age, normalized by dividing by 100. Columns [2,3,4] are the city, one-hot encoded (anaheim, boulder, concord). Column [5] is annual income, divided by $100,000, and is the value to predict. Columns [6,7,8] are the job-type (mgmt, supp, tech).
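
Just to make the encoding concrete, here is a small sketch (my own illustration, not part of the demo) that converts one raw record such as ("M", 34, "concord", $52,000, "supp") into the normalized and encoded form used in the data files.

# encode_sketch.py -- illustrative encoding of one raw employee record
cities = ['anaheim', 'boulder', 'concord']
jobs = ['mgmt', 'supp', 'tech']

def encode(sex, age, city, income, job):
  result = []
  result.append(-1.0 if sex == 'M' else 1.0)                  # M = -1, F = +1
  result.append(age / 100.0)                                  # normalize age
  result.extend([1.0 if city == c else 0.0 for c in cities])  # one-hot city
  result.append(income / 100_000.0)                           # normalize income
  result.extend([1.0 if job == j else 0.0 for j in jobs])     # one-hot job
  return result

print(encode('M', 34, 'concord', 52000, 'supp'))
# [-1.0, 0.34, 0.0, 0.0, 1.0, 0.52, 0.0, 1.0, 0.0]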

I designed an 8-(10-10)-1 neural network. I used glorot_uniform() weight initialization with zero-bias initialization. I used tanh() activation on the two hidden layers, and no activation (aka Identity activation) on the single output node.

For training, I used Adam optimization with an initial learning rate of 0.01 along with a batch size of 10. I used mean squared error for the loss function.

For regression problems you must define a custom accuracy() function. My accuracy() function counts an income prediction as correct if it’s within 10% of the true income. I implemented two accuracy() functions. The first version iterates through one data item at a time. This is slow but useful to examine results. The second version feeds all data to the model at the same time. This is faster but more opaque.



There’s a strong correlation between a person’s job and their income. Here are three people who have interesting jobs.

Left: According to the BBC, Alan Moore is a “writer, wizard, mall Santa, and Rasputin impersonator”. Impressive.

Center: According to the Food Network TV company, Richard Scheuerman is a “shredded cheese authority”. OK.

Right: The BBC broadcast an interview with Andrew Drinkwater, from the “Water Research Centre”. He was meant to have that job.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols. My lame blog editor chokes on symbols. For the training and test data, see my post at https://jamesmccaffrey.wordpress.com/2022/05/23/regression-employee-income-using-pytorch-1-10-on-windows-11/ where I did the same problem using PyTorch.

# employee_income_tfk.py
# predict income from sex, age, city, job_type
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n, model, data_x, data_y):
    self.n = n   # print loss every n epochs
    self.model = model
    self.data_x = data_x  # needed to compute accuracy
    self.data_y = data_y
    
  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # loss on curr batch
      acc = accuracy_x(self.model, self.data_x,
        self.data_y, 0.10) 
      print("epoch = %4d  |  loss = %0.6f  |  acc = %0.4f" % \
(epoch, curr_loss, acc))

# -----------------------------------------------------------

def accuracy(model, data_x, data_y, pct_close):
  # item-by-item -- slow -- for debugging
  n_correct = 0; n_wrong = 0
  n = len(data_x)
  for i in range(n):
    x = np.array([data_x[i]])  # [[ x ]]
    predicted = model.predict(x)  
    actual = data_y[i]
    if np.abs(predicted[0][0] - actual) "lt" \
      np.abs(pct_close * actual):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)

# -----------------------------------------------------------

def accuracy_x(model, data_x, data_y, pct_close):
  n = len(data_x)
  oupt = model(data_x)
  oupt = tf.reshape(oupt, [-1])  # 1D
 
  max_deltas = tf.abs(pct_close * data_y)  # max allow deltas
  abs_deltas = tf.abs(oupt - data_y)   # actual differences
  results = abs_deltas "lt" max_deltas    # [True, False, . .]

  n_correct = np.sum(results)
  acc = n_correct / n
  return acc

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin Employee predict income using Keras ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # sex age   city    income   job_type
  # -1  0.27  0 1 0   0.7610   0 0 1
  # +1  0.19  0 0 1   0.6550   1 0 0

  print("\nLoading Employee data into memory ")
  train_file = ".\\Data\\employee_train.txt"  # 200 lines
  train_x = np.loadtxt(train_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=5, delimiter="\t",
    comments="#", dtype=np.float32)

  test_file = ".\\Data\\employee_test.txt"  # 40 lines
  test_x = np.loadtxt(test_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=5, delimiter="\t",
    comments="#", dtype=np.float32)

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 8-(10-10)-1 neural network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=10, input_dim=8,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid1
  model.add(K.layers.Dense(units=10,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid2
  model.add(K.layers.Dense(units=1,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))    # output layer
  opt = K.optimizers.Adam(learning_rate=0.01)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  my_logger = MyLogger(100, model, train_x, train_y) 

  print("\nStarting training ")
  h = model.fit(train_x, train_y, batch_size=10,
    epochs=1000, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nComputing model accuracy (within 0.10 of true) ")
  train_acc = accuracy(model, train_x, train_y, 0.10) 
  print("Accuracy on train data = %0.4f" % train_acc)
  test_acc = accuracy_x(model, test_x, test_y, 0.10) 
  print("Accuracy on test data = %0.4f" % test_acc)

  # 5. use model
  # np.set_printoptions(formatter={'float': '{: 0.6f}'.format})
  print("\nPredicting income for M 34 concord support: ")
  x = np.array([[-1, 0.34, 0,0,1,  0,1,0]], dtype=np.float32)
  pred_inc = model.predict(x)
  print("$%0.2f" % (pred_inc * 100_000))  # un-normalized

# -----------------------------------------------------------

  # 6. save model
  print("\nSaving trained model ")
  # model.save_weights(".\\Models\\employee_model_wts.h5")
  # model.save(".\\Models\\employee_model.h5")

# -----------------------------------------------------------

  print("\nEnd Employee income demo")

if __name__=="__main__":
  main()

Regression (Employee Income) Using PyTorch 1.10 on Windows 11

One of my standard neural network examples is to predict employee income from sex, age, city, and job-type. Predicting a single numeric value is usually called a regression problem. (Note: “logistic regression” predicts a single numeric probability value between 0.0 and 1.0 but then that value is immediately used as a binary classification result).

My data is synthetic and looks like:

 1   0.24   1 0 0   0.2950   0 0 1
-1   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
-1   0.36   1 0 0   0.4450   0 1 0
 1   0.27   0 1 0   0.2860   0 0 1
. . .

There are 200 training items and 40 test items.

The first value in column [0] is sex (M = -1, F = +1). Column [1] is age, normalized by dividing by 100. Columns [2,3,4] are the city, one-hot encoded (anaheim, boulder, concord). Column [5] is annual income, divided by $100,000, and is the value to predict. Columns [6,7,8] are the job-type (mgmt, supp, tech).

I designed an 8-(10-10)-1 neural network. I used xavier_uniform() weight initialization with zero-bias initialization. I used tanh() activation on the two hidden layers, and no activation (aka Identity activation) on the single output node.

For training, I used Adam optimization with an initial learning rate of 0.01 along with a batch size of 10. I used mean squared error for the loss function. I wrapped the training statements in a program-defined train() for no specific reason.

For regression problems you must define a custom accuracy() function. My function counts an income prediction as correct if it’s within 10% of the true income.



The normal English (non-math) meaning of “regression” is “a return to a less developed state”. This confused me when I was a math student and learned about regression.

Left: In “Rocketship X-M” (1950) an expedition goes to Mars and finds that the once advanced civilization has regressed to primitive cavemen-like creatures. It doesn’t end well for the expedition members.

Center: In “The Time Machine” (1960) scientist H. George Wells travels far into the future but finds that civilization has regressed to savage Morlocks who prey on the gentle Eloi.

Right: In “The Time Travelers” (1964) a group of scientists accidentally travel into the far future and find that a nuclear war has regressed most humans to a primitive state.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols. My lame blog editor chokes on symbols.

# employee_income.py
# predict income from sex, age, city, job_type
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # sex age   city    income   job_type
    # -1  0.27  0 1 0   0.7610   0 0 1
    # +1  0.19  0 0 1   0.6550   1 0 0
    tmp_x = np.loadtxt(src_file, usecols=[0,1,2,3,4,6,7,8],
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_y = np.loadtxt(src_file, usecols=5, delimiter="\t",
      comments="#", dtype=np.float32)
    tmp_y = tmp_y.reshape(-1,1)  # 2D required

    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    incom = self.y_data[idx] 
    return (preds, incom)  # as a tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # regression: no activation
    return z

# -----------------------------------------------------------

def accuracy(model, ds, pct_close):
  # assumes model.eval()
  # correct within pct of true income
  n_correct = 0; n_wrong = 0

  for i in range(len(ds)):
    X = ds[i][0]   # inputs, shape [8]
    Y = ds[i][1]   # target income, shape [1]
    with T.no_grad():
      oupt = model(X)         # computed income

    if T.abs(oupt - Y) "lt" T.abs(pct_close * Y):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def accuracy_x(model, ds, pct_close):
  # all-at-once (quick)
  # assumes model.eval()
  X = ds.x_data  # all inputs
  Y = ds.y_data  # all targets
  n_items = len(X)
  with T.no_grad():
    pred = model(X)  # all predicted incomes
 
  n_correct = T.sum((T.abs(pred - Y) "lt" T.abs(pct_close * Y)))
  result = (n_correct.item() / n_items)  # scalar
  return result  

# -----------------------------------------------------------

def train(model, ds, bs, lr, me, le):
  # dataset, bat_size, lrn_rate, max_epochs, log interval
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(model.parameters(), lr=lr)

  for epoch in range(0, me):
    epoch_loss = 0  # for one full epoch

    for (b_idx, batch) in enumerate(train_ldr):
      X = batch[0]
      y = batch[1]
      optimizer.zero_grad()
      oupt = model(X)
      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute gradients
      optimizer.step()     # update weights

    if epoch % le == 0:
      print("epoch = %4d  |  loss = %0.4f" % (epoch, epoch_loss)) 

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee predict income ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Dataset objects ")
  train_file = ".\\Data\\employee_train.txt"
  train_ds = EmployeeDataset(train_file)  # 200 rows

  test_file = ".\\Data\\employee_test.txt"
  test_ds = EmployeeDataset(test_file)  # 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating 8-(10-10)-1 neural network ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = MSELoss() ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  print("\nStarting training")
  net.train()
  train(net, train_ds, bs=10, lr=0.01, me=1000, le=100)
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy (within 0.10 of true) ")
  net = net.eval()
  acc_train = accuracy(net, train_ds, 0.10)  # item-by-item
  print("Accuracy on train data = %0.4f" % acc_train)

  acc_test = accuracy_x(net, test_ds, 0.10)  # all-at-once
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  # 5. make a prediction
  print("\nPredicting income for M 34 concord support: ")
  x = np.array([[-1, 0.34, 0,0,1,  0,1,0]],
    dtype=np.float32)
  x = T.tensor(x, dtype=T.float32).to(device) 

  with T.no_grad():
    pred_inc = net(x)
  pred_inc = pred_inc.item()  # scalar
  print("$%0.2f" % (pred_inc * 100_000))  # un-normalized

# -----------------------------------------------------------

  # 6. save model (state_dict approach)
  print("\nSaving trained model state")
  fn = ".\\Models\\employee_model.pt"
  # T.save(net.state_dict(), fn)

  # saved_model = Net()
  # saved_model.load_state_dict(T.load(fn))
  # use saved_model to make prediction(s)

  print("\nEnd Employee income demo")

if __name__ == "__main__":
  main()

Training data. If you copy-paste you might lose the tab-delimiters.

# employee_train.txt
#
# sex (-1 = male, 1 = female), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_000,
# job type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0

Test data.

# employee_test.txt
#
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1

Understanding SimCLR – Simple Contrastive Loss Representation for Image Data

I’ve been looking at an interesting research paper titled “A Simple Framework for Contrastive Learning of Visual Representations” (2020) by T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. The main idea is to take unlabeled image data and use it to create an abstract representation (as in a numeric vector of, say, 500 values) of the images. This representation has no direct use, but it can be fine-tuned for downstream tasks. See my earlier post at https://jamesmccaffrey.wordpress.com/2022/04/11/an-example-of-normalized-temperature-scaled-cross-entropy-loss/.

To help me understand, I refactored the architecture diagram in the paper. Here’s the diagram from the research paper:



And here’s my version:



My explanation:

The input is a 32 x 32 CIFAR-10 color image. The image is sent twice to a sequence of three augmentations: 1.) a random part of the image (such as a 9 x 9 block of pixels) is cropped and then resized back to 32 x 32 pixels, 2.) the colors are randomly distorted, 3.) Gaussian blur is applied. The result is a pair of augmented images, x1 and x2. Note that “augmented” usually means “added to” but in the context of contrastive loss, “augmented” means “mutated”.
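
Here is a rough sketch of that augmentation pipeline using torchvision transforms. This is my approximation; the paper's exact crop scales, color-jitter strengths, and blur parameters are different.

# simclr_augment_sketch.py -- approximate SimCLR-style augmentations
from torchvision import transforms

augment = transforms.Compose([
  transforms.RandomResizedCrop(size=32),    # random crop, resize to 32x32
  transforms.ColorJitter(brightness=0.8, contrast=0.8,
    saturation=0.8, hue=0.2),               # random color distortion
  transforms.GaussianBlur(kernel_size=3),   # Gaussian blur
  transforms.ToTensor()
])

# img is a PIL image; two augmented views of the same image:
# x1 = augment(img)
# x2 = augment(img)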

The pair of augmented images (x1, x2) are fed to a ResNet-50 neural network. ResNet-50 is a large neural network with 50 layers used for image classification. Intermediate results of the ResNet-50 network (h1, h2), just after the average pooling layer, are fetched rather than the final output vector of 10 values. The (h1, h2) outputs from the ResNet-50 component are abstract representations of the two augmented images. These two abstract representations could be compared by a contrastive loss function. But it was discovered that passing the representations to a simple, single-hidden-layer neural network to get a pair of derived representations (z1, z2) and then feeding the derived representations to the normalized temperature-scaled cross entropy contrastive loss function works better.

The results of the loss function are used to update all the weights and biases in the SimCLR system. This results in the internal h1 and h2 representations being better. After training, the internal h-representations can be used for downstream tasks.

SimCLR architecture is an example of what’s sometimes called a Siamese network. This is because you feed two inputs to the network — conceptually it’s like there are two identical networks that share the same weights. In addition to SimCLR, there are many other specific examples of Siamese networks, but they tend to be more complicated than SimCLR. In fact, the complexity of the other Siamese architectures is what motivated the creation of SimCLR (the “Sim” stands for “simple”).


One thing that both diagrams leave out is that a SimCLR network is trained using a batch of pairs of images. The first pair are similar to each other, but the other pairs are randomly selected and are assumed to be dissimilar. The similar and dissimilar pairs are actually fed to the contrastive loss function, not just the similar pair as shown.
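
To make the loss step concrete, here is a bare-bones sketch of the normalized temperature-scaled cross entropy (NT-Xent) loss for a batch of N pairs, along with a minimal projection head. This is my simplification of the idea, not the paper's reference implementation.

# ntxent_sketch.py -- simplified NT-Xent loss for a batch of pairs
import torch as T

class ProjectionHead(T.nn.Module):
  # small network that maps h (e.g. 2048 values) to z (e.g. 128 values)
  def __init__(self, in_dim=2048, hid_dim=512, out_dim=128):
    super(ProjectionHead, self).__init__()
    self.fc1 = T.nn.Linear(in_dim, hid_dim)
    self.fc2 = T.nn.Linear(hid_dim, out_dim)
  def forward(self, h):
    return self.fc2(T.relu(self.fc1(h)))

def nt_xent_loss(z1, z2, tau=0.5):
  # z1, z2: [N, d] projections of the two augmented views
  n = z1.shape[0]
  z = T.cat([z1, z2], dim=0)                  # [2N, d]
  z = T.nn.functional.normalize(z, dim=1)     # unit length rows
  sim = T.mm(z, z.t()) / tau                  # [2N, 2N] scaled similarities
  sim.fill_diagonal_(-1.0e9)                  # mask self-similarity
  # the positive for row i is the other augmented view of the same image
  targets = T.cat([T.arange(n, 2*n), T.arange(0, n)])
  return T.nn.functional.cross_entropy(sim, targets)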

Interesting stuff.



Images from an Internet search for “contrastive image”. I sort of understand photography but I don’t grok photography at a deep level.



Autoencoder Anomaly Detection Using Keras 2.8 on Windows 11

Every few months I revisit my standard neural network examples to make sure that changes in the underlying code libraries (PyTorch, Keras/TensorFlow) haven’t introduced any breaking changes. One of my standard examples is autoencoder anomaly detection.

The idea is to take a set of data and implement a deep neural network that predicts its input. The values of the interior hidden layer of nodes are a condensed representation of the input. The output nodes are a reconstruction of the input. Data items where the reconstructed input is very different from the associated input are anomalous in some way.

My demo uses a synthetic set of Employee data. There are five feature variables, which expand to nine encoded values: employee sex (M, F), age, city (anaheim, boulder, concord), annual income, and job-type (mgmt, supp, tech). There are 240 items. The normalized and encoded data looks like:

# sex  age   city      income   job_type
 -1   0.27   0  1  0   0.7610   0  0  1
  1   0.19   0  0  1   0.6550   0  1  0
. . .

My demo network uses a 9-4-(2)-4-9 architecture. The input and output sizes are determined by the data, but the number of hidden layers and the number of nodes in each are hyperparameters that must be determined by trial and error. The middle hidden layer, with 2 nodes, represents a condensed version of a data item. This internal representation isn’t used directly — it’s used to reconstruct a data item. The difference between a 9-value input item and its 9-value output is used to find anomalies.
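
Once such a network is trained, a few lines of NumPy can rank every item by reconstruction error rather than just finding the single worst one. Here is a minimal sketch, assuming a trained model and a data_x array like the ones in the demo program below.

# rank all items by reconstruction error (illustrative)
import numpy as np
recon = model.predict(data_x)                 # (240, 9) reconstructions
errs = np.mean((data_x - recon)**2, axis=1)   # per-item mean squared error
worst = np.argsort(errs)[::-1][:5]            # five most anomalous items
for i in worst:
  print("item %3d  error = %0.4f" % (i, errs[i]))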



A problem facing artists who want to create images of alien animals and vegetation is to find a balance between images that are too anomalous compared to real plants and animals (making them look implausible) and images that are not anomalous enough (making them not look alien enough). Artist Jorge Abalo creates beautiful digital renderings of alien plants — a perfect balance of anomalous to my eye.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols — my lame blog editor chokes on symbols.

# employee_autoanom_tfk.py
# autoencoder reconstruction error anomaly detection
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n   # print loss every n epochs
    # self.data_x = data_x  # for accuracy
    # self.data_y = data_y
    
  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # loss on curr batch
      print("epoch = %4d  |  loss = %0.6f " % \
        (epoch, curr_loss))

# -----------------------------------------------------------

def analyze_error(model, data_x):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  n_features = len(data_x[0])  # 9 predictors

  for i in range(len(data_x)):
    X = data_x[i].reshape(1,-1)
    Y = model(X)
    err = tf.reduce_sum( (X-Y)*(X-Y) )  # across all predictors
    err /= n_features

    if err "gt" largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y.numpy()

  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  print("Largest reconstruction error: %0.4f" % largest_err)
  print("Worst data item    = ")
  print(worst_x)
  print("Its reconstruction = " )
  print(worst_y)

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin Employee autoencoder anomaly using Keras ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # sex age   city    income   job_type
  # -1  0.27  0 1 0   0.7610   0 0 1
  # +1  0.19  0 0 1   0.6550   1 0 0

  print("\nLoading Employee data into memory ")
  data_file = ".\\Data\\employee_all.txt"  # 240 lines
  data_x = np.loadtxt(data_file, usecols=[0,1,2,3,4,5,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  
# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 9-4-(2)-4-9 network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=4, input_dim=9,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # enc1
  model.add(K.layers.Dense(units=2,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # enc2
  model.add(K.layers.Dense(units=4,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # dec1
  model.add(K.layers.Dense(units=9,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # dec2
  opt = K.optimizers.Adam(learning_rate=0.005)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.005 ")

  my_logger = MyLogger(100) 

  print("\nStarting training ")
  h = model.fit(data_x, data_x, batch_size=10,
    epochs=1000, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error \n")
  analyze_error(model, data_x)

# -----------------------------------------------------------

  # 5. save model
  # print("\nSaving trained model ")
  # model.save_weights(".\\Models\\employee_model_wts.h5")
  # model.save(".\\Models\\employee_model.h5")

# -----------------------------------------------------------

  print("\nEnd Employee autoencoder anomaly demo ")

if __name__=="__main__":
  main()

Demo data:

# employee_all.txt
# sex (M = -1, F = +1), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_000,
# job_type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1
Posted in Keras, PAW | Leave a comment

Logistic Regression for the Banknote Problem Using Raw Python

Every few months I implement a logistic regression (binary classification) model using raw Python (or some other language). The idea is that coding is a skill that must be practiced. One rainy Pacific Northwest afternoon, I zapped out logistic regression for the Banknote Authentication (BA) problem.

The goal of the BA problem is to predict if a banknote (think euro or dollar bill) is real/authentic (class 0) or fake/forgery (class 1). The raw data is available in several places on the Internet and looks like:

3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
. . . 
-3.5637, -8.3827, 12.393, -1.2823, 1
-2.5419, -0.65804, 2.6842, 1.1952, 1

There are 1,372 data items. There are four predictor variables derived from a digital image of each banknote: variance, skewness, kurtosis, entropy. I broke the dataset into a 1,000-item set for training and a 372-item set for testing. I normalized all predictor values by dividing each by 20.0 so that the normalized values are all between -1.0 and +1.0, and I replaced the comma delimiters with tabs. (A short sketch of this preprocessing appears after the sample data below.) The normalized data looks like:

-0.177550    0.094775   0.009325   -0.122045   1
 0.065570    0.227310   0.114675    0.011271   0
-0.200865   -0.415615   0.622735   -0.071875   1
. . . 
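
To be clear about the preprocessing, here is a minimal sketch of the kind of script I use. The raw file name banknote_raw.txt is just a placeholder, and taking the first 1,000 items for training is for illustration only; I'm not showing the exact split I used.

# prep_banknote.py
# preprocessing sketch: divide the four predictors by 20.0 and
# write tab-delimited train and test files

import numpy as np

raw = np.loadtxt("banknote_raw.txt", delimiter=",",
  dtype=np.float32)                     # shape [1372, 5]
raw[:, 0:4] /= 20.0                     # normalize predictors only

fmt = "%0.6f\t%0.6f\t%0.6f\t%0.6f\t%d"  # tab-delimited output
np.savetxt("banknote_train.txt", raw[0:1000, :], fmt=fmt)
np.savetxt("banknote_test.txt", raw[1000:, :], fmt=fmt)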

There are many design possibilities for logistic regression. I opted for simplicity and just maintained an array of weights (one for each predictor) and a bias value. Therefore, creating the LR model is:

print("Creating logistic regression model ")
wts = np.zeros(4)  # one wt per predictor
lo = -0.01; hi = 0.01
for i in range(len(wts)):
  wts[i] = (hi - lo) * np.random.random() + lo
bias = 0.00

I implemented a compute_output() function as:

def compute_output(w, b, x):
  # input x using weights w and bias b
  z = 0.0
  for i in range(len(w)):
    z += w[i] * x[i]
  z += b
  p = 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid
  return p
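
As a quick sanity check, with the tiny random initial weights and a zero bias, the computed output for any normalized input item should be very close to 0.5:

x = np.array([-0.177550, 0.094775, 0.009325, -0.122045],
  dtype=np.float32)  # first normalized data item shown above
p = compute_output(wts, bias, x)
print(p)  # roughly 0.5 before any training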

Anyway, it was a fun exercise and, as always, I gained some new insights into the details of logistic regression.



Some fake upscale brand watches are very difficult to distinguish from authentic. But some fakes are easy to identify. Left: This Rolex is creative but not convincing. Center: I strongly suspect that Ghetto University is not legit. Right: An Apple watch — pretty punny.


Demo code: (replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols — my blog editor chokes on them)

# banknote_logreg.py

# predict real (0) or forgery (1) from
# variance, skewness, kurtosis, entropy (all div by 20.0)
# data:
# -0.177550  0.094775  0.009325  -0.122045   1
#  0.065570  0.227310  0.114675   0.011271   0

# Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np

# -----------------------------------------------------------

def compute_output(w, b, x):
  # input x using weights w and bias b
  z = 0.0
  for i in range(len(w)):
    z += w[i] * x[i]
  z += b
  p = 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid
  return p

# -----------------------------------------------------------

def accuracy(w, b, data_x, data_y):
  n_correct = 0; n_wrong = 0
  for i in range(len(data_x)):
    x = data_x[i]  # inputs
    y = int(data_y[i])  # target 0 or 1
    p = compute_output(w, b, x)
    if (y == 0 and p "lt" 0.5) or (y == 1 and p "gte" 0.5):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def mse_loss(w, b, data_x, data_y):
  sum = 0.0
  for i in range(len(data_x)):
    x = data_x[i]  # inputs
    y = int(data_y[i])  # target 0 or 1
    p = compute_output(w, b, x)
    sum += (y - p) * (y - p)
  mse = sum / len(data_x)
  return mse

# -----------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin logistic regression with raw Python demo ")
  np.random.seed(1)

  # 1. load data
  print("\nLoading Banknote train and test data to memory ")
  # variance, skewness, kurtosis, entropy (all div by 20.0)
  # 0 = real, 1 = forgery
  # -0.177550  0.094775  0.009325  -0.122045   1
  #  0.065570  0.227310  0.114675   0.011271   0

  train_file = ".\\Data\\banknote_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,5),
    delimiter="\t", comments="#",  dtype=np.float32) 
  train_x = train_xy[:,0:4]
  train_y = train_xy[:,4]

  test_file = ".\\Data\\banknote_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,5),
    delimiter="\t", comments="#", dtype=np.float32)
  test_x = test_xy[:,0:4]
  test_y = test_xy[:,4]

# -----------------------------------------------------------

  # 2. create model
  print("\nCreating logistic regression model ")
  wts = np.zeros(4)  # one wt per predictor
  lo = -0.01; hi = 0.01
  for i in range(len(wts)):
    wts[i] = (hi - lo) * np.random.random() + lo
  bias = 0.00

# -----------------------------------------------------------

  # 3. train model
  lrn_rate = 0.01
  max_epochs = 100
  indices = np.arange(len(train_x))  # [0, 1, .. 999]
  print("\nTraining using SGD with lrn_rate = %0.4f " % lrn_rate)
  for epoch in range(max_epochs):
    np.random.shuffle(indices)
    for i in indices:
      x = train_x[i]  # inputs
      y = train_y[i]  # target 0.0 or 1.0
      p = compute_output(wts, bias, x)

      # update all wts and the bias
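      # (y - p) * x[j] is the negative gradient of the
      # log-loss for a sigmoid output, so += is a
      # gradient descent step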
      for j in range(len(wts)):
        wts[j] += lrn_rate * x[j] * (y - p)  # target - oupt
      bias += lrn_rate * (y - p)
    if epoch % 10 == 0:
      loss = mse_loss(wts, bias, train_x, train_y)
      print("epoch = %5d  |  loss = %9.4f " % (epoch, loss))
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nEvaluating trained model ")
  acc_train = accuracy(wts, bias, train_x, train_y)
  print("Accuracy on train data: %0.4f " % acc_train)
  acc_test = accuracy(wts, bias, test_x, test_y)
  print("Accuracy on test data: %0.4f " % acc_test)

  # 5. use model
  print("\nPrediction for [0.2, 0.3, 0.5, 0.7] banknote: ")
  x = np.array([0.2, 0.3, 0.5, 0.7], dtype=np.float32)
  p = compute_output(wts, bias, x)
  print("%0.8f " % p)
  if p "lt" 0.5:  # replace here
    print("class 0 (real) ")
  else:
    print("class 1 (forgery) ") 

  # 6. TODO: save trained weights and bias to file

  print("\nEnd Banknote logistic regression demo ")

if __name__ == "__main__":
  main()
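
The step 6 TODO is easy to finish. One simple approach is to append the bias value to the weights array and write everything as text (the Models directory path here is just a placeholder):

  # 6. save trained weights and bias to file
  fn = ".\\Models\\banknote_wts.txt"  # placeholder path
  np.savetxt(fn, np.append(wts, bias), fmt="%0.6f")

  # later, to re-load:
  params = np.loadtxt(fn, dtype=np.float32)
  wts = params[0:4]
  bias = params[4]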

Training data. You might lose the tabs if you copy-paste.


Posted in Machine Learning, PAW | Leave a comment

Autoencoder Anomaly Detection Using PyTorch 1.10 on Windows 11

Every few months I revisit my standard neural network examples to make sure that changes in the underlying code libraries (PyTorch, Keras/TensorFlow) haven’t introduced breaking changes. One of my standard examples is autoencoder anomaly detection.

The idea is to take a set of data and implement a deep neural network that predicts its input. The values of the interior hidden layer of nodes are a condensed representation of the input. The output nodes are a reconstruction of the input. Data items where the reconstructed input is very different from the associated input are anomalous in some way.

My demo uses a synthetic set of Employee data. There are five feature variables: employee sex (M, F), age, city (anaheim, boulder, concord), annual income, and job-type (mgmt, supp, tech). There are 240 items. The normalized and encoded data looks like:

# sex  age   city      income   job_type
 -1   0.27   0  1  0   0.7610   0  0  1
  1   0.19   0  0  1   0.6550   0  1  0
. . .

My demo network uses a 9-4-(2)-4-9 architecture. The input and output sizes are determined by the data, but the number of hidden layers and the number of nodes in each layer are hyperparameters that must be determined by trial and error.



I love to observe people and things, especially in Las Vegas. On a recent trip to speak at a tech conference, I noticed that electronic versions of games such as Blackjack, Roulette, and Craps display results of recent games. This encourages players to seek out and bet on anomalies — results that appear less than expected or more than expected. Left: This craps game at the MGM Grand shows “hot” numbers and “cold” numbers. Right: The Fortune Cup horse race game shows the results of the most recent 40 races. Fascinating.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols — my weak blog editor chokes on symbols.

# employee_auto_anom.py
# autoencoder reconstruction error anomaly detection
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T

device = T.device('cpu') 

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  # sex  age   city     income  job
  # -1   0.27  0  1  0  0.7610  0  0  1
  # +1   0.19  0  0  1  0.6550  0  1  0
  # sex: -1 = male, +1 = female
  # city: anaheim, boulder, concord
  # job: mgmt, supp, tech

  def __init__(self, src_file):
    tmp_x = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32)
    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx, :]  # row idx, all cols
    sample = { 'predictors' : preds }  # as Dictionary
    return sample

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.enc1 = T.nn.Linear(9, 4)  # 9-4-2-4-9
    self.enc2 = T.nn.Linear(4, 2)

    self.dec1 = T.nn.Linear(2, 4)
    self.dec2 = T.nn.Linear(4, 9)

    T.nn.init.xavier_uniform_(self.enc1.weight)
    T.nn.init.zeros_(self.enc1.bias)
    T.nn.init.xavier_uniform_(self.enc2.weight)
    T.nn.init.zeros_(self.enc2.bias)
    T.nn.init.xavier_uniform_(self.dec1.weight)
    T.nn.init.zeros_(self.dec1.bias)
    T.nn.init.xavier_uniform_(self.dec2.weight)
    T.nn.init.zeros_(self.dec2.bias)

  def forward(self, x):
    z = T.tanh(self.enc1(x))
    z = T.tanh(self.enc2(z))
    z = T.tanh(self.dec1(z))
    z = self.dec2(z)  # no activation
    return z

# -----------------------------------------------------------

def analyze_error(model, ds):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  n_features = len(ds[0]['predictors'])

  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # should be same as X
    err = T.sum((X-Y)*(X-Y)).item()  # SSE all features
    err = err / n_features           # sort of norm'ed SSE 

    if err "gt" largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y

  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  print("Largest reconstruction error: %0.4f" % largest_err)
  print("Worst data item    = ")
  print(worst_x.numpy())
  print("Its reconstruction = " )
  print(worst_y.numpy())

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee autoencoder anomaly detection ")
  T.manual_seed(2)
  np.random.seed(2)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Dataset ")

  data_file = ".\\Data\\employee_all.txt"
  data_ds = EmployeeDataset(data_file)  # all 240 rows

  bat_size = 20
  data_ldr = T.utils.data.DataLoader(data_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating 9-4-(2)-4-9 network ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train autoencoder model
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.005

  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = Adam")
  print("lrn_rate = %0.3f " % lrn_rate)
  print("max_epochs = %3d " % max_epochs)
  

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(data_ldr):
      X = batch['predictors'] 
      Y = batch['predictors'] 

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %4d  |  loss = %0.4f" % \
       (epoch, epoch_loss))
  print("Done ")

# -----------------------------------------------------------

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error \n")
  net.eval()
  analyze_error(net, data_ds)

  print("\nEnd Employee autoencoder anomaly demo ")

if __name__ == "__main__":
  main()
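
The analyze_error() function reports only the single worst data item. If you want to flag every item whose averaged squared reconstruction error exceeds some cutoff, a small variation of that function does the job. The threshold value is something you'd have to pick by looking at the distribution of errors:

def flag_anomalies(model, ds, threshold):
  # (index, error) for items whose averaged squared
  # reconstruction error exceeds the threshold
  n_features = len(ds[0]['predictors'])
  anomalies = []
  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # reconstruction of X
    err = T.sum((X - Y) * (X - Y)).item() / n_features
    if err "gt" threshold:  # replace here
      anomalies.append((i, err))
  return anomalies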

Demo data:

# employee_all.txt
# sex (M = -1, F = +1), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_00,
# job_type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1
Posted in PAW, PyTorch | Leave a comment

Naive Bayes Classification Using C# in Visual Studio Magazine

I wrote an article titled “Naive Bayes Classification Using C#” in the May 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/05/02/naive-bayes-classification-csharp.aspx.

I present a complete demo program. The demo uses a set of 40 data items where each item consists of a person’s occupation (actor, baker, clerk or diver), eye color (green or hazel), country (Italy, Japan or Korea), and their personality optimism score (0, 1 or 2). You want to predict a person’s optimism score from their occupation, eye color and country. (This is an example of multiclass classification because the variable to predict, optimism, has three or more possible values.)

The first few data items look like:

actor  green  korea  1
baker  green  italy  0
diver  hazel  japan  0
diver  green  japan  1
clerk  hazel  japan  2
. . . 

The demo sets up an item to predict as (“baker”, “hazel”, “italy”). Next, the demo scans through the data and computes and displays smoothed (“add 1”) joint counts. For example, the 5 in the screenshot means that there are 4 data items where the person is a baker and has optimism class = 0 (the raw joint count of 4, plus 1 for smoothing).

The demo computes the raw, unsmoothed class counts as (19, 14, 7). This means there are 19 people with optimism class = 0, 14 people with class = 1, and 7 people with class = 2. Notice that 19 + 14 + 7 = 40, the number of data items.

The smoothed joint counts and the raw class counts are combined mathematically to produce evidence terms of (0.0027, 0.0013, 0.0021). These correspond to the likelihoods of class (0, 1, 2). Because the largest evidence value is at index [0], the prediction for the (“baker”, “hazel”, “italy”) person is class 0.

Evidence terms are somewhat difficult to interpret so the demo converts the three evidence terms to pseudo-probabilities: (0.4418, 0.2116, 0.3466). The values are not true mathematical probabilities but because they sum to 1.0 they can loosely be interpreted as probabilities. The largest probability is at index [0].
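
The article's demo is C#, but the arithmetic is easy to show in a few lines of Python. One common smoothing scheme, which is consistent with the numbers above, adds 1 to each joint count and adds the number of predictors (3) to each class count in the denominators. The joint count values below are ones I reconstructed so that they reproduce the evidence terms above (only the baker-and-class-0 count of 5 is stated explicitly), so treat them as illustrative:

import numpy as np

class_counts = np.array([19, 14, 7])  # raw counts for class 0, 1, 2
N = 40                                # total number of data items
nx = 3                                # number of predictor variables

# smoothed ("add 1") joint counts for ("baker", "hazel", "italy")
# one row per predictor, one column per class -- illustrative values
joint = np.array([[5.0, 3.0, 2.0],    # baker and class 0, 1, 2
                  [3.0, 2.0, 3.0],    # hazel and class 0, 1, 2
                  [4.0, 3.0, 2.0]])   # italy and class 0, 1, 2

evidence = np.zeros(3)
for c in range(3):
  e = class_counts[c] / N             # class prior
  for j in range(nx):
    e *= joint[j][c] / (class_counts[c] + nx)
  evidence[c] = e

print(evidence)                     # approx [0.0027 0.0013 0.0021]
print(evidence / np.sum(evidence))  # pseudo-probs, approx [0.44 0.21 0.35]
print(np.argmax(evidence))          # predicted class = 0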

Naive Bayes classification is called “naive” because it analyzes each predictor column independently. This doesn’t take into account interactions between predictor values. For example, in the demo data, clerks who have green eyes might share some special characteristic, but the technique can’t use that interaction. The technique is “Bayesian” because the math is based on observed counts of data rather than some underlying theory.

The technique presented in the article works only with categorical data. There are other forms of naive Bayes classification that can handle numeric data. However, you must make assumptions about the math properties of the data, for example that the data has a normal (Gaussian) distribution with a certain mean and standard deviation.
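
For example, a Gaussian naive Bayes technique replaces the joint counts with per-class likelihood values computed from the normal probability density function. A bare-bones sketch (the mean and standard deviation values are made up):

import numpy as np

def gaussian_pdf(x, mu, sd):
  # likelihood of numeric value x under a normal distribution
  return np.exp(-((x - mu) ** 2) / (2 * sd * sd)) / \
    (sd * np.sqrt(2.0 * np.pi))

# suppose class 0 items have age mean = 30.0 and std dev = 4.0
print(gaussian_pdf(33.0, 30.0, 4.0))  # likelihood of age 33 for class 0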

Naive Bayes classification isn’t used as much as it used to be because techniques based on neural networks are much more powerful. However, neural techniques usually require lots of data. Naive Bayes classification often works well with small datasets.

You can find the complete C# demo code in the VSM article at the URL/link above.



In many of the comedy movies that I like, there is a naive character whose lack of sophistication leads to funny situations. Left: In “Dumb and Dumber To” (2014), buddies Lloyd (actor Jim Carrey) and Harry (Jeff Daniels) are orders of magnitude beyond naive but somehow always manage to emerge with success. Center: In “Stuck On You” (2003), conjoined twins Bob (Matt Damon) and Walt (Greg Kinnear) go to Hollywood so Walt can become an actor. The brothers are nice to everyone including their neighbor April (Eva Mendes) who is blissfully unaware of her surroundings. Right: In “Game Night” (2018) wife Annie (Rachel McAdams) is oblivious to danger when she and husband Max (Jason Bateman) are in a sketchy bar filled with not-very-nice criminals.


Posted in Machine Learning | Leave a comment