Regression (Employee Income) Using Keras 2.8 on Windows 11

One of my standard neural network examples is to predict employee income from sex, age, city, and job-type. Predicting a single numeric value is usually called a regression problem. (Note: “logistic regression” predicts a single numeric probability value between 0.0 and 1.0 but then that value is immediately used as a binary classification result).

My data is synthetic and looks like:

 1   0.24   1 0 0   0.2950   0 0 1
-1   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
-1   0.36   1 0 0   0.4450   0 1 0
 1   0.27   0 1 0   0.2860   0 0 1
. . .

There are 200 training items and 40 test items.

The first value in column [0] is sex (M = -1, F = +1). Column [1] is age, normalized by dividing by 100. Columns [2,3,4] is city one-hot encoded (anaheim, boulder, concord). Column [5] is annual income, divided by $100,000, and is the value to predict. Columns [6,7,8] is job-type (mgmt, supp, tech).
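
For example, the first training item decodes like this (a small illustrative snippet, not part of the demo program; the index order follows the description above):

# decode one encoded row back into readable values
row = [1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1]   # first training item
sex = "F" if row[0] == 1 else "M"
age = int(row[1] * 100)
city = ["anaheim", "boulder", "concord"][row[2:5].index(1)]
income = row[5] * 100_000
job = ["mgmt", "supp", "tech"][row[6:9].index(1)]
print(sex, age, city, "$%0.0f" % income, job)  # F 24 anaheim $29500 tech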

I designed an 8-(10-10)-1 neural network. I used glorot_uniform() weight initialization with zero-bias initialization. I used tanh() activation on the two hidden layers, and no activation (aka Identity activation) on the single output node.

For training, I used Adam optimization with an initial learning rate of 0.01 along with a batch size of 10. I used mean squared error for the loss function.

For regression problems you must define a custom accuracy() function. My accuracy() function counts an income prediction as correct if it’s within 10% of the true income. I implemented two accuracy() functions. The first version iterates through one data item at a time. This is slow but useful to examine results. The second version feeds all data to the model at the same time. This is faster but more opaque.



There’s a strong correlation between a person’s job and their income. Here are three people who have interesting jobs.

Left: According to the BBC, Alan Moore is a “writer, wizard, mall Santa, and Rasputin impersonator”. Impressive.

Center: According to the Food Network TV company, Richard Scheuerman is a “shredded cheese authority”. OK.

Right: The BBC broadcast an interview with Andrew Drinkwater, from the “Water Research Centre”. He was meant to have that job.


Demo code. For the training and test data, see my post at https://jamesmccaffrey.wordpress.com/2022/05/23/regression-employee-income-using-pytorch-1-10-on-windows-11/ where I did the same problem using PyTorch.

# employee_income_tfk.py
# predict income from sex, age, city, job_type
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n, model, data_x, data_y):
    self.n = n   # print loss every n epochs
    self.model = model
    self.data_x = data_x  # needed to compute accuracy
    self.data_y = data_y
    
  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # loss on curr batch
      acc = accuracy_x(self.model, self.data_x,
        self.data_y, 0.10) 
      print("epoch = %4d  |  loss = %0.6f  |  acc = %0.4f" % \
        (epoch, curr_loss, acc))

# -----------------------------------------------------------

def accuracy(model, data_x, data_y, pct_close):
  # item-by-item -- slow -- for debugging
  n_correct = 0; n_wrong = 0
  n = len(data_x)
  for i in range(n):
    x = np.array([data_x[i]])  # [[ x ]]
    predicted = model.predict(x)  
    actual = data_y[i]
    if np.abs(predicted[0][0] - actual) < \
      np.abs(pct_close * actual):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)

# -----------------------------------------------------------

def accuracy_x(model, data_x, data_y, pct_close):
  n = len(data_x)
  oupt = model(data_x)
  oupt = tf.reshape(oupt, [-1])  # 1D
 
  max_deltas = tf.abs(pct_close * data_y)  # max allow deltas
  abs_deltas = tf.abs(oupt - data_y)   # actual differences
  results = abs_deltas < max_deltas    # [True, False, . .]

  n_correct = np.sum(results)
  acc = n_correct / n
  return acc

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin Employee predict income using Keras ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # sex age   city    income   job_type
  # -1  0.27  0 1 0   0.7610   0 0 1
  # +1  0.19  0 0 1   0.6550   1 0 0

  print("\nLoading Employee data into memory ")
  train_file = ".\\Data\\employee_train.txt"  # 200 lines
  train_x = np.loadtxt(train_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=5, delimiter="\t",
    comments="#", dtype=np.float32)

  test_file = ".\\Data\\employee_test.txt"  # 40 lines
  test_x = np.loadtxt(test_file, usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=5, delimiter="\t",
    comments="#", dtype=np.float32)

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 8-(10-10)-1 neural network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=10, input_dim=8,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid1
  model.add(K.layers.Dense(units=10,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid2
  model.add(K.layers.Dense(units=1,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))    # output layer
  opt = K.optimizers.Adam(learning_rate=0.01)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  my_logger = MyLogger(100, model, train_x, train_y) 

  print("\nStarting training ")
  h = model.fit(train_x, train_y, batch_size=10,
    epochs=1000, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nComputing model accuracy (within 0.10 of true) ")
  train_acc = accuracy(model, train_x, train_y, 0.10) 
  print("Accuracy on train data = %0.4f" % train_acc)
  test_acc = accuracy_x(model, test_x, test_y, 0.10) 
  print("Accuracy on test data = %0.4f" % test_acc)

  # 5. use model
  # np.set_printoptions(formatter={'float': '{: 0.6f}'.format})
  print("\nPredicting income for M 34 concord support: ")
  x = np.array([[-1, 0.34, 0,0,1,  0,1,0]], dtype=np.float32)
  pred_inc = model.predict(x)
  print("$%0.2f" % (pred_inc * 100_000))  # un-normalized

# -----------------------------------------------------------

  # 6. save model
  print("\nSaving trained model ")
  # model.save_weights(".\\Models\\employee_model_wts.h5")
  # model.save(".\\Models\\employee_model.h5")

# -----------------------------------------------------------

  print("\nEnd Employee income demo")

if __name__=="__main__":
  main()

Regression (Employee Income) Using PyTorch 1.10 on Windows 11

One of my standard neural network examples is to predict employee income from sex, age, city, and job-type. Predicting a single numeric value is usually called a regression problem. (Note: “logistic regression” predicts a single numeric probability value between 0.0 and 1.0 but then that value is immediately used as a binary classification result).

My data is synthetic and looks like:

 1   0.24   1 0 0   0.2950   0 0 1
-1   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
-1   0.36   1 0 0   0.4450   0 1 0
 1   0.27   0 1 0   0.2860   0 0 1
. . .

There are 200 training items and 40 test items.

The first value in column [0] is sex (M = -1, F = +1). Column [1] is age, normalized by dividing by 100. Columns [2,3,4] is city one-hot encoded (anaheim, boulder, concord). Column [5] is annual income, divided by $100,000, and is the value to predict. Columns [6,7,8] is job-type (mgmt, supp, tech).

I designed an 8-(10-10)-1 neural network. I used xavier_uniform() weight initialization with zero-bias initialization. I used tanh() activation on the two hidden layers, and no activation (aka Identity activation) on the single output node.

For training, I used Adam optimization with an initial learning rate of 0.01 along with a batch size of 10. I used mean squared error for the loss function. I wrapped the training statements in a program-defined train() for no specific reason.

For regression problems you must define a custom accuracy() function. My function counts an income prediction as correct if it’s within 10% of the true income.



The normal English (non-math) meaning of “regression” is “a return to a less developed state”. This confused me when I was a math student and learned about regression. Left: In “Rocketship X-M” (1950) an expedition goes to Mars and finds that the once advanced civilization has regressed to primitive cavemen-like creatures. It doesn’t end well for the expedition members. Center: In “The Time Machine” (1960) scientist H. George Wells travels far into the future but finds that civilization has regressed to savage Morlocks who prey on the gentle Eloi. Right: In “The Time Travelers” (1964) a group of scientists accidentally travel into the far future and find that a nuclear war has regressed most humans to a primitive state.


Demo code.

# employee_income.py
# predict income from sex, age, city, job_type
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # sex age   city    income   job_type
    # -1  0.27  0 1 0   0.7610   0 0 1
    # +1  0.19  0 0 1   0.6550   1 0 0
    tmp_x = np.loadtxt(src_file, usecols=[0,1,2,3,4,6,7,8],
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_y = np.loadtxt(src_file, usecols=5, delimiter="\t",
      comments="#", dtype=np.float32)
    tmp_y = tmp_y.reshape(-1,1)  # 2D required

    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    incom = self.y_data[idx] 
    return (preds, incom)  # as a tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # regression: no activation
    return z

# -----------------------------------------------------------

def accuracy(model, ds, pct_close):
  # assumes model.eval()
  # correct within pct of true income
  n_correct = 0; n_wrong = 0

  for i in range(len(ds)):
    X = ds[i][0]   # predictors, 1-d
    Y = ds[i][1]   # target income, 1-d
    with T.no_grad():
      oupt = model(X)         # computed income

    if T.abs(oupt - Y) < T.abs(pct_close * Y):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def accuracy_x(model, ds, pct_close):
  # all-at-once (quick)
  # assumes model.eval()
  X = ds.x_data  # all inputs
  Y = ds.y_data  # all targets
  n_items = len(X)
  with T.no_grad():
    pred = model(X)  # all predicted incomes
 
  n_correct = T.sum((T.abs(pred - Y) < T.abs(pct_close * Y)))
  result = (n_correct.item() / n_items)  # scalar
  return result  

# -----------------------------------------------------------

def train(model, ds, bs, lr, me, le):
  # dataset, bat_size, lrn_rate, max_epochs, log interval
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(model.parameters(), lr=lr)

  for epoch in range(0, me):
    epoch_loss = 0  # for one full epoch

    for (b_idx, batch) in enumerate(train_ldr):
      X = batch[0]
      y = batch[1]
      optimizer.zero_grad()
      oupt = model(X)
      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute gradients
      optimizer.step()     # update weights

    if epoch % le == 0:
      print("epoch = %4d  |  loss = %0.4f" % (epoch, epoch_loss)) 

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee predict income ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Dataset objects ")
  train_file = ".\\Data\\employee_train.txt"
  train_ds = EmployeeDataset(train_file)  # 200 rows

  test_file = ".\\Data\\employee_test.txt"
  test_ds = EmployeeDataset(test_file)  # 40 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating 8-(10-10)-1 neural network ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = MSELoss() ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  print("\nStarting training")
  net.train()
  train(net, train_ds, bs=10, lr=0.01, me=1000, le=100)
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model accuracy
  print("\nComputing model accuracy (within 0.10 of true) ")
  net = net.eval()
  acc_train = accuracy(net, train_ds, 0.10)  # item-by-item
  print("Accuracy on train data = %0.4f" % acc_train)

  acc_test = accuracy_x(net, test_ds, 0.10)  # all-at-once
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  # 5. make a prediction
  print("\nPredicting income for M 34 concord support: ")
  x = np.array([[-1, 0.34, 0,0,1,  0,1,0]],
    dtype=np.float32)
  x = T.tensor(x, dtype=T.float32).to(device) 

  with T.no_grad():
    pred_inc = net(x)
  pred_inc = pred_inc.item()  # scalar
  print("$%0.2f" % (pred_inc * 100_000))  # un-normalized

# -----------------------------------------------------------

  # 6. save model (state_dict approach)
  print("\nSaving trained model state")
  fn = ".\\Models\\employee_model.pt"
  # T.save(net.state_dict(), fn)

  # saved_model = Net()
  # saved_model.load_state_dict(T.load(fn))
  # use saved_model to make prediction(s)

  print("\nEnd Employee income demo")

if __name__ == "__main__":
  main()

Training data. If you copy-paste you might lose the tab-delimiters.

# employee_train.txt
#
# sex (-1 = male, 1 = female), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_000,
# job type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0

Test data.

# employee_test.txt
#
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1

Understanding SimCLR – Simple Contrastive Loss Representation for Image Data

I’ve been looking at an interesting research paper titled “A Simple Framework for Contrastive Learning of Visual Representations” (2020) by T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. The main idea is to take unlabeled image data and use it to create an abstract representation (as in a numeric vector of, say, 500 values) of the images. This representation has no direct use, but it can be fine-tuned for downstream tasks. See my earlier post at https://jamesmccaffrey.wordpress.com/2022/04/11/an-example-of-normalized-temperature-scaled-cross-entropy-loss/.

To help me understand, I refactored the architecture diagram in the paper. Here’s the diagram from the research paper:



And here’s my version:



My explanation:

The input is a 32 x 32 CIFAR-10 color image. The image is sent twice to a sequence of three augmentations: 1.) a random part of the image (such as a 9 x 9 block of pixels) is cropped and then resized back to 32 x 32 pixels, 2.) the colors are randomly distorted, 3.) Gaussian blur is applied. The result is a pair of augmented images, x1 and x2. Note that “augmented” usually means “added to” but in the context of contrastive loss, “augmented” means “mutated”.
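
As a rough sketch, the augmentation pipeline might look something like the following (my guess using torchvision; the exact crop sizes, jitter strengths, blur parameters, and application probabilities in the paper are different):

# illustrative augmentation pipeline -- parameter values are not the paper's
import torchvision.transforms as TT

augment = TT.Compose([
  TT.RandomResizedCrop(32),            # random crop, resized back to 32x32
  TT.ColorJitter(0.8, 0.8, 0.8, 0.2),  # random color distortion
  TT.GaussianBlur(kernel_size=3),      # Gaussian blur
  TT.ToTensor()])

# x1 = augment(img); x2 = augment(img)  # two augmented views of one image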

The pair of augmented images (x1, x2) are fed to a ResNet-50 neural network. ResNet-50 is a large neural network with 50 layers used for image classification. Intermediate results of the ResNet-50 network (h1, h2), just after the average pooling layer, are fetched rather than the final output vector of 10 values. The (h1, h2) outputs from the ResNet-50 component are abstract representations of the two augmented images. These two abstract representations could be compared by a contrastive loss function. But it was discovered that passing the representations to a simple, single-hidden-layer neural network to get a pair of derived representations (z1, z2) and then feeding the derived representations to the normalized temperature-scaled cross entropy contrastive loss function works better.
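
Here is a minimal sketch of the projection head and the normalized temperature-scaled cross entropy (NT-Xent) loss, assuming PyTorch and a batch of N positive pairs. This is my illustration rather than the code from the paper, and the layer sizes and the names proj_head and nt_xent are made up:

import torch as T
import torch.nn.functional as F

# single-hidden-layer projection head: h (2048 values) to z (128 values)
proj_head = T.nn.Sequential(
  T.nn.Linear(2048, 512), T.nn.ReLU(),
  T.nn.Linear(512, 128))

def nt_xent(z1, z2, tau=0.5):
  # z1, z2: [N, d] projections of the two augmented views of N images
  N = z1.shape[0]
  z = F.normalize(T.cat([z1, z2], dim=0), dim=1)  # [2N, d], unit length
  sim = (z @ z.t()) / tau                         # cosine similarities / temperature
  sim = sim.masked_fill(T.eye(2 * N, dtype=T.bool), -1.0e9)  # ignore self-similarity
  targets = T.cat([T.arange(N, 2 * N), T.arange(0, N)])  # each row's positive partner
  return F.cross_entropy(sim, targets)

Each row of sim is treated as a classification problem over the other augmented images, where the correct answer is the other view of the same original image.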

The results of the loss function are used to update all the weights and biases in the SimCLR system. This results in the internal h1 and h2 representations being better. After training, the internal h-representations can be used for downstream tasks.

SimCLR architecture is an example of what’s sometimes called a Siamese network. This is because you feed two inputs to the network — conceptually it’s like there are two identical networks that share the same weights. In addition to SimCLR, there are many other specific examples of Siamese networks, but they tend to be more complicated than SimCLR. In fact, the complexity of the other Siamese architectures is what motivated the creation of SimCLR (the “Sim” stands for “simple”).


One thing that both diagrams leave out is that a SimCLR network is trained using a batch of pairs of images. The first pair are similar to each other, but the other pairs are randomly selected and are assumed to be dissimilar. The similar and dissimilar pairs are actually fed to the contrastive loss function, not just the similar pair as shown.

Interesting stuff.



Images from an Internet search for “contrastive image”. I sort of understand photography but I don’t grok photography at a deep level.



Autoencoder Anomaly Detection Using Keras 2.8 on Windows 11

Every few months I revisit my standard neural network examples to make sure that changes in the underlying code libraries (PyTorch, Keras/TensorFlow) haven’t introduced a breaking change(s). One of my standard examples is autoencoder anomaly detection.

The idea is to take a set of data and implement a deep neural network that predicts its input. The values of the interior hidden layer of nodes is a condensed representation of the input. The output nodes are a reconstruction of the input. Data items where the reconstructed input is very different from the associated input are anomalous in some way.

My demo uses a synthetic set of Employee data. There are five feature variables: employee sex (M, F), age, city (anaheim, boulder, concord), annual income, and job-type (mgmt, supp, tech). There are 240 items. The normalized and encoded data looks like:

# sex  age   city      income   job_type
 -1   0.27   0  1  0   0.7610   0  0  1
  1   0.19   0  0  1   0.6550   0  1  0
. . .

My demo network uses a 9-4-(2)-4-9 architecture. The input and output sizes are determined by the data, but the number of hidden layers and the number of nodes in each are hyperparameters that must be determined by trial and error. The middle hidden layer, with 2 nodes, represents a condensed version of a data item. This internal representation isn't used directly; it's used to reconstruct a data item. The difference between a 9-value input item and its 9-value output is used to find anomalies.



A problem facing artists who want to create images of alien animals and vegetation is to find a balance between images that are too anomalous relative to real plants and animals (making them look implausible) and images that are not anomalous enough (making them look not alien enough). Artist Jorge Abalo creates beautiful digital renderings of alien plants, a perfect degree of anomaly to my eye.


Demo code.

# employee_autoanom_tfk.py
# autoencoder reconstruction error anomaly detection
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n   # print loss every n epochs
    # self.data_x = data_x  # for accuracy
    # self.data_y = data_y
    
  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # loss on curr batch
      print("epoch = %4d  |  loss = %0.6f " % \
        (epoch, curr_loss))

# -----------------------------------------------------------

def analyze_error(model, data_x):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  n_features = len(data_x[0])  # 9 predictors

  for i in range(len(data_x)):
    X = data_x[i].reshape(1,-1)
    Y = model(X)
    err = tf.reduce_sum( (X-Y)*(X-Y) )  # across all predictors
    err /= n_features

    if err > largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y.numpy()

  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  print("Largest reconstruction error: %0.4f" % largest_err)
  print("Worst data item    = ")
  print(worst_x)
  print("Its reconstruction = " )
  print(worst_y)

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin Employee autoencoder anomaly using Keras ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # sex age   city    income   job_type
  # -1  0.27  0 1 0   0.7610   0 0 1
  # +1  0.19  0 0 1   0.6550   1 0 0

  print("\nLoading Employee data into memory ")
  data_file = ".\\Data\\employee_all.txt"  # 240 lines
  data_x = np.loadtxt(data_file, usecols=[0,1,2,3,4,5,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  
# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 9-4-(2)-4-9 network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=4, input_dim=9,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # enc1
  model.add(K.layers.Dense(units=2,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # enc2
  model.add(K.layers.Dense(units=4,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # dec1
  model.add(K.layers.Dense(units=9,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # dec2
  opt = K.optimizers.Adam(learning_rate=0.005)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.005 ")

  my_logger = MyLogger(100) 

  print("\nStarting training ")
  h = model.fit(data_x, data_x, batch_size=10,
    epochs=1000, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error \n")
  analyze_error(model, data_x)

# -----------------------------------------------------------

  # 5. save model
  # print("\nSaving trained model ")
  # model.save_weights(".\\Models\\employee_model_wts.h5")
  # model.save(".\\Models\\employee_model.h5")

# -----------------------------------------------------------

  print("\nEnd Employee autoencoder anomaly demo ")

if __name__=="__main__":
  main()

Demo data:

# employee_all.txt
# sex (M = -1, F = +1), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_000,
# job_type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1

Logistic Regression for the Banknote Problem Using Raw Python

Every few months I implement a logistic regression (binary classification) model using raw Python (or some other language). The idea is that coding is a skill that must be practiced. One rainy Pacific Northwest afternoon, I zapped out logistic regression for the Banknote Authentication (BA) problem.

The goal of the BA problem is to predict if a banknote (think euro or dollar bill) is real/authentic (class 0) or fake/forgery (class 1). The raw data is available in several places on the Internet and looks like:

3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
. . . 
-3.5637, -8.3827, 12.393, -1.2823, 1
-2.5419, -0.65804, 2.6842, 1.1952, 1

There are 1,372 data items. There are four predictor variables derived from a digital image of each banknote: variance, skewness, kurtosis, entropy. I broke the dataset into a 1,000-item set for training and a 372-item set for testing. I normalized all predictor values by dividing each by 20.0 so that the normalized values are all between -1.0 and +1.0, and replaced comma delimiters with tabs. The normalized data looks like:
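
The preprocessing is simple enough to do with a short script. A sketch of one way to do it, assuming the raw comma-delimited data has been saved as banknote_raw.txt (a file name I made up, not part of the demo):

# normalize the four predictors by dividing by 20, write tab-delimited output
import numpy as np

raw = np.loadtxt("banknote_raw.txt", delimiter=",", dtype=np.float32)
raw[:, 0:4] /= 20.0   # variance, skewness, kurtosis, entropy now in [-1.0, +1.0]
np.savetxt("banknote_norm.txt", raw, delimiter="\t",
  fmt=["%0.6f", "%0.6f", "%0.6f", "%0.6f", "%d"])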

-0.177550    0.094775   0.009325   -0.122045   1
 0.065570    0.227310   0.114675    0.011271   0
-0.200865   -0.415615   0.622735   -0.071875   1
. . . 

There are many design possibilities for logistic regression. I opted for simplicity and just maintained an array of weights (one for each predictor) and a bias value. Therefore, creating the LR model is:

print("Creating logistic regression model ")
wts = np.zeros(4)  # one wt per predictor
lo = -0.01; hi = 0.01
for i in range(len(wts)):
  wts[i] = (hi - lo) * np.random.random() + lo
bias = 0.00

I implemented a compute_output() function as:

def compute_output(w, b, x):
  # input x using weights w and bias b
  z = 0.0
  for i in range(len(w)):
    z += w[i] * x[i]
  z += b
  p = 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid
  return p

Anyway, it was a fun exercise and, as always, I gained some new insights into the details of logistic regression.



Some fake upscale brand watches are very difficult to distinguish from authentic. But some fakes are easy to identify. Left: This Rolex is creative but not convincing. Center: I strongly suspect that Ghetto University is not legit. Right: An Apple watch — pretty punny.


Demo code:

# banknote_logreg.py

# predict real (0) or forgery (1) from
# variance, skewness, kurtosis, entropy (all div by 20.0)
# data:
# -0.177550  0.094775  0.009325  -0.122045   1
#  0.065570  0.227310  0.114675   0.011271   0

# Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np

# -----------------------------------------------------------

def compute_output(w, b, x):
  # input x using weights w and bias b
  z = 0.0
  for i in range(len(w)):
    z += w[i] * x[i]
  z += b
  p = 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid
  return p

# -----------------------------------------------------------

def accuracy(w, b, data_x, data_y):
  n_correct = 0; n_wrong = 0
  for i in range(len(data_x)):
    x = data_x[i]  # inputs
    y = int(data_y[i])  # target 0 or 1
    p = compute_output(w, b, x)
    if (y == 0 and p < 0.5) or (y == 1 and p >= 0.5):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def mse_loss(w, b, data_x, data_y):
  sum = 0.0
  for i in range(len(data_x)):
    x = data_x[i]  # inputs
    y = int(data_y[i])  # target 0 or 1
    p = compute_output(w, b, x)
    sum += (y - p) * (y - p)
  mse = sum / len(data_x)
  return mse

# -----------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin logistic regression with raw Python demo ")
  np.random.seed(1)

  # 1. load data
  print("\nLoading Banknote train and test data to memory ")
  # variance, skewness, kurtosis, entropy (all div by 20.0)
  # 0 = real, 1 = forgery
  # -0.177550  0.094775  0.009325  -0.122045   1
  #  0.065570  0.227310  0.114675   0.011271   0

  train_file = ".\\Data\\banknote_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,5),
    delimiter="\t", comments="#",  dtype=np.float32) 
  train_x = train_xy[:,0:4]
  train_y = train_xy[:,4]

  test_file = ".\\Data\\banknote_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,5),
    delimiter="\t", comments="#", dtype=np.float32)
  test_x = test_xy[:,0:4]
  test_y = test_xy[:,4]

# -----------------------------------------------------------

  # 2. create model
  print("\nCreating logistic regression model ")
  wts = np.zeros(4)  # one wt per predictor
  lo = -0.01; hi = 0.01
  for i in range(len(wts)):
    wts[i] = (hi - lo) * np.random.random() + lo
  bias = 0.00

# -----------------------------------------------------------

  # 3. train model
  lrn_rate = 0.01
  max_epochs = 100
  indices = np.arange(len(train_x))  # [0, 1, .. 999]
  print("\nTraining using SGD with lrn_rate = %0.4f " % lrn_rate)
  for epoch in range(max_epochs):
    np.random.shuffle(indices)
    for i in indices:
      x = train_x[i]  # inputs
      y = train_y[i]  # target 0.0 or 1.0
      p = compute_output(wts, bias, x)

      # update all wts and the bias
      for j in range(len(wts)):
        wts[j] += lrn_rate * x[j] * (y - p)  # target - oupt
      bias += lrn_rate * (y - p)
    if epoch % 10 == 0:
      loss = mse_loss(wts, bias, train_x, train_y)
      print("epoch = %5d  |  loss = %9.4f " % (epoch, loss))
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nEvaluating trained model ")
  acc_train = accuracy(wts, bias, train_x, train_y)
  print("Accuracy on train data: %0.4f " % acc_train)
  acc_test = accuracy(wts, bias, test_x, test_y)
  print("Accuracy on test data: %0.4f " % acc_test)

  # 5. use model
  print("\nPrediction for [0.2, 0.3, 0.5, 0.7] banknote: ")
  x = np.array([0.2, 0.3, 0.5, 0.7], dtype=np.float32)
  p = compute_output(wts, bias, x)
  print("%0.8f " % p)
  if p < 0.5:
    print("class 0 (real) ")
  else:
    print("class 1 (forgery) ") 

  # 6. TODO: save trained weights and bias to file

  print("\nEnd Banknote logistic regression demo ")

if __name__ == "__main__":
  main()



Autoencoder Anomaly Detection Using PyTorch 1.10 on Windows 11

Every few months I revisit my standard neural network examples to make sure that changes in the underlying code libraries (PyTorch, Keras/TensorFlow) haven’t introduced a breaking change(s). One of my standard examples is autoencoder anomaly detection.

The idea is to take a set of data and implement a deep neural network that predicts its input. The values of the interior hidden layer of nodes is a condensed representation of the input. The output nodes are a reconstruction of the input. Data items where the reconstructed input is very different from the associated input are anomalous in some way.

My demo uses a synthetic set of Employee data. There are five feature variables: employee sex (M, F), age, city (anaheim, boulder, concord), annual income, and job-type (mgmt, supp, tech). There are 240 items. The normalized and encoded data looks like:

# sex  age   city      income   job_type
 -1   0.27   0  1  0   0.7610   0  0  1
  1   0.19   0  0  1   0.6550   0  1  0
. . .

My demo network uses a 9-4-(2)-4-9 architecture. The input and output sizes are determined by the data, but the number of hidden layers and the number of nodes in each are hyperparameters that must be determined by trial and error.



I love to observe people and things, especially in Las Vegas. On a recent trip to speak at a tech conference, I noticed that electronic versions of games such as Blackjack, Roulette, and Craps display results of recent games. This encourages players to seek out and bet on anomalies — results that appear less than expected or more than expected. Left: This craps game at the MGM Grand shows “hot” numbers and “cold” numbers. Right: The Fortune Cup horse race game shows the results of the most recent 40 races. Fascinating.


Demo code.

# employee_auto_anom.py
# autoencoder reconstruction error anomaly detection
# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T

device = T.device('cpu') 

# -----------------------------------------------------------

class EmployeeDataset(T.utils.data.Dataset):
  # sex  age   city     income  job
  # -1   0.27  0  1  0  0.7610  0  0  1
  # +1   0.19  0  0  1  0.6550  0  1  0
  # sex: -1 = male, +1 = female
  # city: anaheim, boulder, concord
  # job: mgmt, supp, tech

  def __init__(self, src_file):
    tmp_x = np.loadtxt(src_file, usecols=range(0,9),
      delimiter="\t", comments="#", dtype=np.float32)
    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx, :]  # row idx, all cols
    sample = { 'predictors' : preds }  # as Dictionary
    return sample

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.enc1 = T.nn.Linear(9, 4)  # 9-4-2-4-9
    self.enc2 = T.nn.Linear(4, 2)

    self.dec1 = T.nn.Linear(2, 4)
    self.dec2 = T.nn.Linear(4, 9)

    T.nn.init.xavier_uniform_(self.enc1.weight)
    T.nn.init.zeros_(self.enc1.bias)
    T.nn.init.xavier_uniform_(self.enc2.weight)
    T.nn.init.zeros_(self.enc2.bias)
    T.nn.init.xavier_uniform_(self.dec1.weight)
    T.nn.init.zeros_(self.dec1.bias)
    T.nn.init.xavier_uniform_(self.dec2.weight)
    T.nn.init.zeros_(self.dec2.bias)

  def forward(self, x):
    z = T.tanh(self.enc1(x))
    z = T.tanh(self.enc2(z))
    z = T.tanh(self.dec1(z))
    z = self.dec2(z)  # no activation
    return z

# -----------------------------------------------------------

def analyze_error(model, ds):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  n_features = len(ds[0]['predictors'])

  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # should be same as X
    err = T.sum((X-Y)*(X-Y)).item()  # SSE all features
    err = err / n_features           # sort of norm'ed SSE 

    if err > largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y

  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  print("Largest reconstruction error: %0.4f" % largest_err)
  print("Worst data item    = ")
  print(worst_x.numpy())
  print("Its reconstruction = " )
  print(worst_y.numpy())

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin Employee autoencoder anomaly detection ")
  T.manual_seed(2)
  np.random.seed(2)
  
  # 1. create DataLoader objects
  print("\nCreating Employee Dataset ")

  data_file = ".\\Data\\employee_all.txt"
  data_ds = EmployeeDataset(data_file)  # all 240 rows

  bat_size = 20
  data_ldr = T.utils.data.DataLoader(data_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating 9-4-(2)-4-9 network ")
  net = Net().to(device)

# -----------------------------------------------------------

  # 3. train autoencoder model
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.005

  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = Adam")
  print("lrn_rate = %0.3f " % lrn_rate)
  print("max_epochs = %3d " % max_epochs)
  

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(data_ldr):
      X = batch['predictors'] 
      Y = batch['predictors'] 

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %4d  |  loss = %0.4f" % \
       (epoch, epoch_loss))
  print("Done ")

# -----------------------------------------------------------

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error \n")
  net.eval()
  analyze_error(net, data_ds)

  print("\nEnd Employee autoencoder anomaly demo ")

if __name__ == "__main__":
  main()

Demo data:

# employee_all.txt
# sex (M = -1, F = +1), age / 100,
# city (anaheim = 100, boulder = 010, concord = 001),
# income / 100_000,
# job_type (mgmt = 100, supp = 010, tech = 001)
#
1	0.24	1	0	0	0.2950	0	0	1
-1	0.39	0	0	1	0.5120	0	1	0
1	0.63	0	1	0	0.7580	1	0	0
-1	0.36	1	0	0	0.4450	0	1	0
1	0.27	0	1	0	0.2860	0	0	1
1	0.50	0	1	0	0.5650	0	1	0
1	0.50	0	0	1	0.5500	0	1	0
-1	0.19	0	0	1	0.3270	1	0	0
1	0.22	0	1	0	0.2770	0	1	0
-1	0.39	0	0	1	0.4710	0	0	1
1	0.34	1	0	0	0.3940	0	1	0
-1	0.22	1	0	0	0.3350	1	0	0
1	0.35	0	0	1	0.3520	0	0	1
-1	0.33	0	1	0	0.4640	0	1	0
1	0.45	0	1	0	0.5410	0	1	0
1	0.42	0	1	0	0.5070	0	1	0
-1	0.33	0	1	0	0.4680	0	1	0
1	0.25	0	0	1	0.3000	0	1	0
-1	0.31	0	1	0	0.4640	1	0	0
1	0.27	1	0	0	0.3250	0	0	1
1	0.48	1	0	0	0.5400	0	1	0
-1	0.64	0	1	0	0.7130	0	0	1
1	0.61	0	1	0	0.7240	1	0	0
1	0.54	0	0	1	0.6100	1	0	0
1	0.29	1	0	0	0.3630	1	0	0
1	0.50	0	0	1	0.5500	0	1	0
1	0.55	0	0	1	0.6250	1	0	0
1	0.40	1	0	0	0.5240	1	0	0
1	0.22	1	0	0	0.2360	0	0	1
1	0.68	0	1	0	0.7840	1	0	0
-1	0.60	1	0	0	0.7170	0	0	1
-1	0.34	0	0	1	0.4650	0	1	0
-1	0.25	0	0	1	0.3710	1	0	0
-1	0.31	0	1	0	0.4890	0	1	0
1	0.43	0	0	1	0.4800	0	1	0
1	0.58	0	1	0	0.6540	0	0	1
-1	0.55	0	1	0	0.6070	0	0	1
-1	0.43	0	1	0	0.5110	0	1	0
-1	0.43	0	0	1	0.5320	0	1	0
-1	0.21	1	0	0	0.3720	1	0	0
1	0.55	0	0	1	0.6460	1	0	0
1	0.64	0	1	0	0.7480	1	0	0
-1	0.41	1	0	0	0.5880	0	1	0
1	0.64	0	0	1	0.7270	1	0	0
-1	0.56	0	0	1	0.6660	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
-1	0.65	0	0	1	0.7010	0	0	1
1	0.55	0	0	1	0.6430	1	0	0
-1	0.25	1	0	0	0.4030	1	0	0
1	0.46	0	0	1	0.5100	0	1	0
-1	0.36	1	0	0	0.5350	1	0	0
1	0.52	0	1	0	0.5810	0	1	0
1	0.61	0	0	1	0.6790	1	0	0
1	0.57	0	0	1	0.6570	1	0	0
-1	0.46	0	1	0	0.5260	0	1	0
-1	0.62	1	0	0	0.6680	0	0	1
1	0.55	0	0	1	0.6270	1	0	0
-1	0.22	0	0	1	0.2770	0	1	0
-1	0.50	1	0	0	0.6290	1	0	0
-1	0.32	0	1	0	0.4180	0	1	0
-1	0.21	0	0	1	0.3560	1	0	0
1	0.44	0	1	0	0.5200	0	1	0
1	0.46	0	1	0	0.5170	0	1	0
1	0.62	0	1	0	0.6970	1	0	0
1	0.57	0	1	0	0.6640	1	0	0
-1	0.67	0	0	1	0.7580	0	0	1
1	0.29	1	0	0	0.3430	0	0	1
1	0.53	1	0	0	0.6010	1	0	0
-1	0.44	1	0	0	0.5480	0	1	0
1	0.46	0	1	0	0.5230	0	1	0
-1	0.20	0	1	0	0.3010	0	1	0
-1	0.38	1	0	0	0.5350	0	1	0
1	0.50	0	1	0	0.5860	0	1	0
1	0.33	0	1	0	0.4250	0	1	0
-1	0.33	0	1	0	0.3930	0	1	0
1	0.26	0	1	0	0.4040	1	0	0
1	0.58	1	0	0	0.7070	1	0	0
1	0.43	0	0	1	0.4800	0	1	0
-1	0.46	1	0	0	0.6440	1	0	0
1	0.60	1	0	0	0.7170	1	0	0
-1	0.42	1	0	0	0.4890	0	1	0
-1	0.56	0	0	1	0.5640	0	0	1
-1	0.62	0	1	0	0.6630	0	0	1
-1	0.50	1	0	0	0.6480	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.67	0	1	0	0.8040	0	0	1
-1	0.40	0	0	1	0.5040	0	1	0
1	0.42	0	1	0	0.4840	0	1	0
1	0.64	1	0	0	0.7200	1	0	0
-1	0.47	1	0	0	0.5870	0	0	1
1	0.45	0	1	0	0.5280	0	1	0
-1	0.25	0	0	1	0.4090	1	0	0
1	0.38	1	0	0	0.4840	1	0	0
1	0.55	0	0	1	0.6000	0	1	0
-1	0.44	1	0	0	0.6060	0	1	0
1	0.33	1	0	0	0.4100	0	1	0
1	0.34	0	0	1	0.3900	0	1	0
1	0.27	0	1	0	0.3370	0	0	1
1	0.32	0	1	0	0.4070	0	1	0
1	0.42	0	0	1	0.4700	0	1	0
-1	0.24	0	0	1	0.4030	1	0	0
1	0.42	0	1	0	0.5030	0	1	0
1	0.25	0	0	1	0.2800	0	0	1
1	0.51	0	1	0	0.5800	0	1	0
-1	0.55	0	1	0	0.6350	0	0	1
1	0.44	1	0	0	0.4780	0	0	1
-1	0.18	1	0	0	0.3980	1	0	0
-1	0.67	0	1	0	0.7160	0	0	1
1	0.45	0	0	1	0.5000	0	1	0
1	0.48	1	0	0	0.5580	0	1	0
-1	0.25	0	1	0	0.3900	0	1	0
-1	0.67	1	0	0	0.7830	0	1	0
1	0.37	0	0	1	0.4200	0	1	0
-1	0.32	1	0	0	0.4270	0	1	0
1	0.48	1	0	0	0.5700	0	1	0
-1	0.66	0	0	1	0.7500	0	0	1
1	0.61	1	0	0	0.7000	1	0	0
-1	0.58	0	0	1	0.6890	0	1	0
1	0.19	1	0	0	0.2400	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.27	1	0	0	0.3640	0	1	0
1	0.42	1	0	0	0.4800	0	1	0
1	0.60	1	0	0	0.7130	1	0	0
-1	0.27	0	0	1	0.3480	1	0	0
1	0.29	0	1	0	0.3710	1	0	0
-1	0.43	1	0	0	0.5670	0	1	0
1	0.48	1	0	0	0.5670	0	1	0
1	0.27	0	0	1	0.2940	0	0	1
-1	0.44	1	0	0	0.5520	1	0	0
1	0.23	0	1	0	0.2630	0	0	1
-1	0.36	0	1	0	0.5300	0	0	1
1	0.64	0	0	1	0.7250	1	0	0
1	0.29	0	0	1	0.3000	0	0	1
-1	0.33	1	0	0	0.4930	0	1	0
-1	0.66	0	1	0	0.7500	0	0	1
-1	0.21	0	0	1	0.3430	1	0	0
1	0.27	1	0	0	0.3270	0	0	1
1	0.29	1	0	0	0.3180	0	0	1
-1	0.31	1	0	0	0.4860	0	1	0
1	0.36	0	0	1	0.4100	0	1	0
1	0.49	0	1	0	0.5570	0	1	0
-1	0.28	1	0	0	0.3840	1	0	0
-1	0.43	0	0	1	0.5660	0	1	0
-1	0.46	0	1	0	0.5880	0	1	0
1	0.57	1	0	0	0.6980	1	0	0
-1	0.52	0	0	1	0.5940	0	1	0
-1	0.31	0	0	1	0.4350	0	1	0
-1	0.55	1	0	0	0.6200	0	0	1
1	0.50	1	0	0	0.5640	0	1	0
1	0.48	0	1	0	0.5590	0	1	0
-1	0.22	0	0	1	0.3450	1	0	0
1	0.59	0	0	1	0.6670	1	0	0
1	0.34	1	0	0	0.4280	0	0	1
-1	0.64	1	0	0	0.7720	0	0	1
1	0.29	0	0	1	0.3350	0	0	1
-1	0.34	0	1	0	0.4320	0	1	0
-1	0.61	1	0	0	0.7500	0	0	1
1	0.64	0	0	1	0.7110	1	0	0
-1	0.29	1	0	0	0.4130	1	0	0
1	0.63	0	1	0	0.7060	1	0	0
-1	0.29	0	1	0	0.4000	1	0	0
-1	0.51	1	0	0	0.6270	0	1	0
-1	0.24	0	0	1	0.3770	1	0	0
1	0.48	0	1	0	0.5750	0	1	0
1	0.18	1	0	0	0.2740	1	0	0
1	0.18	1	0	0	0.2030	0	0	1
1	0.33	0	1	0	0.3820	0	0	1
-1	0.20	0	0	1	0.3480	1	0	0
1	0.29	0	0	1	0.3300	0	0	1
-1	0.44	0	0	1	0.6300	1	0	0
-1	0.65	0	0	1	0.8180	1	0	0
-1	0.56	1	0	0	0.6370	0	0	1
-1	0.52	0	0	1	0.5840	0	1	0
-1	0.29	0	1	0	0.4860	1	0	0
-1	0.47	0	1	0	0.5890	0	1	0
1	0.68	1	0	0	0.7260	0	0	1
1	0.31	0	0	1	0.3600	0	1	0
1	0.61	0	1	0	0.6250	0	0	1
1	0.19	0	1	0	0.2150	0	0	1
1	0.38	0	0	1	0.4300	0	1	0
-1	0.26	1	0	0	0.4230	1	0	0
1	0.61	0	1	0	0.6740	1	0	0
1	0.40	1	0	0	0.4650	0	1	0
-1	0.49	1	0	0	0.6520	0	1	0
1	0.56	1	0	0	0.6750	1	0	0
-1	0.48	0	1	0	0.6600	0	1	0
1	0.52	1	0	0	0.5630	0	0	1
-1	0.18	1	0	0	0.2980	1	0	0
-1	0.56	0	0	1	0.5930	0	0	1
-1	0.52	0	1	0	0.6440	0	1	0
-1	0.18	0	1	0	0.2860	0	1	0
-1	0.58	1	0	0	0.6620	0	0	1
-1	0.39	0	1	0	0.5510	0	1	0
-1	0.46	1	0	0	0.6290	0	1	0
-1	0.40	0	1	0	0.4620	0	1	0
-1	0.60	1	0	0	0.7270	0	0	1
1	0.36	0	1	0	0.4070	0	0	1
1	0.44	1	0	0	0.5230	0	1	0
1	0.28	1	0	0	0.3130	0	0	1
1	0.54	0	0	1	0.6260	1	0	0
-1	0.51	1	0	0	0.6120	0	1	0
-1	0.32	0	1	0	0.4610	0	1	0
1	0.55	1	0	0	0.6270	1	0	0
1	0.25	0	0	1	0.2620	0	0	1
1	0.33	0	0	1	0.3730	0	0	1
-1	0.29	0	1	0	0.4620	1	0	0
1	0.65	1	0	0	0.7270	1	0	0
-1	0.43	0	1	0	0.5140	0	1	0
-1	0.54	0	1	0	0.6480	0	0	1
1	0.61	0	1	0	0.7270	1	0	0
1	0.52	0	1	0	0.6360	1	0	0
1	0.3	0	1	0	0.3350	0	0	1
1	0.29	1	0	0	0.3140	0	0	1
-1	0.47	0	0	1	0.5940	0	1	0
1	0.39	0	1	0	0.4780	0	1	0
1	0.47	0	0	1	0.5200	0	1	0
-1	0.49	1	0	0	0.5860	0	1	0
-1	0.63	0	0	1	0.6740	0	0	1
-1	0.3	1	0	0	0.3920	1	0	0
-1	0.61	0	0	1	0.6960	0	0	1
-1	0.47	0	0	1	0.5870	0	1	0
1	0.3	0	0	1	0.3450	0	0	1
-1	0.51	0	0	1	0.5800	0	1	0
-1	0.24	1	0	0	0.3880	0	1	0
-1	0.49	1	0	0	0.6450	0	1	0
1	0.66	0	0	1	0.7450	1	0	0
-1	0.65	1	0	0	0.7690	1	0	0
-1	0.46	0	1	0	0.5800	1	0	0
-1	0.45	0	0	1	0.5180	0	1	0
-1	0.47	1	0	0	0.6360	1	0	0
-1	0.29	1	0	0	0.4480	1	0	0
-1	0.57	0	0	1	0.6930	0	0	1
-1	0.2	1	0	0	0.2870	0	0	1
-1	0.35	1	0	0	0.4340	0	1	0
-1	0.61	0	0	1	0.6700	0	0	1
-1	0.31	0	0	1	0.3730	0	1	0
1	0.18	1	0	0	0.2080	0	0	1
1	0.26	0	0	1	0.2920	0	0	1
-1	0.28	1	0	0	0.3640	0	0	1
-1	0.59	0	0	1	0.6940	0	0	1
Posted in PAW, PyTorch

Naive Bayes Classification Using C# in Visual Studio Magazine

I wrote an article titled “Naive Bayes Classification Using C#” in the May 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/05/02/naive-bayes-classification-csharp.aspx.

I present a complete demo program. The demo uses a set of 40 data items where each item consists of a person’s occupation (actor, baker, clerk or diver), eye color (green or hazel), country (Italy, Japan or Korea), and their personality optimism score (0, 1 or 2). You want to predict a person’s optimism score from their occupation, eye color and country. (This is an example of multiclass classification because the variable to predict, optimism, has three or more possible values.)

The first few data items look like:

actor  green  korea  1
baker  green  italy  0
diver  hazel  japan  0
diver  green  japan  1
clerk  hazel  japan  2
. . . 

The demo sets up an item to predict as (“baker”, “hazel”, “italy”). Next, the demo scans through the data and computes and displays smoothed (“add 1”) joint counts. For example, the 5 in the screenshot is the smoothed joint count for (baker, class 0): there are 4 bakers who have optimism class = 0, and 4 + 1 = 5.

The demo computes the raw, unsmoothed class counts as (19, 14, 7). This means there are 19 people with optimism class = 0, 14 people with class = 1, and 7 people with class = 2. Notice that 19 + 14 + 7 = 40, the number of data items.

The smoothed joint counts and the raw class counts are combined mathematically to produce evidence terms of (0.0027, 0.0013, 0.0021). These correspond to the likelihoods of class (0, 1, 2). Because the largest evidence value is at index [0], the prediction for the (“baker”, “hazel”, “italy”) person is class 0.

Evidence terms are somewhat difficult to interpret so the demo converts the three evidence terms to pseudo-probabilities: (0.4418, 0.2116, 0.3466). The values are not true mathematical probabilities but because they sum to 1.0 they can loosely be interpreted as probabilities. The largest probability is at index [0].
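
To make the conversion concrete, here is a minimal sketch of that normalization step. It is not from the article; it uses the rounded evidence values shown above, so the results differ very slightly from the article’s pseudo-probabilities, which are computed from unrounded evidence terms.

import numpy as np

evidence = np.array([0.0027, 0.0013, 0.0021])  # rounded evidence terms
pseudo_probs = evidence / np.sum(evidence)     # divide each term by the sum
print(pseudo_probs)  # approximately [0.4426  0.2131  0.3443]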

Naive Bayes classification is called “naive” because it analyzes each predictor column independently. This doesn’t take into account interactions between predictor values. For example, in the demo data, clerks who have green eyes might have some special characteristics. The technique is “Bayesian” because the math is based on observed counts of data rather than some underlying theory.

The technique presented in the article works only with categorical data. There are other forms of naive Bayes classification that can handle numeric data. However, you must make assumptions about the mathematical properties of the data, for example that the data has a normal (Gaussian) distribution with a certain mean and standard deviation.
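
For instance, a Gaussian naive Bayes variant replaces the count-based likelihood of a numeric predictor value with the normal probability density, where the mean and standard deviation are estimated from the training items of each class. Here is a minimal sketch of that likelihood term (the example values are hypothetical, just for illustration):

import numpy as np

def gaussian_pdf(x, mu, sd):
  # normal probability density, used as the per-class likelihood of x
  return np.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * np.sqrt(2.0 * np.pi))

print(gaussian_pdf(0.45, mu=0.50, sd=0.10))  # roughly 3.52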

Naive Bayes classification isn’t used as much as it used to be because techniques based on neural networks are much more powerful. However, neural techniques usually require lots of data. Naive Bayes classification often works well with small datasets.

You can find the complete C# demo code in the VSM article at the URL/link above.



In many of the comedy movies that I like, there is a naive character whose lack of sophistication leads to funny situations. Left: In “Dumb and Dumber To” (2014), buddies Lloyd (actor Jim Carrey) and Harry (Jeff Daniels) are orders of magnitude beyond naive but somehow always manage to emerge with success. Center: In “Stuck On You” (2003), conjoined twins Bob (Matt Damon) and Walt (Greg Kinnear) go to Hollywood so Walt can become an actor. The brothers are nice to everyone including their neighbor April (Eva Mendes) who is blissfully unaware of her surroundings. Right: In “Game Night” (2018) wife Annie (Rachel McAdams) is oblivious to danger when she and husband Max (Jason Bateman) are in a sketchy bar filled with not-very-nice criminals.


Posted in Machine Learning

The Boston Area Housing Problem Using Keras 2.8 on Windows 11

For the past few days I’ve been revisiting some of my standard neural network problems. One of these is the Boston Area House Price problem. The data has 506 data items. Each data item represents a town or village near Boston. The goal is to predict the median house price in the town. There are 13 predictor variables such as crime rate in town, average number of rooms per house in town, density of Blacks in town, and so on. All of the predictors are numeric except one, which is a Boolean (0, 1) indicator of whether the town is adjacent to the Charles River.

The data is from a 1978 research paper. I fetched the raw data from lib.stat.cmu.edu/datasets/boston and normalized it by dividing each of the 12 numeric predictor columns, and the house price column, by either 1, 10, 100, or 1000 so that all numeric values are between 0.0 and 1.0. I re-encoded the Charles River predictor variable as adjacent = 1 and not-adjacent = -1. I randomly split the 506 data items into a 400-item training set and a 106-item test set.
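
Here is a minimal sketch of one way to do that normalization. This is an illustration rather than my actual prep script: it assumes the raw values have first been collected into a simple whitespace-delimited text file (the file names are hypothetical), and that the Charles River indicator is in column [3], as in the normalized data.

import numpy as np

# minimal sketch of the normalization idea -- not the actual prep script
raw = np.loadtxt("boston_raw.txt", dtype=np.float32)  # hypothetical file name
CHAS = 3  # Charles River indicator column (0 or 1 in the raw data)

norm = raw.copy()
for j in range(raw.shape[1]):
  if j == CHAS:
    norm[:, j] = np.where(raw[:, j] == 1, 1.0, -1.0)  # adjacent = 1, not = -1
  else:
    # smallest power of 10 (1, 10, 100, or 1000) that maps the column into [0, 1]
    divisor = 10.0 ** np.ceil(np.log10(np.max(raw[:, j])))
    norm[:, j] = raw[:, j] / divisor

np.savetxt("boston_all_norm.txt", norm, fmt="%0.6f", delimiter="\t")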

I set up a 13-(10-10)-1 neural network with tanh() hidden activation and no output activation. For training, I used the Adam optimizer with initial learning rate of 0.01, and a batch size of 10.

For a regression problem, you must define a custom accuracy() function that marks a predicted house price as correct if it’s within a specified percent of the true house price. My model achieved 73% accuracy (where the predicted house price is within 15% of the true price).

It was a fun experiment. Note: I did the same problem using PyTorch 1.10. See https://jamesmccaffrey.wordpress.com/2022/05/09/the-boston-area-house-price-problem-using-pytorch-1-10-on-windows-10-11/.



Left: “Mars and Beyond” was an episode of the Disneyland TV show which aired on December 4, 1957. It speculated about house structures on Mars. I’ve seen this show and it’s amazing.

Center: “The Jetsons” was a TV show that aired for one season, 1962-1963. There were 24 episodes. I’ve seen a couple of episodes. I like the art, especially the house architecture, but the stories are too simple for my taste.

Right: “Colonel Bleep” was the first cartoon series made for television. It ran in 1957. There were 100 episodes made where each episode was about 5 minutes long. Colonel Bleep was an alien policeman sent to Earth to stop crime. Bleep lived in a cool space age domed house.


Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols — my blog editor chokes on symbols.

For the data, see https://jamesmccaffrey.wordpress.com/2022/05/09/the-boston-area-house-price-problem-using-pytorch-1-10-on-windows-10-11/

# boston_tfk.py
# Boston Area House Price regression
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n):
    self.n = n   # print loss every n epochs
        
  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # loss on curr batch
      print("epoch = %4d  |  curr loss = \
%0.6f " % (epoch, curr_loss))

# -----------------------------------------------------------

def accuracy(model, data_x, data_y, pct_close):
  # item-by-item -- slow -- for debugging
  n_correct = 0; n_wrong = 0
  n = len(data_x)
  for i in range(n):
    x = np.array([data_x[i]])  # [[ x ]]
    predicted = model.predict(x)  
    actual = data_y[i]
    if np.abs(predicted[0][0] - actual) "lt" \
      np.abs(pct_close * actual):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)

# -----------------------------------------------------------

def accuracy_x(model, data_x, data_y, pct_close):
  n = len(data_x)
  oupt = model(data_x)
  oupt = tf.reshape(oupt, [-1])  # 1D
 
  max_deltas = tf.abs(pct_close * data_y)  # max allow deltas
  abs_deltas = tf.abs(oupt - data_y)   # actual differences
  results = abs_deltas "lt" max_deltas    # [True, False, . .]

  n_correct = np.sum(results)
  acc = n_correct / n
  return acc

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBoston regression using PyTorch ")
  np.random.random(9)
  tf.random.set_seed(9)

  # 1. load data
  print("\nLoading Boston train and test data into memory ")
  train_file = ".\\Data\\boston_train.txt"
  all_xy = np.loadtxt(train_file, usecols=range(0,14),
    delimiter="\t", comments="#", dtype=np.float32)
  train_x = all_xy[:,0:13]
  train_y = all_xy[:,13]

  test_file = ".\\Data\\boston_test.txt"
  all_xy = np.loadtxt(test_file, usecols=range(0,14),
    delimiter="\t", comments="#", dtype=np.float32)
  test_x = all_xy[:,0:13]
  test_y = all_xy[:,13]
  
# -----------------------------------------------------------
  
  # 2. create network
  print("\nCreating 13-(10-10)-1 regression network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=10, input_dim=13,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid1
  model.add(K.layers.Dense(units=10,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid2
  model.add(K.layers.Dense(units=1,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))    # output layer
  opt = K.optimizers.Adam(learning_rate=0.01)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train network
  max_epochs = 1000
  log_every = 100
  my_logger = MyLogger(log_every) 

  print("\nbatch size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("learn rate = 0.01 ")
  print("max epochs = 1000 ")

  print("\nStarting training ")
  h = model.fit(train_x, train_y, batch_size=10,
    epochs=max_epochs, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate network
  acc_train = accuracy_x(model, train_x, train_y, 0.15) 
  print("\nAccuracy on train (within 0.15) = %0.4f " % acc_train)
  acc_test = accuracy_x(model, test_x, test_y, 0.15) 
  print("Accuracy on test (within 0.15) = %0.4f " % acc_test)

# -----------------------------------------------------------

  # 5. save model
  # print("\nSaving trained model as boston_model.h5 ")
  # model.save_weights(".\\Models\\boston_model_wts.h5")
  # model.save(".\\Models\\boston_model.h5")

# -----------------------------------------------------------

  # 6. use model
  np.set_printoptions(formatter={'float': '{: 0.6f}'.format})
  print("\nPredicting normalized price from first train item ")
  print("Actual price = 0.2160 ")

  x = np.array([[0.000273, 0.000, 0.0707, -1, 0.469,
    0.6421, 0.789, 0.049671, 0.02, 0.242, 0.178,
    0.39690, 0.0914]], dtype=np.float32)

  # x = np.array([[0.000063, 0.180, 0.0231, -1, 0.538,
  #   0.6575, 0.652, 0.040900, 0.01, 0.296, 0.153,
  #   0.39690, 0.0498]], dtype=np.float32)
  oupt = model.predict(x)
  print("Predicted price = %0.4f " % oupt)

  print("\nEnd demo ")

if __name__=="__main__":
  main()
Posted in Keras

Naive Bayes Classification Example Using Raw Python 3.7

I’m preparing the content for an all-day hands-on workshop. My main topics are all about neural networks, but I have a few classical techniques too, including naive Bayes classification. Here’s an example that I’ll use in the workshop.

There are 40 data items that look like:

actuary  green  korea  1
barista  green  italy  0
dentist  hazel  japan  0
chemist  hazel  japan  2
. . . 

Each line of data is a person. The columns are job-type, eye-color, country, and personality extraversion (0, 1, 2). Suppose you want to predict the personality extraversion score of a person who is (barista, hazel, italy).

The first step is to compute the joint counts of each predictor value in the item-to-predict (barista, hazel, italy) with each class (0, 1, 2), looking at one predictor variable at a time (“naive”).

barista and class 0 = 4 + 1 = 5
barista and class 1 = 0 + 1 = 1
barista and class 2 = 1 + 1 = 2

hazel and class 0 = 5 + 1 = 6
hazel and class 1 = 2 + 1 = 3
hazel and class 2 = 2 + 1 = 3

italy and class 0 = 1 + 1 = 2
italy and class 1 = 5 + 1 = 6
italy and class 2 = 1 + 1 = 2

You add 1 to each raw count so that no count is 0. This is called Laplacian Smoothing.

The second step is to compute the raw counts, without smoothing, of each class:

class 0 = 19
class 1 = 14
class 2 =  7

The third step is to combine the results from steps 1 and 2 using some fancy probability (“Bayes”) to get what are called evidence values (Z) for each class:

Z(0) = (5 / (19+3)) * (6 / (19+3)) * (2 / (19+3)) * (19 / 40)
     = 5/22 * 6/22 * 2/22 * 19/40
     = 0.2273 * 0.2727 * 0.0909 * 0.4750
     = 0.0027

Z(1) = (1 / (14+3)) * (3 / (14+3)) * (6 / (14+3)) * (14 / 40)
     = 1/17 * 3/17 * 6/17 * 14/40
     = 0.0588 * 0.1765 * 0.3529 * 0.3500
     = 0.0013

Z(2) = (2 / (7+3)) * (3 / (7+3)) * (2 / (7+3)) * (7 / 40)
     = 2/10 * 3/10 * 2/10 * 7/40
     = 0.2000 * 0.3000 * 0.2000 * 0.1750
     = 0.0021

Note: All the “+3” terms are because there are 3 predictor variables. At this point, the predicted class is the one with the largest evidence value, which is class 0.

An optional final step is to normalize the evidence values so that they sum to 1.0 and can be loosely interpreted as pseudo-probabilities. The easiest way to do this is to divide each evidence value by the sum:

sum = 0.0027 + 0.0013 + 0.0021 = 0.0061

P(class 0) = 0.0027 / 0.0061 = 0.4418
P(class 1) = 0.0013 / 0.0061 = 0.2116
P(class 2) = 0.0021 / 0.0061 = 0.3466

(The divisions here use the unrounded evidence values, so the results differ very slightly from what you’d get by dividing the rounded values shown.)

As before, class 0 has the largest pseudo-probability so that’s the predicted class for a (barista, hazel, italy) person.

There are many variations of naive Bayes classification. This example is just one version, for problems where the predictor values are categorical (non-numeric).



The term “naive” means simple and unsophisticated. The term applies well to my two dogs, Kevin and Riley. Left: Kevin when he had just joined my family, which already included Riley. Center: I woke up from a nap one afternoon to find that Riley had proudly brought me my “Chess Life” magazine and some socks. She is waiting for praise. Right: Kevin went through a phase where he was obsessed with socks.


Demo code:

# naive_bayes.py
# Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np

# -----------------------------------------------------------

def main():
  print("\nBegin naive Bayes classification ")
  data = np.loadtxt(".\\people_data.txt", dtype=str,
    delimiter=" ", comments="#")
  print("\nData looks like: ")
  for i in range(5):
    print(data[i])
  print(". . . \n")

  nx = 3  # number predictor variables
  nc = 3  # number classes
  N = 40  # data items
  joint_cts = np.zeros((nx,nc), dtype=np.int64) 
  y_cts = np.zeros(nc, dtype=np.int64)

# -----------------------------------------------------------

  # X = ['dentist', 'hazel', 'italy']
  X = ['barista', 'hazel', 'italy']
  print("Item to predict/classify: ")
  print(X)

  for i in range(N):
    y = int(data[i,nx])  # class is in last column
    y_cts[y] += 1
    for j in range(nx):
      if data[i][j] == X[j]:
        joint_cts[j][y] += 1

  joint_cts += 1  # Laplacian smoothing

  print("\nJoint counts (smoothed): ")
  print(joint_cts)
  print("\nClass counts (raw): ")
  print(y_cts)

# -----------------------------------------------------------

  # compute evidence terms directly
  # e_terms = np.zeros(nc, dtype=np.float32) 
  # for k in range(nc):
  #   v = 1.0
  #   for j in range(nx):
  #     v *= joint_cts[j,k] / (y_cts[k] + nx)
  #   v *= y_cts[k] / N
  #   e_terms[k] = v

# -----------------------------------------------------------

  # compute evidence terms using log trick to avoid underflow
  e_terms = np.zeros(nc, dtype=np.float32) 
  for k in range(nc):
    v = 0.0
    for j in range(nx):
      v += np.log(joint_cts[j,k]) - np.log(y_cts[k] + nx)
    v += np.log(y_cts[k]) - np.log(N)
    e_terms[k] = np.exp(v)

# -----------------------------------------------------------

  np.set_printoptions(precision=4, suppress=True)
  print("\nEvidence terms: ")
  print(e_terms)

  sum_evidence = np.sum(e_terms)
  probs = np.zeros(nc, dtype=np.float32)
  for k in range(nc):
    probs[k] = e_terms[k] / sum_evidence

  print("\nPseudo-probabilities: ")
  print(probs)

  pc = np.argmax(probs)
  print("\nPredicted class: ")
  print(pc)

  print("\nEnd naive Bayes demo ")

if __name__ == "__main__":
  main()

Demo data:

# people_data.txt
# job-type  eye-color country  extraversion
#
actuary green korea 1
barista green italy 0
dentist hazel japan 0
dentist green japan 1
chemist hazel japan 2
actuary green japan 1
actuary green japan 0
chemist green italy 1
chemist green italy 2
dentist green japan 1
dentist green japan 0
dentist green japan 1
dentist green japan 2
chemist green italy 1
dentist green japan 1
dentist hazel japan 0
chemist green korea 1
barista green japan 0
actuary green italy 1
actuary green italy 1
dentist green korea 0
barista green japan 2
dentist green japan 0
barista green korea 0
dentist green japan 0
actuary hazel italy 1
dentist hazel japan 0
dentist green japan 2
dentist green japan 0
chemist hazel japan 2
dentist green korea 0
dentist hazel korea 0
dentist green japan 0
dentist green japan 2
dentist hazel japan 0
actuary hazel japan 1
actuary green japan 0
actuary green japan 1
dentist green japan 0
barista green japan 0
Posted in Machine Learning, PAW

The Intuition Behind Naive Bayes Classification

I’m preparing an all-day hands-on workshop. My main topics are all about neural networks, but I have a few classical techniques too, including naive Bayes classification.

I start my explanation of naive Bayes with the general idea (“the intuition”). The example uses 40 data items (the same dataset as in the naive Bayes example above). Each item has three predictor variables: job-type, eye-color, country. The variable to predict has three possible values (0, 1, 2). You can imagine the variable to predict is personality extraversion.

In the example, the goal is to predict the class of a person who is (barista, hazel, italy). Notice that the 40-item dataset doesn’t have anyone who exactly matches the input. Also notice that the predictor variables are all categorical. Naive Bayes works on counts/frequencies of items, so all data must be categorical.

I ask attendees, “Suppose you know only that the person is a barista. What do you predict?” Attendees usually see that of the 5 barista items, 4 are class 0, so the best guess is class 0. Mathematically, P(0 | barista) = 4/5 = 0.80.

Then I ask, “Suppose you know only that the person has hazel eye color.” Of the 9 hazel items, 5 are class 0, so again the best guess is class 0. Mathematically, P(0 | hazel) = 5/9 = 0.56.

Finally, if you know only that the person is from Italy: of the 7 Italy items, 5 are class 1, so class 1 is the best guess. Mathematically, P(1 | italy) = 5/7 = 0.71.

At this point, the best overall guess is class 0 because two out of three guesses were class 0. The technique is called “naive” because it looks at each predictor column independently, and “Bayes” because it’s based on observed probabilities.
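
These single-column probabilities are easy to verify with a few lines of code. Here is a minimal sketch, assuming the 40-item people_data.txt file from the naive Bayes example above:

import numpy as np

# minimal sketch: P(class | one predictor value) by simple counting
data = np.loadtxt(".\\people_data.txt", dtype=str, comments="#")

def cond_probs(value):
  rows = data[np.any(data[:, 0:3] == value, axis=1)]  # items containing value
  classes = rows[:, 3].astype(np.int64)               # class is last column
  return np.bincount(classes, minlength=3) / len(rows)

print(cond_probs("barista"))  # approx [0.80  0.00  0.20]
print(cond_probs("hazel"))    # approx [0.56  0.22  0.22]
print(cond_probs("italy"))    # approx [0.14  0.71  0.14]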

This is the intuition behind naive Bayes classification. The next step is to combine the observed probabilities in a logical way, but that’s another blog post.



A naive person is someone who is overly optimistic and believes in the good intentions of others. This doesn’t always work well with aliens.

Left: Shortly after aliens land in “War of the Worlds” (1951), a pastor goes out to greet them with a message of peace. It did not end well.

Right: In “Independence Day” (1996) a large group of people went to the top of an office building below an alien spaceship, carrying “Welcome!” signs. The aliens were not impressed.


Posted in Machine Learning