Multi-Class Classification Using a scikit MLPClassifier Neural Network

The scikit-learn (aka scikit) library can do neural networks. In a work environment, I use PyTorch because it’s very flexible. But one morning, just for fun and mental exercise, I decided to create a scikit neural network multi-class classifier to compare it to PyTorch.

I used one of my standard synthetic datasets. The data looks like:

 1   0.24   1   0   0   0.2950   2
-1   0.39   0   0   1   0.5120   1
 1   0.63   0   1   0   0.7580   0
-1   0.36   1   0   0   0.4450   1
. . . 

Each line of data represents a person. The fields are sex (male = -1, female = 1), age (normalized by dividing by 100), state (michigan = 100, nebraska = 010, oklahoma = 001), annual income (divided by 100,000), and politics type (0 = conservative, 1 = moderate, 2 = liberal). The goal is to predict politics type from sex, age, state, income.
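As a concrete illustration, here is a small helper sketch (my own, not part of the demo program) that shows how one raw record maps to the encoded form:

def encode_person(sex, age, state, income, politics):
  # sex is "M" or "F", age is in years, income is in dollars
  sex_enc = -1.0 if sex == "M" else 1.0
  state_enc = { "michigan" : [1,0,0], "nebraska" : [0,1,0],
    "oklahoma" : [0,0,1] }[state]
  pol_enc = { "conservative" : 0, "moderate" : 1, "liberal" : 2 }[politics]
  return [sex_enc, age / 100.0] + state_enc + \
    [income / 100000.0, pol_enc]

print(encode_person("M", 32, "nebraska", 52000.00, "moderate"))
# [-1.0, 0.32, 0, 1, 0, 0.52, 1]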

One of the characteristics of scikit classes is that many of them have zillions of parameters:

  # MLPClassifier(hidden_layer_sizes=(100,),
  #  activation='relu', *, solver='adam', alpha=0.0001,
  #  batch_size='auto', learning_rate='constant',
  #  learning_rate_init=0.001, power_t=0.5, max_iter=200,
  #  shuffle=True, random_state=None, tol=0.0001,
  #  verbose=False, warm_start=False, momentum=0.9,
  #  nesterovs_momentum=True, early_stopping=False,
  #  validation_fraction=0.1, beta_1=0.9, beta_2=0.999,
  #  epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

From my years of experience with neural networks, I understand what these parameters do, but for someone new to neural networks, parsing through these parameters would take a long, long time.

For my demo, I set these parameters and created the network like so:

  import numpy as np 
  from sklearn.neural_network import MLPClassifier

  params = { 'hidden_layer_sizes' : [10,10],
    'activation' : 'tanh',
    'solver' : 'sgd',
    'alpha' : 0.0,
    'batch_size' : 10,
    'random_state' : 1,
    'tol' : 0.0001,
    'nesterovs_momentum' : False,
    'learning_rate' : 'constant',
    'learning_rate_init' : 0.01,
    'max_iter' : 1000,
    'shuffle' : True,
    'n_iter_no_change' : 90,
    'verbose' : False }

  print("\nCreating 6-(10-10)-3 tanh neural network ")
  net = MLPClassifier(**params)

Explaining all these parameters would take pages, so I won't try. I will point out that n_iter_no_change is very important. Without it, training automatically stops after a default of 10 iterations with no significant improvement (where "significant" is defined by the tol parameter), and in my experiments the default often ended training too soon.
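A quick way to check whether training ran long enough is to look at the n_iter_ and loss_curve_ attributes after calling fit(). A minimal self-contained sketch (using dummy data, separate from the demo program) is:

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.RandomState(0).randn(100, 6).astype(np.float32)
y = np.random.RandomState(1).randint(0, 3, 100)
net = MLPClassifier(hidden_layer_sizes=(10,10), max_iter=1000,
  n_iter_no_change=90, random_state=1).fit(X, y)
print("iterations actually run = " + str(net.n_iter_))
print("loss at last iteration = %0.4f" % net.loss_curve_[-1])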

It was a fun and interesting experiment.



The parameters of the scikit MLPClassifier are essentially the control panel of the module. Left: Control panel for the Three Mile Island nuclear power plant, near Harrisburg, Pennsylvania. It was shut down in 2019 and is being decommissioned. Right: Control panel for the Watts Bar nuclear plant near Chattanooga, Tennessee. It’s the newest nuclear plant in the U.S.


Demo code below. The training and test data are below and also at: https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.

# people_politics_nn_scikit.py

# predict politics (0 = con, 1 = mod, 2 = lib) 
# from sex, age, state, income

# sex  age    state    income   politics
# -1   0.27   0  1  0   0.7610   2
#  1   0.19   0  0  1   0.6550   0
# sex: -1 = male, +1 = female
# state: michigan = 100, nebraska = 010, oklahoma = 001
# politics: conservative, moderate, liberal

# Anaconda3-2020.02  Python 3.7.6  scikit 0.22.1
# Windows 10/11

import numpy as np 
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore')  # early-stop warnings

# ---------------------------------------------------------

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# ---------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin scikit neural network example ")
  print("Predict politics from sex, age, State, income ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True)

  # sex  age    state    income   politics
  # -1   0.27   0  1  0   0.7610   2
  #  1   0.19   0  0  1   0.6550   0

  # 1. load data
  print("\nLoading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,7),
    delimiter="\t", comments="#",  dtype=np.float32) 
  train_x = train_xy[:,0:6]
  train_y = train_xy[:,6].astype(int)

  test_file = ".\\Data\\people_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,7),
    delimiter="\t", comments="#",  dtype=np.float32) 
  test_x = test_xy[:,0:6]
  test_y = test_xy[:,6].astype(int)

  print("\nTraining data:")
  print(train_x[0:4])
  print(". . . \n")
  print(train_y[0:4])
  print(". . . ")
 
# ---------------------------------------------------------

  # 2. create network 
  # MLPClassifier(hidden_layer_sizes=(100,),
  #  activation='relu', *, solver='adam', alpha=0.0001,
  #  batch_size='auto', learning_rate='constant',
  #  learning_rate_init=0.001, power_t=0.5, max_iter=200,
  #  shuffle=True, random_state=None, tol=0.0001,
  #  verbose=False, warm_start=False, momentum=0.9,
  #  nesterovs_momentum=True, early_stopping=False,
  #  validation_fraction=0.1, beta_1=0.9, beta_2=0.999,
  #  epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

  params = { 'hidden_layer_sizes' : [10,10],
    'activation' : 'tanh',
    'solver' : 'sgd',
    'alpha' : 0.0,
    'batch_size' : 10,
    'random_state' : 1,
    'tol' : 0.0001,
    'nesterovs_momentum' : False,
    'learning_rate' : 'constant',
    'learning_rate_init' : 0.01,
    'max_iter' : 1000,
    'shuffle' : True,
    'n_iter_no_change' : 90,
    'verbose' : False }
       
  print("\nCreating 6-(10-10)-3 tanh neural network ")
  net = MLPClassifier(**params)

# ---------------------------------------------------------

  # 3. train
  print("\nTraining with bat sz = " + \
    str(params['batch_size']) + " lrn rate = " + \
    str(params['learning_rate_init']) + " ")
  print("Stop if no change " + \
    str(params['n_iter_no_change']) + " iterations ")
  net.fit(train_x, train_y)
  print("Done ")

  # 4. evaluate
  acc_train = net.score(train_x, train_y)
  print("\nAccuracy on train = %0.4f " % acc_train)
  acc_test = net.score(test_x, test_y)
  print("Accuracy on test = %0.4f " % acc_test)

  # from sklearn.metrics import confusion_matrix
  # y_predicteds = net.predict(test_x)
  # cm = confusion_matrix(test_y, y_predicteds) 
  # print("\nConfusion matrix raw: ")
  # print(cm)
  # show_confusion(cm)  # with formatted labels

  # 5. use model
  print("\nPredict for: M 35 Nebraska $55K ")
  X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
    dtype=np.float32)

  probs = net.predict_proba(X)
  print("\nPrediction pseudo-probs: ")
  print(probs)

  politic = net.predict(X)  # 0,1,2
  lbls = ["conservative", "moderate", "liberal"]
  print("\nPredicted class: ")
  print(lbls[politic[0]])

  # 6. TODO: save model using pickle
  # import pickle
  # print("Saving trained network ")
  # path = ".\\Models\\network.sav"
  # pickle.dump(model, open(path, "wb"))

  # load and use saved model
  # X = np.array([[-1, 0.35, 0,1,0, 0.5500]],
  #   dtype=np.float32)
  # with open(path, 'rb') as f:
  #   loaded_model = pickle.load(f)
  # pa = loaded_model.predict_proba(X)
  # print(pa)

  print("\nEnd scikit neural network demo ")

if __name__ == "__main__":
  main()

Training data. The data below is comma-delimited; replace the commas with tabs, or modify the program's delimiter.

# people_train.txt
# sex (M=-1, F=1) , age (div 100),  state
# (michigan=100, nebraska=010, oklahoma=001) 
# income (div $100,000)
# politics (con=0, mod=1, lib=2)
#
 1,0.24,1,0,0,0.2950,2
-1,0.39,0,0,1,0.5120,1
 1,0.63,0,1,0,0.7580,0
-1,0.36,1,0,0,0.4450,1
 1,0.27,0,1,0,0.2860,2
 1,0.50,0,1,0,0.5650,1
 1,0.50,0,0,1,0.5500,1
-1,0.19,0,0,1,0.3270,0
 1,0.22,0,1,0,0.2770,1
-1,0.39,0,0,1,0.4710,2
 1,0.34,1,0,0,0.3940,1
-1,0.22,1,0,0,0.3350,0
1,0.35,0,0,1,0.3520,2
-1,0.33,0,1,0,0.4640,1
1,0.45,0,1,0,0.5410,1
1,0.42,0,1,0,0.5070,1
-1,0.33,0,1,0,0.4680,1
1,0.25,0,0,1,0.3000,1
-1,0.31,0,1,0,0.4640,0
1,0.27,1,0,0,0.3250,2
1,0.48,1,0,0,0.5400,1
-1,0.64,0,1,0,0.7130,2
1,0.61,0,1,0,0.7240,0
1,0.54,0,0,1,0.6100,0
1,0.29,1,0,0,0.3630,0
1,0.50,0,0,1,0.5500,1
1,0.55,0,0,1,0.6250,0
1,0.40,1,0,0,0.5240,0
1,0.22,1,0,0,0.2360,2
1,0.68,0,1,0,0.7840,0
-1,0.60,1,0,0,0.7170,2
-1,0.34,0,0,1,0.4650,1
-1,0.25,0,0,1,0.3710,0
-1,0.31,0,1,0,0.4890,1
1,0.43,0,0,1,0.4800,1
1,0.58,0,1,0,0.6540,2
-1,0.55,0,1,0,0.6070,2
-1,0.43,0,1,0,0.5110,1
-1,0.43,0,0,1,0.5320,1
-1,0.21,1,0,0,0.3720,0
1,0.55,0,0,1,0.6460,0
1,0.64,0,1,0,0.7480,0
-1,0.41,1,0,0,0.5880,1
1,0.64,0,0,1,0.7270,0
-1,0.56,0,0,1,0.6660,2
1,0.31,0,0,1,0.3600,1
-1,0.65,0,0,1,0.7010,2
1,0.55,0,0,1,0.6430,0
-1,0.25,1,0,0,0.4030,0
1,0.46,0,0,1,0.5100,1
-1,0.36,1,0,0,0.5350,0
1,0.52,0,1,0,0.5810,1
1,0.61,0,0,1,0.6790,0
1,0.57,0,0,1,0.6570,0
-1,0.46,0,1,0,0.5260,1
-1,0.62,1,0,0,0.6680,2
1,0.55,0,0,1,0.6270,0
-1,0.22,0,0,1,0.2770,1
-1,0.50,1,0,0,0.6290,0
-1,0.32,0,1,0,0.4180,1
-1,0.21,0,0,1,0.3560,0
1,0.44,0,1,0,0.5200,1
1,0.46,0,1,0,0.5170,1
1,0.62,0,1,0,0.6970,0
1,0.57,0,1,0,0.6640,0
-1,0.67,0,0,1,0.7580,2
1,0.29,1,0,0,0.3430,2
1,0.53,1,0,0,0.6010,0
-1,0.44,1,0,0,0.5480,1
1,0.46,0,1,0,0.5230,1
-1,0.20,0,1,0,0.3010,1
-1,0.38,1,0,0,0.5350,1
1,0.50,0,1,0,0.5860,1
1,0.33,0,1,0,0.4250,1
-1,0.33,0,1,0,0.3930,1
1,0.26,0,1,0,0.4040,0
1,0.58,1,0,0,0.7070,0
1,0.43,0,0,1,0.4800,1
-1,0.46,1,0,0,0.6440,0
1,0.60,1,0,0,0.7170,0
-1,0.42,1,0,0,0.4890,1
-1,0.56,0,0,1,0.5640,2
-1,0.62,0,1,0,0.6630,2
-1,0.50,1,0,0,0.6480,1
1,0.47,0,0,1,0.5200,1
-1,0.67,0,1,0,0.8040,2
-1,0.40,0,0,1,0.5040,1
1,0.42,0,1,0,0.4840,1
1,0.64,1,0,0,0.7200,0
-1,0.47,1,0,0,0.5870,2
1,0.45,0,1,0,0.5280,1
-1,0.25,0,0,1,0.4090,0
1,0.38,1,0,0,0.4840,0
1,0.55,0,0,1,0.6000,1
-1,0.44,1,0,0,0.6060,1
1,0.33,1,0,0,0.4100,1
1,0.34,0,0,1,0.3900,1
1,0.27,0,1,0,0.3370,2
1,0.32,0,1,0,0.4070,1
1,0.42,0,0,1,0.4700,1
-1,0.24,0,0,1,0.4030,0
1,0.42,0,1,0,0.5030,1
1,0.25,0,0,1,0.2800,2
1,0.51,0,1,0,0.5800,1
-1,0.55,0,1,0,0.6350,2
1,0.44,1,0,0,0.4780,2
-1,0.18,1,0,0,0.3980,0
-1,0.67,0,1,0,0.7160,2
1,0.45,0,0,1,0.5000,1
1,0.48,1,0,0,0.5580,1
-1,0.25,0,1,0,0.3900,1
-1,0.67,1,0,0,0.7830,1
1,0.37,0,0,1,0.4200,1
-1,0.32,1,0,0,0.4270,1
1,0.48,1,0,0,0.5700,1
-1,0.66,0,0,1,0.7500,2
1,0.61,1,0,0,0.7000,0
-1,0.58,0,0,1,0.6890,1
1,0.19,1,0,0,0.2400,2
1,0.38,0,0,1,0.4300,1
-1,0.27,1,0,0,0.3640,1
1,0.42,1,0,0,0.4800,1
1,0.60,1,0,0,0.7130,0
-1,0.27,0,0,1,0.3480,0
1,0.29,0,1,0,0.3710,0
-1,0.43,1,0,0,0.5670,1
1,0.48,1,0,0,0.5670,1
1,0.27,0,0,1,0.2940,2
-1,0.44,1,0,0,0.5520,0
1,0.23,0,1,0,0.2630,2
-1,0.36,0,1,0,0.5300,2
1,0.64,0,0,1,0.7250,0
1,0.29,0,0,1,0.3000,2
-1,0.33,1,0,0,0.4930,1
-1,0.66,0,1,0,0.7500,2
-1,0.21,0,0,1,0.3430,0
1,0.27,1,0,0,0.3270,2
1,0.29,1,0,0,0.3180,2
-1,0.31,1,0,0,0.4860,1
1,0.36,0,0,1,0.4100,1
1,0.49,0,1,0,0.5570,1
-1,0.28,1,0,0,0.3840,0
-1,0.43,0,0,1,0.5660,1
-1,0.46,0,1,0,0.5880,1
1,0.57,1,0,0,0.6980,0
-1,0.52,0,0,1,0.5940,1
-1,0.31,0,0,1,0.4350,1
-1,0.55,1,0,0,0.6200,2
1,0.50,1,0,0,0.5640,1
1,0.48,0,1,0,0.5590,1
-1,0.22,0,0,1,0.3450,0
1,0.59,0,0,1,0.6670,0
1,0.34,1,0,0,0.4280,2
-1,0.64,1,0,0,0.7720,2
1,0.29,0,0,1,0.3350,2
-1,0.34,0,1,0,0.4320,1
-1,0.61,1,0,0,0.7500,2
1,0.64,0,0,1,0.7110,0
-1,0.29,1,0,0,0.4130,0
1,0.63,0,1,0,0.7060,0
-1,0.29,0,1,0,0.4000,0
-1,0.51,1,0,0,0.6270,1
-1,0.24,0,0,1,0.3770,0
1,0.48,0,1,0,0.5750,1
1,0.18,1,0,0,0.2740,0
1,0.18,1,0,0,0.2030,2
1,0.33,0,1,0,0.3820,2
-1,0.20,0,0,1,0.3480,0
1,0.29,0,0,1,0.3300,2
-1,0.44,0,0,1,0.6300,0
-1,0.65,0,0,1,0.8180,0
-1,0.56,1,0,0,0.6370,2
-1,0.52,0,0,1,0.5840,1
-1,0.29,0,1,0,0.4860,0
-1,0.47,0,1,0,0.5890,1
1,0.68,1,0,0,0.7260,2
1,0.31,0,0,1,0.3600,1
1,0.61,0,1,0,0.6250,2
1,0.19,0,1,0,0.2150,2
1,0.38,0,0,1,0.4300,1
-1,0.26,1,0,0,0.4230,0
1,0.61,0,1,0,0.6740,0
1,0.40,1,0,0,0.4650,1
-1,0.49,1,0,0,0.6520,1
1,0.56,1,0,0,0.6750,0
-1,0.48,0,1,0,0.6600,1
1,0.52,1,0,0,0.5630,2
-1,0.18,1,0,0,0.2980,0
-1,0.56,0,0,1,0.5930,2
-1,0.52,0,1,0,0.6440,1
-1,0.18,0,1,0,0.2860,1
-1,0.58,1,0,0,0.6620,2
-1,0.39,0,1,0,0.5510,1
-1,0.46,1,0,0,0.6290,1
-1,0.40,0,1,0,0.4620,1
-1,0.60,1,0,0,0.7270,2
1,0.36,0,1,0,0.4070,2
1,0.44,1,0,0,0.5230,1
1,0.28,1,0,0,0.3130,2
1,0.54,0,0,1,0.6260,0

Test data:

-1,0.51,1,0,0,0.6120,1
-1,0.32,0,1,0,0.4610,1
1,0.55,1,0,0,0.6270,0
1,0.25,0,0,1,0.2620,2
1,0.33,0,0,1,0.3730,2
-1,0.29,0,1,0,0.4620,0
1,0.65,1,0,0,0.7270,0
-1,0.43,0,1,0,0.5140,1
-1,0.54,0,1,0,0.6480,2
1,0.61,0,1,0,0.7270,0
1,0.52,0,1,0,0.6360,0
1,0.30,0,1,0,0.3350,2
1,0.29,1,0,0,0.3140,2
-1,0.47,0,0,1,0.5940,1
1,0.39,0,1,0,0.4780,1
1,0.47,0,0,1,0.5200,1
-1,0.49,1,0,0,0.5860,1
-1,0.63,0,0,1,0.6740,2
-1,0.30,1,0,0,0.3920,0
-1,0.61,0,0,1,0.6960,2
-1,0.47,0,0,1,0.5870,1
1,0.30,0,0,1,0.3450,2
-1,0.51,0,0,1,0.5800,1
-1,0.24,1,0,0,0.3880,1
-1,0.49,1,0,0,0.6450,1
1,0.66,0,0,1,0.7450,0
-1,0.65,1,0,0,0.7690,0
-1,0.46,0,1,0,0.5800,0
-1,0.45,0,0,1,0.5180,1
-1,0.47,1,0,0,0.6360,0
-1,0.29,1,0,0,0.4480,0
-1,0.57,0,0,1,0.6930,2
-1,0.20,1,0,0,0.2870,2
-1,0.35,1,0,0,0.4340,1
-1,0.61,0,0,1,0.6700,2
-1,0.31,0,0,1,0.3730,1
1,0.18,1,0,0,0.2080,2
1,0.26,0,0,1,0.2920,2
-1,0.28,1,0,0,0.3640,2
-1,0.59,0,0,1,0.6940,2

Chess as a Metaphor for Life

On a recent weekend, I played in a local chess tournament. I played five games and all were quite interesting. One of the reasons that chess is sometimes encouraged for school children is that it can teach concentration, calculation, and memorization, and it can serve as a metaphor for valuable life lessons.


In round 1, I had the white pieces and played the ultra-aggressive Halloween Gambit where I sacrificed a knight on the fourth move. In the diagram, Black has just played 9..h6 but I responded with 10. Nd6 check followed by 11. Nxf7 check and 12. Nxh8 and so I gained a decisive material advantage. But my opponent unleashed a strong attack on my king and after several close calls, I barely held on to win. Moral: When things are going your way, don’t get complacent.


In round 2, I had the black pieces and played the French Defense. By move 35 we reached a complicated position where White has a passed pawn on a4 that’s ready to march forward, but I had maneuvered to try to trap White’s queen by playing Ra8. White has just played Bh3 so that he can move his queen to d7, a move that also attacks my rook on c8. I had tunnel vision on the queenside and played 36..f5, but White wriggled away with 37. Qa5. Instead, I could have changed my plan and won easily by playing 36..Rh5, and if White takes my rook, I have a nice 37..Rxh2 mate. Moral: When a situation changes, don’t hesitate to change your plans.


In round 3, I had the black pieces and played the Sicilian Defense. The game was balanced until move 43 when White played 43. Rc6. I could have played 43..Rxc6 and then gone after the resulting pawn on c6 but I was worried that White could sneak into my position and go after my pawn on h5 and run the pawn on h4 to become a queen. What I missed was that after White played 44. Rxe6 check, my king is one square farther away from the pawn on b5. So, I barely held a draw after capturing the pawn and then scrambling back to the king side. Moral: Try to predict future circumstances and try to plan as best you can for the future.


In round 4, I had the white pieces and played the wild Halloween Gambit again (as in round 1). By move 16 the position was extremely complicated. In the diagram, it’s my move and I could have gotten the upper hand by 17. Bxe8 but instead I played 17. Nxe4 and a couple of moves later I was in an endgame, down material, with a much worse position. I fought hard and eventually won some material back and got an even position. Moral: Opportunity knocks rarely, so keep an eye out for it and grab it when you have a chance.


In round 5, I had the black pieces and played the French Defense. My opponent played the Exchange Variation and the game was even until move 43 when White played 43. d5 check. Here I could have simply played 43..cxd5 44. cxd5 Kxd5 with a probable draw but chances for me to win. Instead, I over-thought the position and set a trap by playing 43..Kd6 but White saw through my trap. The game remained tense and ended in a nail-biting draw. Moral: Don’t get fancy and over-think a situation.




When I learned to play chess, my three main chess heroes were J.R. Capablanca (1888-1942, world champion 1921-1927); Max Euwe (1901-1981, world champion 1935-1937); and Reuben Fine (1914-1993, a strong player who might have been world champion except for World War II). I had to learn from books — especially the three shown. Internet videos are now a much better way to learn chess.


Posted in Miscellaneous | Leave a comment

The House Voting Dataset Example Using PyTorch

A relatively well-known machine learning dataset is the House Voting data, also called the Congressional Voting Records Dataset. The raw data looks like:

republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
democrat,n,y,y,n,y,y,n,n,n,n,n,n,y,y,y,y
democrat,n,y,n,y,y,y,n,n,n,n,n,n,?,y,y,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
. . .

There are 435 data items, corresponding to each of the 435 members of the U.S. House of Representatives in the year 1984. The first column is the member’s political party, Democrat or Republican. The next 16 values correspond to a vote on a particular bill. The possible values are ‘n’ (no), ‘y’ (yes), or ‘?’ (abstain).

The goal is to predict a person’s political party from their 16 votes.

The 16 votes/bills are related to “handicapped infants”, “water project”, and so on. The raw data and details can be found at: https://archive.ics.uci.edu/ml/datasets/congressional+voting+records.

To prepare the data, first I removed all the items that had one or more ‘?’ values, which left me with 232 items from the original 435. I encoded democrat as 0, republican as 1, and ‘n’ as 0, ‘y’ as 1. The result looks like:

0,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
. . .
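A minimal sketch of that preparation step (the raw UCI file name and output file name here are placeholders) is:

def prep_voting_data(src_file, dest_file):
  # drop items with any '?' vote, then encode party and votes as 0/1
  with open(src_file, "r") as fin, open(dest_file, "w") as fout:
    for line in fin:
      line = line.strip()
      if line == "" or "?" in line:
        continue
      tokens = line.split(",")
      party = "1" if tokens[0] == "republican" else "0"
      votes = ["1" if t == "y" else "0" for t in tokens[1:]]
      fout.write(",".join([party] + votes) + "\n")

prep_voting_data("house-votes-84.data", "votes_all.txt")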

I non-randomly split the data into the first 200 items for training data (86%), and the last 32 items for test data (the remaining 14%).

I implemented a 16-(10-10)-1 network architecture with tanh hidden activation, sigmoid output activation, and Xavier/Glorot weight initialization. For training, I used binary cross entropy loss with stochastic gradient descent optimization and a batch size of 10.

The trained model scored 99.50% accuracy on the training data (199 of 200 correct) and 90.62% accuracy on the test data (29 of 32 correct).

The House Voting Dataset is somewhat unusual because all the predictor values are binary/Boolean. A classical machine learning alternative to a neural network classifier is a Bernoulli naive Bayes classifier, which is a specific type of naive Bayes.
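For comparison, a Bernoulli naive Bayes classifier on the same encoded data is only a few lines of scikit code. A rough sketch (not something I ran for this post; it assumes the votes_train.txt and votes_test.txt files shown below) is:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

train_xy = np.loadtxt(".\\Data\\votes_train.txt", delimiter=",",
  comments="#", dtype=np.int64)
test_xy = np.loadtxt(".\\Data\\votes_test.txt", delimiter=",",
  comments="#", dtype=np.int64)
model = BernoulliNB().fit(train_xy[:,1:17], train_xy[:,0])
print("accuracy on test = %0.4f" % \
  model.score(test_xy[:,1:17], test_xy[:,0]))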



I have very little interest in politics, in part because of a disturbing amount of corruption and lack of personal integrity compared to the fields of science and mathematics. Here are just three of several members of the U.S. House of Representatives who were arrested and convicted of crimes during the administration of former politician B. Obama.

Left: Jesse Jackson Jr. (Democrat, IL) was convicted of wire and mail fraud and sentenced to two-and-one-half years in jail.

Center: Corrine Brown (Democrat, FL) was convicted on 18 felony counts of wire and tax fraud, conspiracy, lying to federal investigators, and corruption and was sentenced to five years in jail.

Right: Chaka Fattah (Democrat, PA) was convicted on 23 counts of racketeering, fraud, and corruption and was sentenced to 10 years in jail.


Demo code below.

# voting.py
# House Voting binary classification
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

class VotingDataset(T.utils.data.Dataset):
  # party (0=dem, 1=rep) - 16 votes (0=no, 1=yes)
  # 0,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
  # 1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
  # . . .
 
  def __init__(self, src_file):
    all_data = np.loadtxt(src_file, usecols=range(0,17),
      delimiter=",", comments="#", dtype=np.float32) 

    self.x_data = T.tensor(all_data[:,1:17],
      dtype=T.float32).to(device)
    self.y_data = T.tensor(all_data[:,0],
      dtype=T.float32).to(device)  # float32 label

    self.y_data = self.y_data.reshape(-1,1)  # 2-D required

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    votes = self.x_data[idx,:]  # idx row, all 16 cols
    party = self.y_data[idx,:]  # idx row, the only col
    return votes, party  # as a Tuple

# ---------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(16, 10)  # 16-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight) 
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight) 
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight) 
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.sigmoid(self.oupt(z))  # for BCELoss()
    return z

# ---------------------------------------------------------

def metrics(model, ds, thresh=0.5):
  # note: N = total number of items = TP + FP + TN + FN
  # accuracy  = (TP + TN)  / N
  # precision = TP / (TP + FP)
  # recall    = TP / (TP + FN)
  # F1        = 2 / [(1 / precision) + (1 / recall)]

  tp = 0; tn = 0; fp = 0; fn = 0
  for i in range(len(ds)):
    inpts = ds[i][0]         # dictionary style
    target = ds[i][1]        # float32  [0.0] or [1.0]
    target = target.int()    # int 0 or 1
    with T.no_grad():
      p = model(inpts)       # between 0.0 and 1.0

    # avoid 'target == 1.0'
    if target == 1 and p "gte" thresh:    # TP
      tp += 1
    elif target == 1 and p "lt" thresh:   # falsely pred neg 
      fn += 1
    elif target == 0 and p "lt" thresh:   # TN
      tn += 1
    elif target == 0 and p "gte" thresh:  # falsely pred pos
      fp += 1

  N = tp + fp + tn + fn
  if N != len(ds):
    print("FATAL LOGIC ERROR in metrics()")

  accuracy = (tp + tn) / (N * 1.0)
  precision = (1.0 * tp) / (tp + fp)  # tp + fp != 0
  recall = (1.0 * tp) / (tp + fn)     # tp + fn != 0
  f1 = 2.0 / ((1.0 / precision) + (1.0 / recall))
  return (accuracy, precision, recall, f1)  # as a Tuple

# ---------------------------------------------------------

def main():
  # 0. get started
  print("\nHouse Voting Dataset using PyTorch ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset and DataLoader objects
  print("\nCreating Voting train and test Datasets ")

  train_file = ".\\Data\\votes_train.txt"
  test_file = ".\\Data\\votes_test.txt"

  train_ds = VotingDataset(train_file)  # 200 rows
  test_ds = VotingDataset(test_file)    # 32 rows

  bat_size = 10
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create neural network
  print("\nCreating 16-(10-10)-1 binary NN classifier \n")
  net = Net().to(device)
  net.train()  # set training mode

  # 3. train network
  lrn_rate = 0.01
  loss_func = T.nn.BCELoss()  # binary cross entropy
  # loss_func = T.nn.MSELoss()
  optimizer = T.optim.SGD(net.parameters(),
    lr=lrn_rate)
  max_epochs = 500
  ep_log_interval = 100

  print("Loss function: " + str(loss_func))
  print("Optimizer: " + str(optimizer.__class__.__name__))
  print("Learn rate: " + "%0.3f" % lrn_rate)
  print("Batch size: " + str(bat_size))
  print("Max epochs: " + str(max_epochs))

  print("\nStarting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0.0            # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]             # [bs,16] inputs
      Y = batch[1]             # [bs,1]  targets
      oupt = net(X)            # [bs,1]  computeds 

      loss_val = loss_func(oupt, Y)   # a tensor
      epoch_loss += loss_val.item()  # accumulate
      optimizer.zero_grad() # reset all gradients
      loss_val.backward()   # compute new gradients
      optimizer.step()      # update all weights

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %8.4f" % \
        (epoch, epoch_loss))
  print("Done ")

# ---------------------------------------------------------

  # 4. evaluate model
  net.eval()
  metrics_train = metrics(net, train_ds, thresh=0.5)
  print("\nMetrics for train data: ")
  print("accuracy  = %0.4f " % metrics_train[0])
  print("precision = %0.4f " % metrics_train[1])
  print("recall    = %0.4f " % metrics_train[2])
  print("F1        = %0.4f " % metrics_train[3])

  metrics_test = metrics(net, test_ds, thresh=0.5)
  print("\nMetrics for test data: ")
  print("accuracy  = %0.4f " % metrics_test[0])
  print("precision = %0.4f " % metrics_test[1])
  print("recall    = %0.4f " % metrics_test[2])
  print("F1        = %0.4f " % metrics_test[3])

  # 5. save model
  print("\nSaving trained model state_dict ")
  net.eval()
  # path = ".\\Models\\voting_model.pt"
  # T.save(net.state_dict(), path)

  # 6. make a prediction 
  X = np.array([[0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1]])
  X = T.tensor(X, dtype=T.float32).to(device)
  print("\nSetting votes: ")
  print(X) 

  net.eval()
  with T.no_grad():
    oupt = net(X)    # a Tensor
  pred_prob = oupt.item()  # scalar, [0.0, 1.0]
  print("Computed output: ", end="")
  print("%0.4f" % pred_prob)

  if pred_prob "lt" 0.5:
    print("Prediction = Democrat")
  else:
    print("Prediction = Republican")

  print("\nEnd House Voting demo ")

if __name__== "__main__":
  main()

Training data:

# votes_train.txt
# 1st col = democrat (0), republican (1)
# next 16 cols = no vote (0) or yes (1)
# 200 rows (32 in test)
#
0,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,0,1,0,0,0,1,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
1,1,0,0,1,1,0,1,1,1,0,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,1,1,1,0,0,0,1,1,1,1,0,0,1,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,1,1,0,1,1,1,0,0,0,0,0,0,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,1,0,1,0,0,0,1,1,1,1,1,0,1,0,1,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1
0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,0,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
1,0,1,0,1,1,1,0,0,0,1,1,1,1,1,0,0
1,0,1,0,1,1,1,0,0,0,1,1,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,1
1,1,1,0,1,1,1,1,0,0,0,0,1,1,1,0,1
1,0,1,0,1,1,1,1,0,0,0,1,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,0,1
1,1,0,1,1,1,0,1,0,1,1,0,0,1,1,0,1
0,1,0,1,0,0,1,1,1,1,1,1,0,0,1,1,1
0,0,1,1,1,1,1,0,0,0,1,1,0,1,1,0,0
0,0,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1
0,1,1,1,0,1,1,0,0,0,1,1,0,1,1,0,1
1,0,0,0,1,1,0,0,0,0,1,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,0,0,1,0,1,1,0,0,0,1,1,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,0,1,1,0,1,1,1,0,1,1,1,0,1,1,0,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,1,1,0,0,0,1,1
0,1,0,1,0,0,0,1,0,1,1,1,0,0,0,1,1
0,1,0,1,0,1,1,0,0,0,0,0,0,0,0,0,1
0,1,0,0,0,1,1,1,0,0,1,1,0,0,1,0,1
0,1,1,1,0,0,1,1,1,1,1,0,0,0,0,0,1
0,1,0,0,0,1,1,0,0,0,0,1,1,0,1,0,1
0,1,0,1,0,1,1,1,0,0,0,1,0,0,1,0,1
0,1,1,1,0,0,0,0,1,1,0,1,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,0,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,0,1,0,0,0,1,1
1,1,1,1,1,1,0,1,0,0,0,0,1,1,1,0,1
0,0,1,1,0,0,0,0,1,1,1,1,0,0,0,1,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1
0,0,0,1,0,0,1,0,1,1,1,0,0,0,1,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
1,0,1,0,1,1,1,0,0,0,1,1,1,1,0,0,1
0,0,0,1,0,0,1,1,1,1,1,0,0,0,1,0,1
0,1,0,1,0,0,1,1,1,1,0,0,0,0,0,1,1
1,0,0,0,1,0,0,1,1,1,1,0,0,1,1,0,1
1,0,0,0,1,1,1,1,1,1,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,1
1,0,1,0,1,1,1,0,0,0,1,1,1,1,1,0,1
0,0,0,1,0,0,0,1,1,1,1,0,0,1,0,1,1
1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
0,0,1,1,0,0,1,0,1,1,1,1,0,1,0,1,1
0,0,0,1,0,0,1,1,1,1,1,1,0,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,1,1,0,1,1,1,1,0,0,0,0,1,1,1,0,0
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,0,1,0,0,1,1,0,0,0,0,0,1,1,1,1,1
0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,1
0,0,1,1,0,1,1,1,0,0,0,1,1,1,1,0,1
1,0,1,0,1,1,1,1,0,0,0,0,1,1,1,0,1
1,1,0,1,1,1,1,1,1,0,1,0,1,0,1,1,1
1,1,0,1,1,1,1,1,1,0,1,1,1,0,1,1,1
0,1,0,1,0,0,0,1,1,1,1,1,0,0,1,0,1
0,0,0,0,0,1,1,0,0,0,1,1,1,1,1,0,1
0,0,1,1,0,0,0,1,1,1,1,0,0,0,0,1,1
1,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,1,1
0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,1,1
0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,0,0,0,0,0,1,1,1,1,0,1,0,0,1,1,1
0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,0,0,1,0,0,0,1,1,1,0,0,0,0,1,1,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,0,1,0,0,1,1,1,1,1,1,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
1,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,1
0,0,0,1,0,0,1,1,1,1,1,0,1,0,0,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1
0,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,1
0,0,0,1,0,0,1,1,1,1,0,0,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
0,0,0,1,0,0,0,1,1,1,0,1,0,0,0,1,1
0,0,1,1,0,0,1,0,1,1,0,1,0,1,0,1,1
1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,0,1,1,0,0,0,0,1,1,0,1,0,0,1,1,1
1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1
0,0,0,1,0,0,1,1,1,1,0,1,0,0,1,1,1
1,0,1,1,1,1,1,1,0,1,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,1,1,0,1,1,1,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1
0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,0,1,0,0,1,1,1,0,0,0,1,1,0,0,1
1,0,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1
0,1,0,1,0,0,0,1,1,1,1,1,0,0,1,1,1
0,1,0,1,0,0,0,0,1,1,1,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,1,1,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,1,1,1,0,0,1,1,1,1,0,0,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,0,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,1,1,1
1,0,0,0,1,1,0,0,0,0,0,0,1,0,1,0,0
0,0,0,1,0,0,0,1,1,1,0,1,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,0,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1
1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
1,1,0,0,1,1,0,1,0,0,1,0,0,0,1,1,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,0
1,0,0,1,1,1,1,1,1,0,1,0,0,0,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,0,0,1,0,0,0,1,1,1,1,0,0,0,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,0,0,1,0,0,1,1,1,1,1,1,0,0,0,1,1
0,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1
0,1,0,1,0,0,1,1,1,0,1,1,0,1,1,1,1
0,1,1,1,0,0,1,1,1,1,1,1,0,1,1,1,1
1,0,0,1,1,1,1,0,0,0,1,0,1,1,1,1,1
0,0,1,0,0,0,0,1,1,1,1,1,0,0,0,1,1
0,0,1,1,0,0,1,1,1,1,1,0,0,1,1,1,1
1,0,0,0,1,1,0,1,1,1,1,0,1,1,1,0,1
1,0,0,0,1,1,1,1,0,0,1,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,1,0,0,0,0,1,1,1,1,1,0,0,0,1,1,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,1,0
0,0,0,1,0,0,1,1,1,1,1,0,0,1,0,0,1
0,1,1,1,0,0,0,1,1,1,1,0,0,0,0,1,1
1,0,1,1,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1
0,0,0,0,0,0,1,0,1,1,0,1,1,1,1,1,0
0,1,0,0,0,0,0,1,1,1,1,0,0,0,0,1,1
0,0,1,1,0,0,1,0,1,1,1,0,0,1,1,0,1
0,1,1,1,0,0,0,1,1,1,1,0,0,1,0,0,1
1,0,1,0,1,1,1,0,0,0,0,1,1,1,1,0,0
0,1,1,0,1,0,0,1,1,1,0,1,0,0,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,1
1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,0,0
0,0,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,0,0,0,0,0,1,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,1,1,1
1,1,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,1,1,1,0,1,0,1
1,0,0,0,1,1,0,1,0,1,1,0,0,0,1,0,1
0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,1,1
1,0,0,0,1,1,1,1,0,0,1,0,1,0,1,1,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,1,0,0,1,1,1,0,0,0,1,0,1,1,1,0,0
1,0,1,1,1,1,1,1,1,1,0,0,1,1,1,0,1
0,0,1,0,0,0,1,1,0,1,0,1,0,0,0,1,1
1,0,0,1,1,1,1,1,1,1,1,0,1,1,1,1,1
1,0,0,1,1,1,1,1,0,0,1,1,1,1,1,0,1
1,1,0,1,1,0,0,0,1,1,1,0,0,0,1,1,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
0,1,0,1,0,0,1,1,1,1,1,0,0,1,0,0,1
0,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,1
1,1,1,0,1,1,1,0,0,0,1,1,0,1,0,0,0
1,1,1,0,1,1,1,0,0,0,0,1,0,1,1,0,1
0,0,1,0,0,1,1,0,0,0,1,1,0,1,1,0,0
0,0,1,1,0,0,1,1,1,0,1,0,0,0,0,1,1
1,0,1,0,1,1,1,0,0,0,0,0,0,1,1,0,1
1,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1

Test data:

0,0,1,0,1,1,1,0,0,0,0,1,1,0,1,0,0
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,1,1,0,1,1,0,1,1,1,1,0,0,0,0,1
0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,0,1
0,1,1,0,0,1,1,0,0,0,0,1,1,1,1,1,0
0,1,1,0,0,0,0,0,1,1,0,1,0,0,0,1,0
1,1,1,0,1,1,1,0,0,0,0,1,1,1,1,0,1
0,1,1,1,0,1,1,0,1,0,0,1,0,1,0,1,1
0,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,1
1,0,1,0,1,1,1,0,0,0,1,1,1,1,1,0,0
1,1,1,0,1,1,1,0,0,0,1,0,1,1,1,0,1
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,1
0,1,0,1,0,1,1,0,0,1,1,0,0,1,1,0,1
0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,0,0,1,1,1,0,0
1,0,0,0,1,1,1,0,0,0,0,1,1,1,1,0,1
0,1,0,1,0,0,1,1,1,1,1,1,0,0,0,0,1
1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1
1,1,1,0,1,1,1,0,0,0,1,0,0,1,1,0,1
0,1,1,1,0,0,0,1,1,1,1,1,0,1,0,0,1
0,1,1,1,0,0,0,1,1,0,1,0,0,0,0,0,1
0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1
1,1,1,1,1,1,1,1,1,0,1,0,0,1,1,0,1
0,0,1,1,0,1,1,1,1,0,0,1,0,1,0,1,1
0,0,0,1,0,0,1,1,1,1,0,1,0,0,0,1,1
0,0,1,1,0,0,1,1,1,1,0,1,0,0,1,1,1
0,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
1,0,0,0,1,1,1,1,1,0,1,0,1,1,1,0,1
1,0,0,1,1,1,1,0,0,1,1,0,1,1,1,0,1
0,0,0,1,0,0,0,1,1,1,1,0,0,0,0,0,1

Simulating the NumPy loadtxt() Function in JavaScript

When I write Python/PyTorch code and need to load numeric data from a text file into memory, I usually use the NumPy loadtxt() function. For example:

all_xy = np.loadtxt(src_file, usecols=range(0,7),
  delimiter="\t", comments="#", dtype=np.float32)

I have an inexplicable liking for the JavaScript language. I’ve implemented neural networks completely from scratch using JavaScript. To load data into memory, I had written a JavaScript loadTxt() function that mimics the Python loadtxt() function. However, my JavaScript loadTxt() function didn’t handle comment lines in the source text file because doing so is surprisingly tricky.

So, one weekend evening I decided to enhance my JavaScript loadTxt() function to handle comment lines. Along the way, I ran into a common theme: I could use a relatively simple algorithm that is not efficient because it duplicates the data in memory, or I could use a more complex algorithm that is memory efficient.

The simple version is:

function loadTxt2(fn, delimit, usecols, comment) {
  // simple but doubles in-memory usage
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  let validLines = [];
  for (let i = 0; i < lines.length; ++i) {
    if (!lines[i].startsWith(comment))
      validLines.push(lines[i]);
  }
  
  let rows = validLines.length;
  let cols = usecols.length;
  let result = matMake(rows, cols, 0.0); 
  for (let i = 0; i < rows; ++i) {  // each line
    let tokens = validLines[i].split(delimit);
    for (let j = 0; j < cols; ++j) {
      result[i][j] = parseFloat(tokens[usecols[j]]);
    }
  }
  return result;
}

My test data file is:

// people_train_4.txt
// sex (M = -1, F = +1), age (div by 100)
// state (Michigan = 100, Nebraska = 010, Oklahoma = 001)
// income (div by $100,000)
// politics (conservative = 0, moderate = 1, liberal = 2)
//
 1	0.24	1	0	0	0.2950	2
-1	0.39	0	0	1	0.5120	1
 1	0.63	0	1	0	0.7580	0
-1	0.36	1	0	0	0.4450	1
// end data

The program to call the function is:

// test_loadTxt.js

let U = require("../../Utilities/utilities_lib.js");
let FS = require("fs");

// ----------------------------------------------------------

function main()
{
  console.log("\nBegin test loadTxt() with JavaScript ");

  // raw data looks like:   M   32   michigan  52,000.00  liberal
  // norm data looks like: -1  0.32   1 0 0     0.5250     2

  // memory inefficient but simple
  let trainX = U.loadTxt2(".\\Data\\people_train_4.txt", "\t",
    [0,1,2,3,4,5], "//");
  console.log("");
  U.matShow(trainX, 4, 12);

  // memory efficient but complicated
  trainX = U.loadTxt3(".\\Data\\people_train_4.txt", "\t",
    [0,1,2,3,4,5], "//");
  console.log("");
  U.matShow(trainX, 4, 12);

  console.log("\nEnd demo ");
}

main();

I put the loadTxt2() and loadTxt3() functions in a Utility library.

The more-efficient but less-simple version is:

function loadTxt3(fn, delimit, usecols, comment) {
  // efficient but complicated
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  // count number non-comment lines
  let nRows = 0;
  for (let i = 0; i < lines.length; ++i) {
    if (!lines[i].startsWith(comment))
      ++nRows;
  }
  let nCols = usecols.length;
  let result = matMake(nRows, nCols, 0.0); 
 
  let r = 0;  // into lines
  let i = 0;  // into result[][]
  while (r < lines.length) {
    if (lines[r].startsWith(comment)) {
      ++r;  // next row
    }
    else {
      let tokens = lines[r].split(delimit);
      for (let j = 0; j < nCols; ++j) {
        result[i][j] = parseFloat(tokens[usecols[j]]);
      }
      ++r;
      ++i;
    }
  }

  return result;
}

Good fun.



There are several traditional tradeoff themes in computer science, such as performance vs. simplicity. Here are three examples of traditional Eurasian clothing that trade off attractive complexity vs. functional simplicity.



A PyTorch Dataset Using the Pandas read_csv() Function

To train a PyTorch neural network, the most common approach is to read training data into a Dataset object, and then use a DataLoader object to serve the training data up in batches. When I implement a Dataset, I almost always use the NumPy loadtxt() function to read training data from file into memory. But it’s possible to use the Pandas read_csv() function instead. Bottom line: the Pandas approach isn’t especially useful because the Pandas data frame has to be converted to a NumPy matrix anyway.

I used one of my standard examples to code up a demo of NumPy loadtxt() vs Pandas read_csv() functions. The goal is to predict political leaning (conservative = 0, moderate = 1, liberal = 2) from sex, age, state of residence, and income. The data looks like:

 1  0.24  1  0  0  0.2950  2
-1  0.39  0  0  1  0.5120  1
 1  0.63  0  1  0  0.7580  0
-1  0.36  1  0  0  0.4450  1
 1  0.27  0  1  0  0.2860  2
. . .

The columns are sex (M = -1, F = +1), age divided by 100, state (Michigan = 100, Nebraska = 010, Oklahoma = 001), income divided by $100,000, and political leaning. The data is synthetic.

A standard NumPy loadtxt() version of a Dataset is:

import numpy as np
import pandas as pd  # not used this version

class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # numpy loadtxt() version
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)

    tmp_x = all_xy[:,0:6]   # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]     # 1-D

    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return preds, trgts  # as a Tuple

A version using the Pandas read_csv() function and the to_numpy() method is:

class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # pandas version
    xy_frame = pd.read_csv(src_file, usecols=range(0,7),
      delimiter="\t", comment="#", dtype=np.float32)
    all_xy = xy_frame.to_numpy()

    # as above
. . . 

Instead of using the Pandas to_numpy() function, it’s possible to access the Pandas dataframe directly using the iloc property:

class PeopleDataset(T.utils.data.Dataset):
  def __init__(self, src_file):
    # pandas version
    xy_frame = pd.read_csv(src_file, usecols=range(0,7),
      delimiter="\t", comment="#", dtype=np.float32)
    all_xy = np.array(xy_frame.iloc[:,:])

    # as above
. . . 

The rest of the program and the training and test data can be found at: https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
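Either Dataset version is consumed the same way. A minimal usage sketch (assuming the PeopleDataset class above and the people_train.txt file from the linked post) is:

import torch as T

train_ds = PeopleDataset(".\\Data\\people_train.txt")
train_ldr = T.utils.data.DataLoader(train_ds,
  batch_size=10, shuffle=True)
for (b_idx, (X, Y)) in enumerate(train_ldr):
  print(X.shape)  # torch.Size([10, 6]) predictor inputs
  print(Y.shape)  # torch.Size([10]) class labels
  break           # just peek at the first batch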

There’s no big moral to this story — just some fun mental exercise to stay in practice with PyTorch.



Two wonderful illustrations tagged as “amazingsurf” from fractal.batjorge.com. I don’t know the artist, but I’ll bet he does artistic exercises to stay in practice.



“AI Coding Assistants Shake Up Software Development, But May Have Unintended Consequences” on the Pure AI Web Site

I contributed to an article titled “AI Coding Assistants Shake Up Software Development, But May Have Unintended Consequences” in the February 2023 edition of the Pure AI web site. See https://pureai.com/articles/2023/02/03/coding-assistants.aspx.

OpenAI Codex and GitHub Copilot are closely related AI systems that can automatically write computer code snippets and even entire computer programs.



Example of Codex



Example of Copilot


Briefly, a Transformer is a complex low-level neural network module. GPT-3 is a large language model that uses the Transformer architecture, is trained on text such as Wikipedia, and in effect understands English. ChatGPT is a conversational chatbot application built on top of GPT-3. Codex adds additional training on computer code, so Codex understands both English and computer languages. Copilot is a wrapper over Codex that integrates directly into a software development programming environment.

I contributed a few opinions:

McCaffrey commented, “Like most technologies, AI assisted programming will have pros and cons. Like it or not, systems like Codex and Copilot are probably here to stay.”

He added, “These AI assistant systems are all based on existing knowledge. In their current states, they can simulate creativity in very surprising ways by combining ideas but they really can’t generate completely new algorithms.”

McCaffrey further noted, “One of my work colleagues speculated about a future with many specialized AI assistant systems, such as a physics assistant, a biology assistant and so on. Imagine if all these AI assistants could automatically communicate with each other using natural language via ChatGPT. It’s fun to imagine what that scenario could lead to.”



The thought of computer programs that write computer programs is mildly scary because of the possibility of good programs going bad. For example:

Left: In “Red Planet” (2000) an expedition to Mars has a robot named AMEE (Autonomous Mapping Exploration and Evasion). It gets damaged by a gamma-ray burst and starts hunting the crew.

Center: In “Demon Seed” (1977) a scientist creates an AI program named Proteus IV, which in turn creates a robot named Joshua. Proteus and Joshua are not pleased when the scientist tries to turn them off.

Right: In “Gog” (1954), Gog (and its partner Magog) are research robots. The Soviets gain control of the robots and they run amok.



Implementing a Neural Network Using Raw JavaScript

Quite some time ago I implemented a neural network multi-class classifier using raw JavaScript. The implementation had a single hidden layer of nodes, but even so, the implementation took many days to complete. See https://jamesmccaffrey.wordpress.com/2022/01/24/a-neural-network-with-raw-javascript/.

My old implementation used one-hot encoding for predictor variables, softmax output node activation, and mean squared error based optimization. But recently, on a cold and dreary Pacific Northwest weekend, in a moment of temporary insanity, I decided to update my old JavaScript code to mirror common PyTorch techniques: ordinal encoding for predictor variables, log-softmax output node activation, and negative log loss (aka cross entropy) based optimization. Many hours later, I got a revised version of the JavaScript neural network running.
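As a quick PyTorch-side reminder (in Python here, not the JavaScript implementation), log-softmax output plus negative log likelihood loss gives exactly the same value as cross entropy loss applied to raw logits, which is the relationship the revised code mirrors:

import torch as T

logits = T.tensor([[1.5, 0.3, -0.8]])  # one item, three classes
target = T.tensor([1])                 # true class is 1 (moderate)

log_sm = T.log_softmax(logits, dim=1)
nll = T.nn.NLLLoss()(log_sm, target)
ce = T.nn.CrossEntropyLoss()(logits, target)
print("%0.4f  %0.4f" % (nll.item(), ce.item()))  # identical values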

I used one of my standard multi-class datasets where the goal is to predict a person’s political leaning (conservative = 0, moderate = 1, liberal = 2) from sex (M = -1, F = +1), age (divided by 100), state (Michigan = 1 0 0, Nebraska = 0 1 0, Oklahoma = 0 0 1), and income (divided by $100,000). The data looks like:

 1  0.24  1  0  0  0.2950  2
-1  0.39  0  0  1  0.5120  1
 1  0.63  0  1  0  0.7580  0
-1  0.36  1  0  0  0.4450  1
. . . 

The JavaScript neural network program (along with the supporting Utility library of functions) is hundreds of lines of code so I won’t present it all in this blog post. The program main() function starts with:

let U = require("../Utilities/utilities_lib.js");
let FS = require("fs");

function main()
{
  process.stdout.write("\033[0m");  // reset
  process.stdout.write("\x1b[1m" + "\x1b[37m");  // white
  console.log("\nBegin People Data demo with JavaScript ");

  // 1. load data
  // raw data looks like:   M  32   michigan  52,000.00  lib
  // norm data looks like: -1  0.32  1 0 0     0.5250     2
  let trainX = U.loadTxt(".\\Data\\people_train.txt",
    "\t", [0,1,2,3,4,5]);
  let trainY = U.loadTxt(".\\Data\\people_train.txt",
    "\t", [6]);
  let testX = U.loadTxt(".\\Data\\people_test.txt",
    "\t", [0,1,2,3,4,5]);
  let testY = U.loadTxt(".\\Data\\people_test.txt",
    "\t", [6]);

  // 2. create network
  console.log("\nCreating 6-25-3 tanh, log-softmax NN ");
  let seed = 0;
  let nn = new NeuralNet(6, 25, 3, seed);

  // 3. train network
  let lrnRate = 0.01;
  let maxEpochs = 5000;
  console.log("Starting training lrn rate = 0.01 ");
  nn.train(trainX, trainY, lrnRate, maxEpochs);
  console.log("Training complete");

  . . .

Ultimately, my JavaScript exploration was essentially nothing more than weekend mental exercise and a way to practice my JavaScript skills. It was a lot of work but satisfying.



Most of the guys I work with love what they do and so we often write code on weekends — because we want to, not because we have to. I suspect that artists paint and draw even when they don’t have to, because they love their work. Here are examples of three of my favorite comic book artists of the 1960s. Left: A “The Atom” cover by Gil Kane (1926-2000). Center: A “Superman” cover by Curt Swan (1920-1996). Right: A “The Flash” cover by Carmine Infantino (1925-2013).



The Apparent Contradiction Between Warm-Start Training and Fine-Tuning Training, and the Physics of AI

Briefly: The term warm-start training applies to standard neural networks, and the term fine-tuning training applies to Transformer architecture networks. Both are essentially the same technique but warm-start is ineffective and fine-tuning is effective. The reason for this apparent contradiction isn’t completely clear and is related to a new idea being called “the physics of AI”. However, there’s really no contradiction because Transformer architecture networks work quite a bit differently from standard neural networks.

Bear with me for a minute. In warm-start training, you have a standard neural network model such as a CNN image classifier or a regression prediction system. As new data arrives (for example, new house sales data for a house price prediction system), instead of retraining your network from scratch using all your data (the old data plus the new data) with weights randomly initialized, you retrain your network using just the new data, starting with the old weights. This is called warm-start training. As it turns out, surprisingly, warm-start training doesn’t work very well, in the sense that the new model doesn’t generalize well on new, previously unseen data.


An example of warm-start training

This phenomenon — the relative ineffectiveness of warm-start training — has been explored in the research paper “On Warm-Starting Neural Network Training” by J. Ash and R. Adams.
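A tiny self-contained PyTorch sketch of the warm-start procedure (dummy data, purely to illustrate the idea; this is not code from the paper) is:

import torch as T

T.manual_seed(1)
net = T.nn.Linear(4, 1)                 # tiny regression model
opt = T.optim.SGD(net.parameters(), lr=0.01)
loss_func = T.nn.MSELoss()

old_X = T.randn(100, 4); old_Y = T.randn(100, 1)  # existing data
new_X = T.randn(20, 4);  new_Y = T.randn(20, 1)   # newly arrived data

def train(X, Y, epochs=200):
  for _ in range(epochs):
    opt.zero_grad()
    loss = loss_func(net(X), Y)
    loss.backward()
    opt.step()

train(old_X, old_Y)  # initial training
train(new_X, new_Y)  # warm start: continue from the existing weights
# a cold start would re-create net and train on old + new data combined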

OK. Now in fine-tuning training, you have a Transformer architecture model such as a GPT-3 large language model. The pre-trained model understands the English language in a way that is not fully understood. To adapt the large model to a specific problem domain, such as an AI chemistry assistant, you train the model starting with the existing GPT-3 weights (175 billion of them) and add the new chemistry data. The resulting model seems to work very well (although as I write this blog post, this is all a very new area of exploration).

Note: Augmenting a large language model in this way is normally accomplished using a technique with relatively little data, called one-shot training or few-shot training.

Note: The terms “warm-start training” and “fine-tuning training” are not rigidly defined in research literature and so they can have different meanings in different research papers.

So, if you think about standard neural network warm-start training and Transformer architecture fine-tuning training, they’re both the same technique — you train a new model using new data but starting with the existing model weights. Yet warm-start training appears to be ineffective while fine-tuning training appears to be effective. It’s likely that pre-trained Transformer architecture networks learn general purpose connections that allow a fine-tuned network to generalize better.

This comparison between standard and Transformer networks points out that deep neural models are not well understood.

One of my work colleagues, Sebastien Bubeck, has suggested an approach to the science of deep learning that roughly follows what physicists do to understand reality:

1.) Explore phenomena through controlled experiments.
2.) Build theories based on simple mathematical models that aren’t necessarily fully rigorous.

Fascinating stuff. By the way, I became aware of these ideas via ad hoc, impromptu hallway conversations at the large tech company I work for. These conversations would not have happened if I were working from home remotely. There’s overwhelming evidence that, for the type of work I do, working in a traditional office/lab environment increases productivity and creativity, and (for me at least) it increases job satisfaction.



If you Google for “physics of AI” you can find a YouTube presentation. Sebastien looks somewhat menacing here but in real-life he’s friendly.



Example of Spectral Clustering Using the scikit Library

Ah, where to begin. Bottom line: spectral clustering is a machine learning technique that is great in theory but just isn’t practical or useful in most real-world scenarios.

The idea of clustering is to group data points together so that similar data points are in the same group. By far the most common algorithm for clustering is the k-means (also called Lloyd’s) algorithm. It’s simple and effective. Note: k-means++ is just k-means with clever initialization.
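For reference, a bare-bones scikit k-means call looks like this (the init='k-means++' argument shown is the default):

import numpy as np
from sklearn.cluster import KMeans

data = np.array([[0.1, 0.2], [0.2, 0.1],
                 [8.0, 8.1], [8.2, 7.9]])
km = KMeans(n_clusters=2, init='k-means++',
  n_init=10, random_state=0).fit(data)
print(km.labels_)  # two tight groups, e.g. [1 1 0 0]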

But researchers do research and the spectral clustering technique was introduced in 1995 (although the ideas involved had been around since the 1970s). Spectral clustering is intended to cluster data that has unusual geometry. The standard example is data that forms two concentric circles when graphed.

Briefly, spectral clustering starts by creating a graph from the source data, typically by using the k-nearest neighbors algorithm. Then the matrix that defines the graph is analyzed using eigenvalue decomposition. Then the results of that decomposition are clustered using the k-means algorithm. Note: There are dozens of variations of spectral clustering.
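In scikit terms, a stripped-down sketch of that three-step pipeline (just to illustrate the idea; the SpectralClustering class used in the demo below does this internally, with many refinements) is:

import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def spectral_sketch(data, n_clusters=2, n_neighbors=5):
  # 1. build a k-nearest-neighbors graph from the source data
  A = kneighbors_graph(data, n_neighbors=n_neighbors,
    include_self=False).toarray()
  A = 0.5 * (A + A.T)                # symmetrize the adjacency matrix
  # 2. eigenvalue decomposition of the normalized graph Laplacian
  L = laplacian(A, normed=True)
  eig_vals, eig_vecs = eigh(L)
  embedding = eig_vecs[:, 0:n_clusters]  # smallest-eigenvalue vectors
  # 3. cluster the embedded points with ordinary k-means
  km = KMeans(n_clusters=n_clusters, n_init=10,
    random_state=0).fit(embedding)
  return km.labels_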

I put together a demo of spectral clustering using the scikit library. The main problem from a practical point of view is that spectral clustering has too many parameters:

SpectralClustering(n_clusters=8, *,
  eigen_solver=None,
  n_components=None,
  random_state=None,
  n_init=10,
  gamma=1.0,
  affinity='rbf',
  n_neighbors=10,
  eigen_tol='auto',
  assign_labels='kmeans',
  degree=3,
  coef0=1,
  kernel_params=None,
  n_jobs=None,
  verbose=False)

The resulting clustering is highly sensitive to the parameters used. In the image above, the data sort of looks like there are two concentric circles and an artist would cluster the inner data points together and the outer data points together. But is that science or is it art?

If you think about how spectral clustering works, when the k-nearest neighbors algorithm is used to create the data graph, that’s where clustering is actually happening.

I have followed clustering research for many years. In my opinion, spectral clustering is somewhat of an example of a research solution in search of a practical problem, at least for real-world data science scenarios. That statement is a bit of an exaggeration, but in all the practical engineering situations I’ve been in, k-means works better than spectral clustering when you take all factors into account.



Three travel posters by Danish illustrator Mads Berg. The similar style makes them easy to cluster together from an art point of view.


Demo code.

# spectral_cluster_scikit.py

# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# scikit 0.22.1
# Windows 10/11 

import numpy as np
from sklearn.cluster import SpectralClustering
import matplotlib.pyplot as plt

# ---------------------------------------------------------

def my_make_circles(n_samples=100, factor=0.8,
  noise=None, seed=1):

  rnd = np.random.RandomState(seed)
  n_samples_out = n_samples // 2
  n_samples_in = n_samples - n_samples_out

  lin_out = np.linspace(0, 2 * np.pi, n_samples_out,
    endpoint=False)
  lin_in = np.linspace(0, 2 * np.pi, n_samples_in,
    endpoint=False)
  outer_circ_x = np.cos(lin_out)
  outer_circ_y = np.sin(lin_out)
  inner_circ_x = np.cos(lin_in) * factor
  inner_circ_y = np.sin(lin_in) * factor

  X = np.vstack(
    [np.append(outer_circ_x, inner_circ_x),
     np.append(outer_circ_y, inner_circ_y)]).T
  y = np.hstack(
    [np.zeros(n_samples_out, dtype=np.int64),
     np.ones(n_samples_in, dtype=np.int64)])

  # add noise
  if noise is not None:
    X += rnd.normal(loc=0.0, scale=noise, size=X.shape)
  
  return X, y

# ---------------------------------------------------------

def main():
  print("\nBegin spectral clustering demo ")

  data, labels = my_make_circles(n_samples=20, 
    factor=0.40, noise=0.06, seed=0)

  print("\ndata = ")
  print(data)
  print("\nlabels = ")
  print(labels)

  plt.scatter(data[:,0], data[:,1])
  plt.show()

  # from sklearn.cluster import KMeans
  # print("\nClustering using basic k-means ")
  # clustering = KMeans(n_clusters=2,
  #   random_state=0).fit(data)
  # print("Result clustering: ")
  # print(clustering.labels_)

  print("\nClustering using spectral k-NN(5) ")
  clustering = SpectralClustering(n_clusters=2,
    affinity='nearest_neighbors',
    n_neighbors=4,
    assign_labels='kmeans',
    random_state=0).fit(data)
  print("Result clustering: ")
  print(clustering.labels_)

  print("\nClustering using spectral RBF ")
  clustering = SpectralClustering(n_clusters=2,
    affinity='rbf',
    assign_labels='kmeans',
    random_state=0).fit(data)
  print("Result clustering: ")
  print(clustering.labels_)
  
  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Experimenting with the Codex AI Programming Assistant

ChatGPT is a chatbot that is based on the GPT-3 neural transformer model. ChatGPT was trained on an enormous corpus of English text — all of Wikipedia, books, Web sites, and so on. In ways that aren’t fully understood, ChatGPT understands the English language.

OpenAI, the company that created ChatGPT, also created Codex. Codex was trained on billions of lines of computer code, mostly from GitHub. So Codex understands English and also programming. Codex understands most common programming languages such as SQL, C#, and Python.

Note: Copilot is a wrapper over Codex that is integrated directly into the Visual Studio IDE program. I looked at Copilot in a separate set of experiments.

I went to the Codex web site at https://openai.com/blog/openai-codex/ and registered myself. After logging in, I went to their Playground page where I experimented.

First I tried:

Show me an array binary search function in Python.

The result was:

def binary_search(arr, target):
  low = 0
  high = len(arr) - 1
  while low <= high:
    mid = (low + high) // 2
    if arr[mid] == target:
      return mid
    elif arr[mid] < target:
      low = mid + 1
    else:
      high = mid - 1
Pretty good. Next I tried:

SQL code to create a database of books.

The result was:

CREATE DATABASE books;

USE books;

CREATE TABLE books (
  book_id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  author VARCHAR(255) NOT NULL,
  isbn VARCHAR(13) NOT NULL,
  year_published INT NOT NULL
);

Again, pretty good. Next, I got aggressive and tried:

PyTorch code to create and train a classifier for MNIST data.

I got the results shown at the bottom of this blog post. At first glance the Codex result looked good but on close inspection, the result really wasn’t very good.

Moral: Codex is quite good at answering queries that have short amounts of code but it’s not capable of constructing complex programs.



Artificial Intelligence is in its infancy. Airplane designs went from rudimentary ideas in the 1910s to jets capable of flying at 1,500 mph just 40 years later. The same crazy-fast pace of development could happen with AI.


The result of the PyTorch query:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

# define the transformer
# convert the data to a normalized torch tensor
transform = transforms.Compose([transforms.ToTensor(),
                transforms.Normalize((0.5,), (0.5,)),
                ])

# download and load the training data
# use a subset of the training set
trainset = datasets.MNIST('~/.pytorch/MNIST_data/',
  download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
  batch_size=64, shuffle=True)

# build the neural network
# define the model
class Classifier(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1 = nn.Linear(784, 256)
    self.fc2 = nn.Linear(256, 128)
    self.fc3 = nn.Linear(128, 64)
    self.fc4 = nn.Linear(64, 10)
    
  def forward(self, x):
    # flatten the data
    x = x.view(x.shape[0], -1)
    
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = F.relu(self.fc3(x))
    x = F.log_softmax(self.fc4(x), dim=1)
    
    return x


# initialize the model
model = Classifier()

# define the loss
criterion = nn.NLLLoss()

# define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.003)

# train the model
epochs = 5

for i in range(epochs):
  running_loss = 0
  for images, labels in trainloader:
    
    # clear the gradients
    optimizer.zero_grad()
    
    # forward pass
    log_ps = model(images)
    loss = criterion(log_ps, labels)
    
    # backward pass
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item()
    
  print(f"Training loss: {running_loss/len(trainloader)}")