The scikit-learn library is a good, relatively simple way to do machine learning. Early versions of scikit supported only classical machine learning techniques, not neural networks. Scikit version 0.18 (released in 2016) added the MLPClassifier ("multi-layer perceptron classifier") class for binary and multi-class neural network classification. I decided to put together a binary classification demo.
I used one of my standard datasets for binary classification. The data looks like:
1  0.24  1 0 0  0.2950  0 0 1
0  0.39  0 0 1  0.5120  0 1 0
1  0.63  0 1 0  0.7580  1 0 0
0  0.36  1 0 0  0.4450  0 1 0
. . .
Each line of data represents a person. The fields are sex (male = 0, female = 1), age (normalized by dividing by 100), state (Michigan = 100, Nebraska = 010, Oklahoma = 001), annual income (divided by $100,000), and politics type (conservative = 100, moderate = 010, liberal = 001). The goal is to predict the sex of a person from their age, state, income, and politics type.
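To make the encoding concrete, here is a minimal sketch that maps one raw person record to the nine encoded fields. The encode() helper and its lookup tables are illustrative assumptions, not part of the demo program:

# minimal sketch: encode one raw record into the nine fields
# used by the demo (helper and tables are hypothetical)
states = {"michigan": [1,0,0], "nebraska": [0,1,0],
  "oklahoma": [0,0,1]}
politics = {"conservative": [1,0,0], "moderate": [0,1,0],
  "liberal": [0,0,1]}

def encode(sex, age, state, income, pol):
  # sex: 0 = male, 1 = female; age div 100; income div $100,000
  return [sex, age / 100.0] + states[state] + \
    [income / 100000.0] + politics[pol]

print(encode(1, 24, "michigan", 29500, "liberal"))
# [1, 0.24, 1, 0, 0, 0.295, 0, 0, 1] -- the first training item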
My demo network used an 8-(10-10)-1 architecture, meaning eight input nodes, two hidden layers with 10 nodes each and tanh() activation, and a single output node. For training I used stochastic gradient descent optimization with a constant learning rate of 0.01, a batch size of 10, and moderate L2 regularization (alpha = 0.001) to discourage overfitting:
params = { 'hidden_layer_sizes' : [10,10],
  'activation' : 'tanh',
  'solver' : 'sgd',
  'alpha' : 0.001,
  'batch_size' : 10,
  'random_state' : 0,
  'tol' : 0.0001,
  'nesterovs_momentum' : False,
  'learning_rate' : 'constant',
  'learning_rate_init' : 0.01,
  'max_iter' : 500,
  'shuffle' : True,
  'n_iter_no_change' : 50,
  'verbose' : False }

print("\nCreating 8-(10-10)-1 tanh neural network ")
net = MLPClassifier(**params)
The main challenge when using the scikit MLPClassifier class is the overwhelming number of parameters. Explaining the parameters above, as well as the ones where default values were used, would take several pages so I won’t try.
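One quick way to see the complete set of parameters, including the ones left at their default values, is the get_params() method that all scikit estimators support. A minimal sketch:

# list every MLPClassifier parameter, including defaults
# (get_params() is standard on all scikit estimators)
for name, val in sorted(net.get_params().items()):
  print(name, "=", val)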
My experiment was fun and I gained some good insights.
My first college degree was in psychology, from UC Irvine. I took mostly experimental and cognitive psychology classes, but I did take one social psychology class. I remember a couple of lectures on research into biological and behavioral gender differences, such as body language. For example, men generally do not touch their hair (a sign of lack of confidence), cover their torso with an arm (a sign of vulnerability), or tilt their heads (a sign of submission). I enjoyed my psychology classes, but I found my math and computer classes more challenging and interesting.
Demo code. Replace “lt”, “gt”, “lte”, “gte” with Boolean operator symbols.
# people_gender_nn_sckit.py
# predict sex (0 = male, 1 = female)
# from age, state, income, politics
# Anaconda3-2022.10  Python 3.9.13  scikit 1.0.2
# Windows 10/11

import numpy as np
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore')  # early-stop warnings

# ---------------------------------------------------------

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# ---------------------------------------------------------

def main():
  # 0. get ready
  print("\nBegin scikit neural network binary example ")
  print("Predict sex from age, State, income, politics ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True)

  # 1. load data
  print("\nLoading data into memory ")
  train_file = ".\\Data\\people_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,9),
    delimiter="\t", comments="#", dtype=np.float32)
  train_x = train_xy[:,1:9]
  train_y = train_xy[:,0].astype(np.int64)

  # load test data using the two-calls-to-loadtxt() technique
  test_file = ".\\Data\\people_test.txt"
  test_x = np.loadtxt(test_file, usecols=range(1,9),
    delimiter="\t", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=0,
    delimiter="\t", comments="#", dtype=np.int64)

  print("\nTraining data:")
  print(train_x[0:4])
  print(". . . \n")
  print(train_y[0:4])
  print(". . . ")

# ---------------------------------------------------------

  # 2. create network
  # MLPClassifier(hidden_layer_sizes=(100,),
  #  activation='relu', *, solver='adam', alpha=0.0001,
  #  batch_size='auto', learning_rate='constant',
  #  learning_rate_init=0.001, power_t=0.5, max_iter=200,
  #  shuffle=True, random_state=None, tol=0.0001,
  #  verbose=False, warm_start=False, momentum=0.9,
  #  nesterovs_momentum=True, early_stopping=False,
  #  validation_fraction=0.1, beta_1=0.9, beta_2=0.999,
  #  epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

  params = { 'hidden_layer_sizes' : [10,10],
    'activation' : 'tanh',
    'solver' : 'sgd',
    'alpha' : 0.001,
    'batch_size' : 10,
    'random_state' : 0,
    'tol' : 0.0001,
    'nesterovs_momentum' : False,
    'learning_rate' : 'constant',
    'learning_rate_init' : 0.01,
    'max_iter' : 500,
    'shuffle' : True,
    'n_iter_no_change' : 50,
    'verbose' : False }

  print("\nCreating 8-(10-10)-1 tanh neural network ")
  net = MLPClassifier(**params)

# ---------------------------------------------------------

  # 3. train
  print("\nTraining with bat sz = " + \
    str(params['batch_size']) + " lrn rate = " + \
    str(params['learning_rate_init']) + " ")
  print("Stop if no change " + \
    str(params['n_iter_no_change']) + " iterations ")
  net.fit(train_x, train_y)
  print("Done ")

# ---------------------------------------------------------

  # 4. evaluate model
  acc_train = net.score(train_x, train_y)
  print("\nAccuracy on train = %0.4f " % acc_train)
  acc_test = net.score(test_x, test_y)
  print("Accuracy on test = %0.4f " % acc_test)

  from sklearn.metrics import confusion_matrix
  y_predicteds = net.predict(test_x)
  cm = confusion_matrix(test_y, y_predicteds)
  print("\nConfusion matrix: \n")
  # print(cm)          # raw
  show_confusion(cm)   # custom formatted

  from sklearn.metrics import precision_score
  from sklearn.metrics import recall_score
  from sklearn.metrics import f1_score
  y_predicteds = net.predict(test_x)
  precision = precision_score(test_y, y_predicteds)
  print("\nPrecision on test = %0.4f " % precision)
  recall = recall_score(test_y, y_predicteds)
  print("Recall on test = %0.4f " % recall)
  f1 = f1_score(test_y, y_predicteds)
  print("F1 score on test = %0.4f " % f1)

# ---------------------------------------------------------

  # 5. use model
  print("\nSetting age = 30  Oklahoma  $40,000  moderate ")
  X = np.array([[0.30, 0,0,1, 0.4000, 0,1,0]],
    dtype=np.float32)

  probs = net.predict_proba(X)
  print("\nPrediction pseudo-probs: ")
  print(probs)

  sex = net.predict(X)
  print("\nPredicted class: ")
  print(sex)  # a vector with a single value
  if sex[0] == 0: print("male")
  elif sex[0] == 1: print("female")

# ---------------------------------------------------------

  # 6. TODO: save model using pickle

  print("\nEnd scikit binary neural network demo ")

if __name__ == "__main__":
  main()
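Step 6 of the demo is left as a TODO. A minimal sketch of saving and re-loading the trained network with the standard Python pickle module (the file path is an assumption) looks like:

# minimal sketch for step 6: save and re-load the trained
# network using pickle (the file path is an assumption)
import pickle
fn = ".\\Models\\people_net.pkl"
with open(fn, "wb") as f:
  pickle.dump(net, f)    # save trained model
with open(fn, "rb") as f:
  net2 = pickle.load(f)  # net2.predict() matches net.predict()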
Training data. Replace commas with tabs or modify program.
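If you'd rather modify the program than edit the data files, one option is to change the delimiter argument in the loadtxt() calls, for example:

# alternative: read the comma-delimited data directly by
# changing the delimiter argument in the loadtxt() calls
train_xy = np.loadtxt(train_file, usecols=range(0,9),
  delimiter=",", comments="#", dtype=np.float32)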
# people_train.txt
# sex (0 = male, 1 = female) - dependent variable
# age (div 100),
# state (michigan = 100, nebraska = 010, oklahoma = 001),
# income (div $100,000),
# politics type (conservative, moderate, liberal)
#
1,0.24,1,0,0,0.2950,0,0,1
0,0.39,0,0,1,0.5120,0,1,0
1,0.63,0,1,0,0.7580,1,0,0
0,0.36,1,0,0,0.4450,0,1,0
1,0.27,0,1,0,0.2860,0,0,1
1,0.50,0,1,0,0.5650,0,1,0
1,0.50,0,0,1,0.5500,0,1,0
0,0.19,0,0,1,0.3270,1,0,0
1,0.22,0,1,0,0.2770,0,1,0
0,0.39,0,0,1,0.4710,0,0,1
1,0.34,1,0,0,0.3940,0,1,0
0,0.22,1,0,0,0.3350,1,0,0
1,0.35,0,0,1,0.3520,0,0,1
0,0.33,0,1,0,0.4640,0,1,0
1,0.45,0,1,0,0.5410,0,1,0
1,0.42,0,1,0,0.5070,0,1,0
0,0.33,0,1,0,0.4680,0,1,0
1,0.25,0,0,1,0.3000,0,1,0
0,0.31,0,1,0,0.4640,1,0,0
1,0.27,1,0,0,0.3250,0,0,1
1,0.48,1,0,0,0.5400,0,1,0
0,0.64,0,1,0,0.7130,0,0,1
1,0.61,0,1,0,0.7240,1,0,0
1,0.54,0,0,1,0.6100,1,0,0
1,0.29,1,0,0,0.3630,1,0,0
1,0.50,0,0,1,0.5500,0,1,0
1,0.55,0,0,1,0.6250,1,0,0
1,0.40,1,0,0,0.5240,1,0,0
1,0.22,1,0,0,0.2360,0,0,1
1,0.68,0,1,0,0.7840,1,0,0
0,0.60,1,0,0,0.7170,0,0,1
0,0.34,0,0,1,0.4650,0,1,0
0,0.25,0,0,1,0.3710,1,0,0
0,0.31,0,1,0,0.4890,0,1,0
1,0.43,0,0,1,0.4800,0,1,0
1,0.58,0,1,0,0.6540,0,0,1
0,0.55,0,1,0,0.6070,0,0,1
0,0.43,0,1,0,0.5110,0,1,0
0,0.43,0,0,1,0.5320,0,1,0
0,0.21,1,0,0,0.3720,1,0,0
1,0.55,0,0,1,0.6460,1,0,0
1,0.64,0,1,0,0.7480,1,0,0
0,0.41,1,0,0,0.5880,0,1,0
1,0.64,0,0,1,0.7270,1,0,0
0,0.56,0,0,1,0.6660,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
0,0.65,0,0,1,0.7010,0,0,1
1,0.55,0,0,1,0.6430,1,0,0
0,0.25,1,0,0,0.4030,1,0,0
1,0.46,0,0,1,0.5100,0,1,0
0,0.36,1,0,0,0.5350,1,0,0
1,0.52,0,1,0,0.5810,0,1,0
1,0.61,0,0,1,0.6790,1,0,0
1,0.57,0,0,1,0.6570,1,0,0
0,0.46,0,1,0,0.5260,0,1,0
0,0.62,1,0,0,0.6680,0,0,1
1,0.55,0,0,1,0.6270,1,0,0
0,0.22,0,0,1,0.2770,0,1,0
0,0.50,1,0,0,0.6290,1,0,0
0,0.32,0,1,0,0.4180,0,1,0
0,0.21,0,0,1,0.3560,1,0,0
1,0.44,0,1,0,0.5200,0,1,0
1,0.46,0,1,0,0.5170,0,1,0
1,0.62,0,1,0,0.6970,1,0,0
1,0.57,0,1,0,0.6640,1,0,0
0,0.67,0,0,1,0.7580,0,0,1
1,0.29,1,0,0,0.3430,0,0,1
1,0.53,1,0,0,0.6010,1,0,0
0,0.44,1,0,0,0.5480,0,1,0
1,0.46,0,1,0,0.5230,0,1,0
0,0.20,0,1,0,0.3010,0,1,0
0,0.38,1,0,0,0.5350,0,1,0
1,0.50,0,1,0,0.5860,0,1,0
1,0.33,0,1,0,0.4250,0,1,0
0,0.33,0,1,0,0.3930,0,1,0
1,0.26,0,1,0,0.4040,1,0,0
1,0.58,1,0,0,0.7070,1,0,0
1,0.43,0,0,1,0.4800,0,1,0
0,0.46,1,0,0,0.6440,1,0,0
1,0.60,1,0,0,0.7170,1,0,0
0,0.42,1,0,0,0.4890,0,1,0
0,0.56,0,0,1,0.5640,0,0,1
0,0.62,0,1,0,0.6630,0,0,1
0,0.50,1,0,0,0.6480,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.67,0,1,0,0.8040,0,0,1
0,0.40,0,0,1,0.5040,0,1,0
1,0.42,0,1,0,0.4840,0,1,0
1,0.64,1,0,0,0.7200,1,0,0
0,0.47,1,0,0,0.5870,0,0,1
1,0.45,0,1,0,0.5280,0,1,0
0,0.25,0,0,1,0.4090,1,0,0
1,0.38,1,0,0,0.4840,1,0,0
1,0.55,0,0,1,0.6000,0,1,0
0,0.44,1,0,0,0.6060,0,1,0
1,0.33,1,0,0,0.4100,0,1,0
1,0.34,0,0,1,0.3900,0,1,0
1,0.27,0,1,0,0.3370,0,0,1
1,0.32,0,1,0,0.4070,0,1,0
1,0.42,0,0,1,0.4700,0,1,0
0,0.24,0,0,1,0.4030,1,0,0
1,0.42,0,1,0,0.5030,0,1,0
1,0.25,0,0,1,0.2800,0,0,1
1,0.51,0,1,0,0.5800,0,1,0
0,0.55,0,1,0,0.6350,0,0,1
1,0.44,1,0,0,0.4780,0,0,1
0,0.18,1,0,0,0.3980,1,0,0
0,0.67,0,1,0,0.7160,0,0,1
1,0.45,0,0,1,0.5000,0,1,0
1,0.48,1,0,0,0.5580,0,1,0
0,0.25,0,1,0,0.3900,0,1,0
0,0.67,1,0,0,0.7830,0,1,0
1,0.37,0,0,1,0.4200,0,1,0
0,0.32,1,0,0,0.4270,0,1,0
1,0.48,1,0,0,0.5700,0,1,0
0,0.66,0,0,1,0.7500,0,0,1
1,0.61,1,0,0,0.7000,1,0,0
0,0.58,0,0,1,0.6890,0,1,0
1,0.19,1,0,0,0.2400,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.27,1,0,0,0.3640,0,1,0
1,0.42,1,0,0,0.4800,0,1,0
1,0.60,1,0,0,0.7130,1,0,0
0,0.27,0,0,1,0.3480,1,0,0
1,0.29,0,1,0,0.3710,1,0,0
0,0.43,1,0,0,0.5670,0,1,0
1,0.48,1,0,0,0.5670,0,1,0
1,0.27,0,0,1,0.2940,0,0,1
0,0.44,1,0,0,0.5520,1,0,0
1,0.23,0,1,0,0.2630,0,0,1
0,0.36,0,1,0,0.5300,0,0,1
1,0.64,0,0,1,0.7250,1,0,0
1,0.29,0,0,1,0.3000,0,0,1
0,0.33,1,0,0,0.4930,0,1,0
0,0.66,0,1,0,0.7500,0,0,1
0,0.21,0,0,1,0.3430,1,0,0
1,0.27,1,0,0,0.3270,0,0,1
1,0.29,1,0,0,0.3180,0,0,1
0,0.31,1,0,0,0.4860,0,1,0
1,0.36,0,0,1,0.4100,0,1,0
1,0.49,0,1,0,0.5570,0,1,0
0,0.28,1,0,0,0.3840,1,0,0
0,0.43,0,0,1,0.5660,0,1,0
0,0.46,0,1,0,0.5880,0,1,0
1,0.57,1,0,0,0.6980,1,0,0
0,0.52,0,0,1,0.5940,0,1,0
0,0.31,0,0,1,0.4350,0,1,0
0,0.55,1,0,0,0.6200,0,0,1
1,0.50,1,0,0,0.5640,0,1,0
1,0.48,0,1,0,0.5590,0,1,0
0,0.22,0,0,1,0.3450,1,0,0
1,0.59,0,0,1,0.6670,1,0,0
1,0.34,1,0,0,0.4280,0,0,1
0,0.64,1,0,0,0.7720,0,0,1
1,0.29,0,0,1,0.3350,0,0,1
0,0.34,0,1,0,0.4320,0,1,0
0,0.61,1,0,0,0.7500,0,0,1
1,0.64,0,0,1,0.7110,1,0,0
0,0.29,1,0,0,0.4130,1,0,0
1,0.63,0,1,0,0.7060,1,0,0
0,0.29,0,1,0,0.4000,1,0,0
0,0.51,1,0,0,0.6270,0,1,0
0,0.24,0,0,1,0.3770,1,0,0
1,0.48,0,1,0,0.5750,0,1,0
1,0.18,1,0,0,0.2740,1,0,0
1,0.18,1,0,0,0.2030,0,0,1
1,0.33,0,1,0,0.3820,0,0,1
0,0.20,0,0,1,0.3480,1,0,0
1,0.29,0,0,1,0.3300,0,0,1
0,0.44,0,0,1,0.6300,1,0,0
0,0.65,0,0,1,0.8180,1,0,0
0,0.56,1,0,0,0.6370,0,0,1
0,0.52,0,0,1,0.5840,0,1,0
0,0.29,0,1,0,0.4860,1,0,0
0,0.47,0,1,0,0.5890,0,1,0
1,0.68,1,0,0,0.7260,0,0,1
1,0.31,0,0,1,0.3600,0,1,0
1,0.61,0,1,0,0.6250,0,0,1
1,0.19,0,1,0,0.2150,0,0,1
1,0.38,0,0,1,0.4300,0,1,0
0,0.26,1,0,0,0.4230,1,0,0
1,0.61,0,1,0,0.6740,1,0,0
1,0.40,1,0,0,0.4650,0,1,0
0,0.49,1,0,0,0.6520,0,1,0
1,0.56,1,0,0,0.6750,1,0,0
0,0.48,0,1,0,0.6600,0,1,0
1,0.52,1,0,0,0.5630,0,0,1
0,0.18,1,0,0,0.2980,1,0,0
0,0.56,0,0,1,0.5930,0,0,1
0,0.52,0,1,0,0.6440,0,1,0
0,0.18,0,1,0,0.2860,0,1,0
0,0.58,1,0,0,0.6620,0,0,1
0,0.39,0,1,0,0.5510,0,1,0
0,0.46,1,0,0,0.6290,0,1,0
0,0.40,0,1,0,0.4620,0,1,0
0,0.60,1,0,0,0.7270,0,0,1
1,0.36,0,1,0,0.4070,0,0,1
1,0.44,1,0,0,0.5230,0,1,0
1,0.28,1,0,0,0.3130,0,0,1
1,0.54,0,0,1,0.6260,1,0,0
Test data:
0,0.51,1,0,0,0.6120,0,1,0
0,0.32,0,1,0,0.4610,0,1,0
1,0.55,1,0,0,0.6270,1,0,0
1,0.25,0,0,1,0.2620,0,0,1
1,0.33,0,0,1,0.3730,0,0,1
0,0.29,0,1,0,0.4620,1,0,0
1,0.65,1,0,0,0.7270,1,0,0
0,0.43,0,1,0,0.5140,0,1,0
0,0.54,0,1,0,0.6480,0,0,1
1,0.61,0,1,0,0.7270,1,0,0
1,0.52,0,1,0,0.6360,1,0,0
1,0.30,0,1,0,0.3350,0,0,1
1,0.29,1,0,0,0.3140,0,0,1
0,0.47,0,0,1,0.5940,0,1,0
1,0.39,0,1,0,0.4780,0,1,0
1,0.47,0,0,1,0.5200,0,1,0
0,0.49,1,0,0,0.5860,0,1,0
0,0.63,0,0,1,0.6740,0,0,1
0,0.30,1,0,0,0.3920,1,0,0
0,0.61,0,0,1,0.6960,0,0,1
0,0.47,0,0,1,0.5870,0,1,0
1,0.30,0,0,1,0.3450,0,0,1
0,0.51,0,0,1,0.5800,0,1,0
0,0.24,1,0,0,0.3880,0,1,0
0,0.49,1,0,0,0.6450,0,1,0
1,0.66,0,0,1,0.7450,1,0,0
0,0.65,1,0,0,0.7690,1,0,0
0,0.46,0,1,0,0.5800,1,0,0
0,0.45,0,0,1,0.5180,0,1,0
0,0.47,1,0,0,0.6360,1,0,0
0,0.29,1,0,0,0.4480,1,0,0
0,0.57,0,0,1,0.6930,0,0,1
0,0.20,1,0,0,0.2870,0,0,1
0,0.35,1,0,0,0.4340,0,1,0
0,0.61,0,0,1,0.6700,0,0,1
0,0.31,0,0,1,0.3730,0,1,0
1,0.18,1,0,0,0.2080,0,0,1
1,0.26,0,0,1,0.2920,0,0,1
0,0.28,1,0,0,0.3640,0,0,1
0,0.59,0,0,1,0.6940,0,0,1