One of my standard neural network examples is predicting employee income from sex, age, city, and job-type. Predicting a single numeric value is usually called a regression problem. (Note: “logistic regression” predicts a single numeric probability value between 0.0 and 1.0, but that value is immediately used as a binary classification result.)

My data is synthetic and looks like:

 1   0.24   1  0  0   0.2950   0  0  1
-1   0.39   0  0  1   0.5120   0  1  0
 1   0.63   0  1  0   0.7580   1  0  0
-1   0.36   1  0  0   0.4450   0  1  0
 1   0.27   0  1  0   0.2860   0  0  1
. . .

There are 200 training items and 40 test items.

Column [0] is sex (M = -1, F = +1). Column [1] is age, normalized by dividing by 100. Columns [2,3,4] are the city, one-hot encoded (anaheim, boulder, concord). Column [5] is annual income, divided by $100,000, and is the value to predict. Columns [6,7,8] are the job-type, one-hot encoded (mgmt, supp, tech).
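To make the encoding concrete, here is a minimal sketch that turns one raw employee record into the nine-value layout described above. The helper names (one_hot, encode_employee) and the lookup lists are illustrative assumptions and are not part of the demo program.

```python
# Hypothetical helpers (not part of the demo program): encode one raw
# employee record using the column layout described in the text.
CITIES = ["anaheim", "boulder", "concord"]   # columns [2,3,4]
JOBS = ["mgmt", "supp", "tech"]              # columns [6,7,8]

def one_hot(value, categories):
    # 1 in the position of the matching category, 0 elsewhere
    return [1 if value == c else 0 for c in categories]

def encode_employee(sex, age, city, income, job):
    row = [-1 if sex == "M" else 1]      # column [0]: M = -1, F = +1
    row.append(age / 100)                # column [1]: age / 100
    row.extend(one_hot(city, CITIES))    # columns [2,3,4]
    row.append(income / 100_000)         # column [5]: income / $100,000
    row.extend(one_hot(job, JOBS))       # columns [6,7,8]
    return row

# a 24-year-old female tech worker in anaheim earning $29,500
row = encode_employee("F", 24, "anaheim", 29_500, "tech")
print(row)
```

With these inputs the result has the same values as the first sample data line. Dropping the value at index [5] gives the eight predictor values that are fed to the network.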

I designed an 8-(10-10)-1 neural network. I used glorot_uniform() weight initialization with zero-bias initialization. I used tanh() activation on the two hidden layers, and no activation (aka Identity activation) on the single output node.
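As a rough sanity check on the architecture, the sketch below counts the weights and biases in an 8-(10-10)-1 fully connected network and computes the range that Glorot uniform initialization samples from, using the standard formula limit = sqrt(6 / (fan_in + fan_out)). The function names count_params and glorot_limit are my own and are not Keras APIs.

```python
import math

def count_params(layer_sizes):
    # layer_sizes lists the nodes per layer, input first, output last
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # weights between the two layers
        total += n_out          # one bias per receiving node
    return total

def glorot_limit(fan_in, fan_out):
    # Glorot uniform samples weights from [-limit, +limit]
    return math.sqrt(6.0 / (fan_in + fan_out))

n_params = count_params([8, 10, 10, 1])   # 8-(10-10)-1
print(n_params)                           # 211

limit_hid1 = glorot_limit(8, 10)          # first hidden layer
print("%0.4f" % limit_hid1)
```

The count breaks down as (8 * 10) + 10 = 90 for the first hidden layer, (10 * 10) + 10 = 110 for the second, and (10 * 1) + 1 = 11 for the output node.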

For training, I used Adam optimization with an initial learning rate of 0.01 along with a batch size of 10. I used mean squared error for the loss function.
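Because incomes are divided by $100,000, the mean squared error values printed during training are in squared normalized units. The sketch below, using made-up predictions, shows how the loss is computed and how taking the square root converts it back to an approximate dollar error. The function name mean_squared_error here is a plain Python stand-in, not the Keras loss object.

```python
def mean_squared_error(predicted, actual):
    # average of squared differences over all items
    n = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

# made-up predictions and targets, in units of $100,000
predicted = [0.300, 0.500, 0.750]
actual = [0.295, 0.512, 0.758]

mse = mean_squared_error(predicted, actual)
rmse_dollars = (mse ** 0.5) * 100_000   # back to dollars

print("MSE  = %0.6f" % mse)
print("RMSE = $%0.2f" % rmse_dollars)
```

A loss that looks tiny in normalized units can still correspond to an error of hundreds or thousands of dollars, which is one reason a separate accuracy measure is useful.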

For regression problems you must define a custom accuracy() function. My accuracy() function counts an income prediction as correct if it’s within 10% of the true income. I implemented two accuracy() functions. The first version iterates through one data item at a time. This is slow but useful to examine results. The second version feeds all data to the model at the same time. This is faster but more opaque.
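The within-10% rule can be illustrated without a trained model. The sketch below applies the same test, abs(predicted - actual) less than abs(pct_close * actual), to four target incomes taken from the sample data and four made-up predictions. The function name accuracy_within is an assumption for illustration.

```python
def accuracy_within(predicted, actual, pct_close):
    # a prediction is correct if it is within pct_close of the target
    n_correct = 0
    for p, a in zip(predicted, actual):
        if abs(p - a) < abs(pct_close * a):
            n_correct += 1
    return n_correct / len(actual)

actual    = [0.2950, 0.5120, 0.7580, 0.4450]   # from the sample data
predicted = [0.3100, 0.4500, 0.7600, 0.5000]   # made-up predictions

acc = accuracy_within(predicted, actual, 0.10)
print("accuracy = %0.4f" % acc)   # accuracy = 0.5000
```

The first and third predictions are within 10% of their targets and the other two are not, so the accuracy is 2 / 4. Note that the allowed error scales with the target, so a $5,000 miss counts as correct for a high income but wrong for a low one.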

*There’s a strong correlation between a person’s job and their income. Here are three people who have interesting jobs.*

Left: According to the BBC, Alan Moore is a “writer, wizard, mall Santa, and Rasputin impersonator”. Impressive.

Center: According to the Food Network TV company, Richard Scheuerman is a “shredded cheese authority”. OK.

*Right: The BBC broadcast an interview with Andrew Drinkwater, from the “Water Research Centre”. He was meant to have that job.*

Demo code. For the training and test data, see my post at https://jamesmccaffrey.wordpress.com/2022/05/23/regression-employee-income-using-pytorch-1-10-on-windows-11/ where I did the same problem using PyTorch.

# employee_income_tfk.py
# predict income from sex, age, city, job_type
# Keras 2.8.0 in TensorFlow 2.8.0 ("_tfk")
# Anaconda3-2020.02  Python 3.7.6  Windows 10/11

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'  # suppress CPU warn

import numpy as np
import tensorflow as tf
from tensorflow import keras as K

# -----------------------------------------------------------

class MyLogger(K.callbacks.Callback):
  def __init__(self, n, model, data_x, data_y):
    super().__init__()
    self.n = n   # print loss every n epochs
    self.model = model
    self.data_x = data_x  # needed to compute accuracy
    self.data_y = data_y

  def on_epoch_end(self, epoch, logs={}):
    if epoch % self.n == 0:
      curr_loss = logs.get('loss')  # avg loss over the epoch
      acc = accuracy_x(self.model, self.data_x,
        self.data_y, 0.10)
      print("epoch = %4d | loss = %0.6f | acc = %0.4f" % \
        (epoch, curr_loss, acc))

# -----------------------------------------------------------

def accuracy(model, data_x, data_y, pct_close):
  # item-by-item -- slow -- for debugging
  n_correct = 0; n_wrong = 0
  n = len(data_x)
  for i in range(n):
    x = np.array([data_x[i]])  # [[ x ]]
    predicted = model.predict(x)
    actual = data_y[i]
    if np.abs(predicted[0][0] - actual) < \
      np.abs(pct_close * actual):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)

# -----------------------------------------------------------

def accuracy_x(model, data_x, data_y, pct_close):
  # all items at once -- fast
  n = len(data_x)
  oupt = model(data_x)
  oupt = tf.reshape(oupt, [-1])  # 1D

  max_deltas = tf.abs(pct_close * data_y)  # max allow deltas
  abs_deltas = tf.abs(oupt - data_y)   # actual differences
  results = abs_deltas < max_deltas    # [True, False, . .]

  n_correct = np.sum(results)
  acc = n_correct / n
  return acc

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin Employee predict income using Keras ")
  np.random.seed(1)
  tf.random.set_seed(1)

  # 1. load data
  # sex  age   city     income   job_type
  # -1   0.27  0  1  0  0.7610   0  0  1
  # +1   0.19  0  0  1  0.6550   1  0  0

  print("\nLoading Employee data into memory ")
  train_file = ".\\Data\\employee_train.txt"  # 200 lines
  train_x = np.loadtxt(train_file,
    usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, usecols=5,
    delimiter="\t", comments="#", dtype=np.float32)

  test_file = ".\\Data\\employee_test.txt"  # 40 lines
  test_x = np.loadtxt(test_file,
    usecols=[0,1,2,3,4,6,7,8],
    delimiter="\t", comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, usecols=5,
    delimiter="\t", comments="#", dtype=np.float32)

# -----------------------------------------------------------

  # 2. create network
  print("\nCreating 8-(10-10)-1 neural network ")
  model = K.models.Sequential()
  model.add(K.layers.Dense(units=10, input_dim=8,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid1
  model.add(K.layers.Dense(units=10,
    activation='tanh', kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # hid2
  model.add(K.layers.Dense(units=1,
    activation=None, kernel_initializer='glorot_uniform',
    bias_initializer='zeros'))  # output layer

  opt = K.optimizers.Adam(learning_rate=0.01)
  model.compile(loss='mean_squared_error',
    optimizer=opt, metrics=['mse'])

# -----------------------------------------------------------

  # 3. train model
  print("\nbat_size = 10 ")
  print("loss = mean_squared_error ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  my_logger = MyLogger(100, model, train_x, train_y)

  print("\nStarting training ")
  h = model.fit(train_x, train_y, batch_size=10,
    epochs=1000, verbose=0, callbacks=[my_logger])
  print("Done ")

# -----------------------------------------------------------

  # 4. evaluate model
  print("\nComputing model accuracy (within 0.10 of true) ")
  train_acc = accuracy(model, train_x, train_y, 0.10)
  print("Accuracy on train data = %0.4f" % train_acc)
  test_acc = accuracy_x(model, test_x, test_y, 0.10)
  print("Accuracy on test data = %0.4f" % test_acc)

  # 5. use model
  # np.set_printoptions(formatter={'float': '{: 0.6f}'.format})
  print("\nPredicting income for M 34 concord support: ")
  x = np.array([[-1, 0.34, 0,0,1, 0,1,0]], dtype=np.float32)
  pred_inc = model.predict(x)  # shape [1,1]
  print("$%0.2f" % (pred_inc[0][0] * 100_000))  # un-normalized

# -----------------------------------------------------------

  # 6. save model
  print("\nSaving trained model ")
  # model.save_weights(".\\Models\\employee_model_wts.h5")
  # model.save(".\\Models\\employee_model.h5")

# -----------------------------------------------------------

  print("\nEnd Employee income demo")

if __name__=="__main__":
  main()
