I work at a large tech company. One of the things I do at work is present short (about an hour) talks on machine learning and artificial intelligence topics. A few days ago I gave a talk on performing regression using a neural network, with the PyTorch library.
A regression problem is one where the goal is to predict a numeric value. I used one of the most common datasets, the Boston Housing dataset. There are 506 data items. Each item represents a town near Boston. The goal is to predict the median house price in a town, using 13 predictor variables.
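To give an idea of what the network looks like, here is a minimal sketch of a PyTorch regression network for this kind of data: 13 predictor inputs feeding a small hidden layer, with a single numeric output and no activation on the output node. The layer sizes, activation function, and class name are illustrative assumptions, not the exact architecture from my talk.

```python
import torch
import torch.nn as nn

class BostonNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hid = nn.Linear(13, 10)   # 13 predictors -> 10 hidden nodes
        self.out = nn.Linear(10, 1)    # 10 hidden -> 1 predicted median price

    def forward(self, x):
        z = torch.tanh(self.hid(x))    # tanh activation on the hidden layer
        return self.out(z)             # no activation: raw numeric output

net = BostonNet()
dummy = torch.zeros(4, 13)             # a fake batch of 4 towns
pred = net(dummy)
print(pred.shape)                      # torch.Size([4, 1])
```

The key point for regression is the output layer: one node with no activation function, so the network can emit any numeric value.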
The first three predictors are crim = per capita crime rate, zn = proportion of land zoned for large residential lots, and indus = proportion of non-retail acres.
Predictor chas is a Boolean indicating whether the town borders the Charles River (0 = no, 1 = yes). Briefly, the remaining predictors are: nox = air pollution metric, rm = average number of rooms per house, age = proportion of old houses, dis = weighted distance to Boston, rad = index of accessibility to highways, tax = tax rate, ptratio = pupil-teacher ratio, b = measure of the proportion of Black residents, and lstat = percentage of lower socio-economic status residents.
I briefly talked about two approaches to normalizing the numeric predictor values. The simplest approach is to drop the data into Excel, normalize, then save the normalized data as a text file. The second approach is to programmatically normalize the data. The simple Excel approach has the minor downside that when you want to make a prediction after training, you have to normalize predictor values offline.
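The programmatic approach avoids that downside because the per-column min and max values are computed in code and can be reused at prediction time. A minimal sketch of min-max normalization, where the function name and the tiny sample data are illustrative:

```python
def min_max_normalize(rows):
    # Normalize each column of a list-of-lists to [0, 1].
    # Returns the normalized data plus the per-column (min, max)
    # pairs, which are needed later to normalize new predictor
    # values before making a prediction.
    cols = list(zip(*rows))
    ranges = [(min(c), max(c)) for c in cols]
    normed = [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]
    return normed, ranges

data = [[2.0, 10.0], [4.0, 20.0], [6.0, 40.0]]
normed, ranges = min_max_normalize(data)
print(normed[0])   # [0.0, 0.0]
print(ranges)      # [(2.0, 6.0), (10.0, 40.0)]
```

At prediction time, a new item's raw values are scaled with the saved (min, max) pairs so they land in the same [0, 1] range the network was trained on.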
Even though regression is quite simple, I didn't have enough time to discuss the entire program. Machine learning is incredibly fascinating but there are lots and lots and lots of details.