I work at a large tech company. One of the things I do at work is present short (about an hour) talks on machine learning and artificial intelligence topics. A few days ago I gave a talk on performing regression using a neural network, with the PyTorch library.

A regression problem is one where the goal is to predict a numeric value. I used one of the most common datasets, the Boston Housing dataset. There are 506 data items. Each item represents a town near Boston. The goal is to predict the median house price in a town, using 13 predictor variables.
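A minimal sketch of the kind of network I mean, assuming 13 normalized inputs and a single output node (the hidden layer size and activation here are illustrative choices, not necessarily the ones from my talk):

```python
import torch as T

class Net(T.nn.Module):
    # 13 inputs -> 10 hidden (tanh) -> 1 output
    def __init__(self):
        super().__init__()
        self.hid = T.nn.Linear(13, 10)
        self.oupt = T.nn.Linear(10, 1)

    def forward(self, x):
        z = T.tanh(self.hid(x))
        # no activation on the output: regression predicts a raw numeric value
        return self.oupt(z)

net = Net()
dummy = T.randn(4, 13)   # a batch of 4 fake data items
preds = net(dummy)       # shape [4, 1]: one predicted price per item
```

The key point for regression is the output layer: one node with no activation function, so the network emits an unconstrained numeric value rather than a probability.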

The first three predictors are: [0] = per capita crime rate, [1] = proportion of land zoned for large residential lots, and [2] = proportion of non-retail acres.

Predictor [3] is a Boolean indicating whether the town borders the Charles River (0 = no, 1 = yes). Briefly, the remaining predictors are: [4] = air pollution metric, [5] = average number of rooms per house, [6] = proportion of old houses, [7] = weighted distance to Boston, [8] = index of accessibility to highways, [9] = tax rate, [10] = pupil-teacher ratio, [11] = measure of proportion of Black residents, and [12] = percentage of lower socio-economic status residents.

I briefly talked about two approaches to normalizing the numeric predictor values. The simplest approach is to drop the data into Excel, normalize, then save the normalized data as a text file. The second approach is to programmatically normalize the data. The simple Excel approach has the minor downside that when you want to make a prediction after training, you have to normalize predictor values offline.
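The programmatic approach might look something like this min-max sketch (the helper name and the tiny two-column fake data are mine, not from the talk). Keeping the per-column min and max around is exactly what lets you normalize new predictor values at prediction time, which is the step the Excel approach forces you to do offline:

```python
def min_max_normalize(rows):
    # rows: list of lists of raw predictor values, one row per data item.
    # Returns normalized rows plus the (min, max) of each column, so new
    # items can be normalized the same way after training.
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    norm = [[(v - mn) / (mx - mn) if mx > mn else 0.0
             for v, mn, mx in zip(row, mins, maxs)]
            for row in rows]
    return norm, mins, maxs

raw = [[0.02, 18.0],
       [0.27,  0.0],
       [0.03, 12.5]]   # three fake items, two fake predictor columns
norm, mins, maxs = min_max_normalize(raw)
# each normalized value now lies in [0.0, 1.0]
```

At prediction time you would apply `(v - mn) / (mx - mn)` to each raw predictor of the new item, using the stored `mins` and `maxs` from training.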

Even though regression is quite simple, I didn’t have enough time to discuss the entire program. Machine learning is incredibly fascinating, but there are lots and lots and lots of details.

*A couple of humorous observations related to details by one of my favorite cartoonists, Jim Unger (1937-2012).*
