## “Regression Using a scikit MLPRegressor Neural Network” in Visual Studio Magazine

I wrote an article titled “Regression Using a scikit MLPRegressor Neural Network” in the May 2023 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2023/05/01/regression-scikit.aspx.

A regression problem is one where the goal is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their sex, age, state where they live and political leaning. Note that the common “logistic regression” machine learning technique is actually a binary classification system in spite of its name.

Arguably the most powerful regression technique is a neural network model. There are several tools and code libraries that you can use to create a neural network regression model. I usually use PyTorch, but the Python-based scikit-learn library (also called scikit or sklearn) is popular among beginners. The original versions of scikit were built around classical machine learning techniques such as naive Bayes classification, kernel ridge regression, and various forms of decision trees. Neural network classifiers and regression systems were added in 2016 (version 0.18), when it was clear that neural techniques were the future of ML.

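I used one of my standard synthetic datasets. The goal is to predict annual income from sex (male = -1, female = +1), age (divided by 100), state of residence (one-hot encoded: Michigan = 100, Nebraska = 010, Oklahoma = 001), and political leaning (one-hot encoded: conservative = 100, moderate = 010, liberal = 001). In each row, the income to predict is the normalized value in the sixth column, between the state and political leaning columns. The data looks like: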

```
 1   0.24   1 0 0   0.2950   0 0 1
-1   0.39   0 0 1   0.5120   0 1 0
 1   0.63   0 1 0   0.7580   1 0 0
-1   0.36   1 0 0   0.4450   0 1 0
 1   0.27   0 1 0   0.2860   0 0 1
. . .
```
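To make the encoding concrete, here is a minimal sketch that encodes one person into the eight predictor values and splits the sample rows into predictors and target. The helper names (`one_hot`, `encode_person`) are hypothetical, and the column layout (income in the sixth column) is an assumption inferred from the sample rows above, not code from the article.

```python
import io
import numpy as np

STATES = ["michigan", "nebraska", "oklahoma"]
POLITICS = ["conservative", "moderate", "liberal"]

def one_hot(value, categories):
    # 1.0 at the position of value, 0.0 everywhere else
    return [1.0 if value == c else 0.0 for c in categories]

def encode_person(sex, age, state, politics):
    # Hypothetical helper: builds the eight predictor values for one person
    sex_code = -1.0 if sex == "male" else 1.0
    return ([sex_code, age / 100.0]
            + one_hot(state, STATES)
            + one_hot(politics, POLITICS))

# The five sample rows shown above, standing in for a real data file
sample = io.StringIO(
    " 1   0.24   1 0 0   0.2950   0 0 1\n"
    "-1   0.39   0 0 1   0.5120   0 1 0\n"
    " 1   0.63   0 1 0   0.7580   1 0 0\n"
    "-1   0.36   1 0 0   0.4450   0 1 0\n"
    " 1   0.27   0 1 0   0.2860   0 0 1\n"
)
data = np.loadtxt(sample)  # whitespace-delimited by default

# Assumed layout: column 5 is income, the other eight are predictors
x = data[:, [0, 1, 2, 3, 4, 6, 7, 8]]
y = data[:, 5]

print(x.shape, y.shape)
print(encode_person("female", 24, "michigan", "liberal"))
```

The encoded person matches the predictor values in the first sample row, which is a quick way to check that the encoding and the column split agree.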

There are 200 training items and 40 test items. The difficult part of any neural network is setting up the many hyperparameters. The demo uses:

```
import numpy as np
from sklearn.neural_network import MLPRegressor

# 2. create network
params = { 'hidden_layer_sizes' : [10,10],
  'activation' : 'relu', 'solver' : 'adam',
  'alpha' : 0.0, 'batch_size' : 10,
  'random_state' : 0, 'tol' : 0.0001,
  'nesterovs_momentum' : False,
  'learning_rate' : 'constant',
  'learning_rate_init' : 0.01,
  'max_iter' : 1000, 'shuffle' : True,
  'n_iter_no_change' : 50, 'verbose' : False }
```
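With eight input values, the two hidden layers of 10 nodes set by `hidden_layer_sizes`, and a single output, the network has an 8-(10-10)-1 architecture. As a rough illustration of what that means for model size, the arithmetic below counts the weights and biases. It is a back-of-the-envelope sketch, not part of the demo.

```python
# Layer sizes for an 8-(10-10)-1 network: input, two hidden layers, output
layer_sizes = [8, 10, 10, 1]

# One weight per connection between each pair of adjacent layers
n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# One bias per node in every layer except the input layer
n_biases = sum(layer_sizes[1:])

total = n_weights + n_biases
print(n_weights, n_biases, total)  # 190 21 211
```

After training, a fitted MLPRegressor exposes these values through its `coefs_` and `intercepts_` attributes.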

After setting up the parameters, creating and training the neural network regression prediction system is trivial:

```
# train_x holds the 200 x 8 training predictors,
# train_y holds the 200 target income values
print("Creating 8-(10-10)-1 relu neural network ")
net = MLPRegressor(**params)
net.fit(train_x, train_y)
print("Done ")
```
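Once a model is trained, you usually want to know how well it predicts. One possible approach, sketched below, counts a prediction as correct if it is within a given percentage of the true income. The `accuracy` function, the 15% closeness threshold, and the `MeanModel` stand-in are all assumptions for illustration. `MeanModel` is a hypothetical placeholder that keeps the sketch self-contained, and a trained MLPRegressor such as `net` could be passed in its place because it also has a `predict` method.

```python
import numpy as np

def accuracy(model, data_x, data_y, pct_close):
    # Fraction of predictions within pct_close of the true target value
    preds = model.predict(data_x)
    close = np.abs(preds - data_y) < np.abs(pct_close * data_y)
    return float(np.mean(close))

class MeanModel:
    # Hypothetical stand-in: always predicts the mean training target
    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        return np.full(len(x), self.mean_)

# Incomes from the five sample rows; the predictors are placeholders
# because MeanModel ignores them
y = np.array([0.2950, 0.5120, 0.7580, 0.4450, 0.2860])
x = np.zeros((5, 8))

model = MeanModel().fit(x, y)
acc = accuracy(model, x, y, pct_close=0.15)
print("Accuracy (within 15%):", acc)
```

MLPRegressor also has a built-in `score` method that returns the R-squared value, so a call like `net.score(test_x, test_y)` gives a second view of model quality. The names `test_x` and `test_y` here are assumed.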

Using the scikit MLPRegressor module is relatively easy (assuming you understand what the hyperparameters mean), but the technique isn’t very flexible. For flexibility, you should use PyTorch, but PyTorch requires much more effort.

For children, one of the strongest predictor variables for their adult income is the family structure in which they’re raised. Sadly, there are enormous differences in family structure depending upon race. In some racial groups, over 78% of children are born and raised by single mothers, and in some Census areas (such as parts of Baltimore), that figure is over 95%. I’ve never come across research that explains why this self-destructive behavior occurs. Maybe AI/ML can help somehow.
