Example of Kernel Ridge Regression Using the scikit Library

A regression problem is one where the goal is to predict a single numeric value. For example, you might want to predict the income of a person based on their sex, age, State, and political leaning. (Note: Somewhat confusingly, “logistic regression” is a binary classification technique in spite of its name.)

The scikit (short for scikit-learn or sklearn) library has a Kernel Ridge Regression (KRR) module to predict a numeric value. KRR is an advanced version of basic linear regression. The “Kernel” in KRR means the technique uses the kernel trick, which allows KRR to model complex data where the relationship between the predictors and the target isn’t linear. The “Ridge” indicates KRR uses ridge (L2) regularization to limit model overfitting. I hadn’t looked at KRR in a long time so I decided to code up a quick demo.

I used one of my standard demo datasets that looks like:

```
# sex age   state   income   politics
-1  0.27  0 1 0   0.7610   0 0 1
+1  0.19  0 0 1   0.6550   1 0 0
. . .
```

The goal is to predict income from sex, age, State, and politics. The sex column is encoded as Male = -1, Female = +1. Ages are divided by 100. The States are one-hot encoded: Michigan = 100, Nebraska = 010, Oklahoma = 001. Incomes are divided by $100,000. The politics values are one-hot encoded: conservative = 100, moderate = 010, liberal = 001.
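As a concrete illustration, the encoding scheme can be written as a small helper function. This is just a sketch of the scheme described above; the `encode_person` name and its string inputs are my own invention, not part of the demo program:

```
# Sketch of the data encoding scheme (helper name is hypothetical)
def encode_person(sex, age, state, politics):
  sex_enc = -1.0 if sex == "M" else 1.0   # Male = -1, Female = +1
  age_enc = age / 100.0                   # ages divided by 100
  state_enc = {"michigan": [1, 0, 0],     # one-hot State
               "nebraska": [0, 1, 0],
               "oklahoma": [0, 0, 1]}[state]
  pol_enc = {"conservative": [1, 0, 0],   # one-hot politics
             "moderate":     [0, 1, 0],
             "liberal":      [0, 0, 1]}[politics]
  return [sex_enc, age_enc] + state_enc + pol_enc

print(encode_person("M", 34, "oklahoma", "moderate"))
# [-1.0, 0.34, 0, 0, 1, 0, 1, 0]
```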

I made a training file with 200 items and a test file with 40 items. The complete data is at: jamesmccaffrey.wordpress.com/2022/10/10/regression-people-income-using-pytorch-1-12-on-windows-10-11/.

Kernel ridge regression is difficult to explain. The technique is based on simple linear regression, where each predictor value is multiplied by a weight. But KRR uses a kernel method, where a kernel function is applied to each training item and the item to predict. This allows the technique to deal with data where the relationship between predictors and target isn’t linear.

The ridge part of the KRR name means that L2 regularization is applied to prevent model overfitting, which kernel techniques are often highly vulnerable to.
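Concretely, KRR has a simple closed form: training solves (K + alpha * I) a = y for the dual coefficients a, where K is the kernel matrix over the training items and alpha is the ridge strength, and a prediction is a kernel-weighted sum over all training items. Here is a minimal from-scratch sketch with an RBF kernel, on synthetic data, with my own function names:

```
# From-scratch sketch of kernel ridge regression with an RBF kernel
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
  # k(a, b) = exp(-gamma * ||a - b||^2), for all pairs of rows
  d2 = np.sum(A**2, axis=1)[:, None] + \
       np.sum(B**2, axis=1)[None, :] - 2.0 * (A @ B.T)
  return np.exp(-gamma * d2)

def krr_fit(X, y, alpha=1.0, gamma=1.0):
  # solve (K + alpha * I) a = y for the dual coefficients a
  K = rbf_kernel(X, X, gamma)
  return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_new, X_train, a, gamma=1.0):
  # a prediction is a kernel-weighted sum over all training items
  return rbf_kernel(X_new, X_train, gamma) @ a

rng = np.random.default_rng(1)
X = rng.random((20, 3)); y = rng.random(20)
a = krr_fit(X, y)
preds = krr_predict(X[:3], X, a)
```

On the same data, scikit's KernelRidge(alpha=1.0, kernel='rbf', gamma=1.0) should produce the same predictions, since it solves the same dual system.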

After loading the training data into memory, the key statements in my demo program are:

```print("Creating and training KRR poly(4) model ")
model = KernelRidge(alpha=1.0, kernel='poly', degree=4)
model.fit(train_X, train_y)
```

The parameters to the KernelRidge class would take forever to explain in detail, and this is one of the difficulties of using KRR. The kernel function can be one of ‘additive_chi2’, ‘chi2’, ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, or ‘cosine’ (‘poly’ and ‘polynomial’ are aliases), and a good one must be determined by trial and error. The two most common are ‘polynomial’ and ‘rbf’ (radial basis function), but weirdly the default is ‘linear’.
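The trial and error can be automated with scikit's GridSearchCV, which cross-validates every combination of candidate kernels and alpha values. A sketch on synthetic data (in the demo, train_X and train_y would be used instead):

```
# Automating kernel selection with a grid search (synthetic data)
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((80, 8)); y = rng.random(80)

params = {
  "kernel": ["linear", "poly", "rbf", "laplacian"],
  "alpha": [0.1, 1.0, 10.0],
}
gs = GridSearchCV(KernelRidge(), params, cv=5)  # 5-fold CV, R^2 score
gs.fit(X, y)
print(gs.best_params_)
```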

One issue with regression problems is that you must implement a program-defined accuracy function. For a classification problem, a prediction is either correct or wrong. But with regression, when you predict a numeric value, you must specify what a correct prediction is. I defined an accuracy function where a prediction that is within 10% of the true value is considered correct.
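The within-10% idea can also be written in vectorized form rather than an explicit per-item loop. A sketch, with my own function name and synthetic data standing in for the demo data:

```
# Vectorized "within pct of true value" accuracy (hypothetical helper)
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def accuracy_vec(model, data_X, data_y, pct_close):
  # fraction of predictions within pct_close of the true value
  preds = model.predict(data_X)
  correct = np.abs(preds - data_y) < np.abs(pct_close * data_y)
  return float(np.mean(correct))

# quick check on synthetic data
rng = np.random.default_rng(1)
X = rng.random((50, 4)); y = X.sum(axis=1)
model = KernelRidge(alpha=0.1, kernel='rbf').fit(X, y)
print("acc = %0.4f" % accuracy_vec(model, X, y, 0.10))
```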

I haven’t seen kernel ridge regression used very much. Neural networks are more powerful than KRR, but they require lots of training data and are more difficult to fine-tune.

Kernel ridge regression has been around for a long time — since about 1970. St. Patrick’s Day has been celebrated on March 17 since 1631.

Demo code:

```
# kernel_ridge_regression.py
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11
# scikit / sklearn 0.22.1

# predict income from sex, age, State, politics

import numpy as np
from sklearn.kernel_ridge import KernelRidge
import pickle

# sex age   state   income   politics
# -1  0.27  0 1 0   0.7610   0 0 1
# +1  0.19  0 0 1   0.6550   1 0 0

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  # correct within pct of true income
  n_correct = 0; n_wrong = 0

  for i in range(len(data_X)):
    X = data_X[i].reshape(1, -1)  # make one-item batch
    y = data_y[i]
    pred = model.predict(X)       # predicted income

    if np.abs(pred - y) < np.abs(pct_close * y):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def main():
  print("\nBegin kernel ridge regression using scikit demo ")
  print("Predict income from sex, age, State, political ")

  # 0. prepare
  np.random.seed(1)

  # 1. load data
  train_file = ".\\Data\\people_train.txt"
  train_xy = np.loadtxt(train_file, usecols=range(0,9),
    delimiter="\t", comments="#", dtype=np.float32)
  train_X = train_xy[:,[0,1,2,3,4,6,7,8]]
  train_y = train_xy[:,5].flatten()  # 1D required

  print("\nX = ")
  print(train_X[0:4,:])
  print(" . . . ")
  print("\ny = ")
  print(train_y[0:4])
  print(" . . . ")

  test_file = ".\\Data\\people_test.txt"
  test_xy = np.loadtxt(test_file, usecols=range(0,9),
    delimiter="\t", comments="#", dtype=np.float32)
  test_X = test_xy[:,[0,1,2,3,4,6,7,8]]
  test_y = test_xy[:,5].flatten()  # 1D required

# -----------------------------------------------------------

  # 2. create and train KRR model
  print("\nCreating and training KRR poly(4) model ")
  # KernelRidge(alpha=1.0, *, kernel='linear', gamma=None,
  #   degree=3, coef0=1, kernel_params=None)
  # ['additive_chi2', 'chi2', 'linear', 'poly', 'polynomial',
  #  'rbf', 'laplacian', 'sigmoid', 'cosine']
  model = KernelRidge(alpha=1.0, kernel='poly', degree=4)
  model.fit(train_X, train_y)

  # 3. compute model accuracy
  acc_train = accuracy(model, train_X, train_y, 0.10)
  print("\nAccuracy on train data = %0.4f " % acc_train)
  acc_test = accuracy(model, test_X, test_y, 0.10)
  print("Accuracy on test data = %0.4f " % acc_test)

  # 4. make a prediction
  print("\nPredicting income for M 34 Oklahoma moderate: ")
  X = np.array([[-1, 0.34, 0,0,1,  0,1,0]],
    dtype=np.float32)
  pred_inc = model.predict(X)
  print("$%0.2f" % (pred_inc * 100_000))  # un-normalized

  # 5. save model
  print("\nSaving model ")
  fn = ".\\Models\\krr_model.pkl"
  with open(fn, 'wb') as f:
    pickle.dump(model, f)

if __name__ == "__main__":
  main()
```