## A Simplified Approach for Ordinal Classification

In a standard classification problem, the goal is to predict a class label. For example, in the Iris Dataset problem, the goal is to predict a species of flower: 0 = “setosa”, 1 = “versicolor”, 2 = “virginica”. Here the class labels are just labels, without any meaning attached to their order. In an ordinal classification problem (also called ordinal regression), the class labels have order. For example, you might want to predict the median price of a house in one of 506 towns, where price can be 0 = very low, 1 = low, 2 = medium, 3 = high, 4 = very high. For an ordinal classification problem you could just use standard classification, but that approach doesn’t take advantage of the ordering information in the training data. I coded up a demo of a simple technique using the PyTorch code library. The same technique can be used with Keras/TensorFlow too.

I used a modified version of the Boston Housing dataset. There are 506 data items. Each item is a town near Boston. There are 13 predictor variables — crime rate in town, tax rate in town, proportion of Black residents in town, and so on. The original Boston dataset contains the median price of a house in each town, divided by \$1,000 — like 35.00 for \$35,000 (the data is from the 1970s when house prices were low). To convert the data to an ordinal classification problem, I mapped the house prices like so:

```
       price          class  count
[$0      to $10,000)    0      24
[$10,000 to $20,000)    1     191
[$20,000 to $30,000)    2     207
[$30,000 to $40,000)    3      53
[$40,000 to $50,000]    4      31
                              ---
                              506
```
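The bucketing in the table can be sketched as a tiny helper. This is a minimal sketch, with a hypothetical `price_to_class()` name; prices are in dollars, and the one boundary quirk is that exactly \$50,000 belongs to class 4:

```python
def price_to_class(price_dollars):
    # each class covers a $10,000 band: [0, 10K) -> 0, ..., [40K, 50K] -> 4
    c = int(price_dollars // 10_000)
    return min(c, 4)  # clamp so exactly $50,000 falls into class 4
```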

I normalized the numeric predictor values by dividing by a constant so that each normalized value is between -1.0 and +1.0. I encoded the single Boolean predictor value (does town border the Charles River) as -1 (no), +1 (yes).
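The divide-by-constant normalization can be sketched like this. Using each column's largest absolute value as the constant is an assumption on my part (any constant that bounds the column works):

```python
import numpy as np

def normalize_col(col):
    # divide by the column's largest absolute value so every
    # normalized value lands in [-1.0, +1.0]
    return col / np.max(np.abs(col))
```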

The technique I used for ordinal classification is something I invented myself, at least as far as I know. I’ve never seen it described anywhere else, but it’s not too complicated, so it may well exist somewhere under an obscure name.

For the modified Boston Housing dataset there are k = 5 classes. The class target values in the training data are (0, 1, 2, 3, 4). My neural network system outputs a single numeric value between 0.0 and 1.0 — for example 0.2345. The class target values of (0, 1, 2, 3, 4) generate associated floating point sub-targets of (0.1, 0.3, 0.5, 0.7, 0.9). When I read the data into memory as a PyTorch Dataset object, I map each ordinal class label to the associated floating point target. Then I use standard MSELoss() to train the network.
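The sub-targets are just the midpoints of k equal-width sub-intervals of [0.0, 1.0]. A one-line sketch (`class_to_float_target` is a hypothetical name, not from the demo code):

```python
def class_to_float_target(c, k):
    # midpoint of the c-th of k equal-width sub-intervals of [0.0, 1.0]
    return (2 * c + 1) / (2 * k)
```

For k = 5 this reproduces the (0.1, 0.3, 0.5, 0.7, 0.9) targets above.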

Suppose a data item has class label = 3 (high price). The target value for that item is stored as 0.7. The computed predicted price will be something like 0.66 (close to target, so low MSE error and a correct prediction) or maybe 0.23 (far from target, so high MSE error and a wrong prediction). With this scheme, the ordering information is used.
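To recover a class label from the network output, you can pick the class whose sub-target is nearest to the output, which gives the same result as binning [0.0, 1.0] into k equal intervals. A sketch with hypothetical names:

```python
def output_to_class(oupt, k):
    # nearest sub-target midpoint; equivalent to binning the output
    # into k equal intervals of width 1/k
    targets = [(2 * c + 1) / (2 * k) for c in range(k)]
    return min(range(k), key=lambda c: abs(oupt - targets[c]))
```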

For implementation, most of the work is done inside the Dataset object:

```
class BostonDataset(T.utils.data.Dataset):
  # features are in cols [0,12], median price as int in col [13]

  def __init__(self, src_file, k):
    # k is for class_to_target_program()
    all_xy = np.loadtxt(src_file, usecols=range(0,14),
      delimiter="\t", dtype=np.float32)  # tab-delimited assumed
    tmp_x = all_xy[:, 0:13]                 # all 13 predictors
    tmp_y = all_xy[:, 13].astype(np.int64)  # class labels 0-4

    n = len(tmp_y)
    float_targets = np.zeros(n, dtype=np.float32)  # 1D

    for i in range(n):  # hard-coded is easy to understand
      if tmp_y[i] == 0: float_targets[i] = 0.1
      elif tmp_y[i] == 1: float_targets[i] = 0.3
      elif tmp_y[i] == 2: float_targets[i] = 0.5
      elif tmp_y[i] == 3: float_targets[i] = 0.7
      elif tmp_y[i] == 4: float_targets[i] = 0.9
      else: print("Fatal logic error ")

    float_targets = np.reshape(float_targets, (-1,1))  # 2D

    self.x_data = \
      T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = \
      T.tensor(float_targets, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]  # all predictor cols
    price = self.y_data[idx]  # the float target
    return (preds, price)     # tuple of two tensors
```

There are a few minor but very tricky details. They’d take much too long to explain in a blog post, so I’ll just say that if you’re interested, examine the code very carefully.

I don’t think it’s possible to assign a strictly numeric value to art. Here are two clever illustrations by artist Casimir Lee. I like the bright colors and the combination of 1920s art deco style with 1960s psychedelic style.

Code below. Long.

```
# boston_ordinal_simplified.py
# ordinal regression on Boston Housing dataset
# data class labels are 0,1,2,3,4 - on the fly convert to
#  targets 0.1, 0.3, 0.5, 0.7, 0.9 -- (2c+1)/(2k)

# PyTorch 1.9.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10

import numpy as np
import torch as T
device = T.device("cpu")

# -----------------------------------------------------------
# crime zoning indus river nox rooms oldness dist access
#  0     1      2     3     4   5     6       7    8
# tax pup_tch black low_stat med_val
#   9   10      11    12       13

class BostonDataset(T.utils.data.Dataset):
  # features are in cols [0,12], median price as int in col [13]

  def __init__(self, src_file, k):
    # k is for class_to_target_program()
    all_xy = np.loadtxt(src_file, usecols=range(0,14),
      delimiter="\t", dtype=np.float32)  # tab-delimited assumed
    tmp_x = all_xy[:, 0:13]                 # all 13 predictors
    tmp_y = all_xy[:, 13].astype(np.int64)  # class labels 0-4

    n = len(tmp_y)
    float_targets = np.zeros(n, dtype=np.float32)  # 1D

    for i in range(n):  # hard-coded is easy to understand
      if tmp_y[i] == 0: float_targets[i] = 0.1
      elif tmp_y[i] == 1: float_targets[i] = 0.3
      elif tmp_y[i] == 2: float_targets[i] = 0.5
      elif tmp_y[i] == 3: float_targets[i] = 0.7
      elif tmp_y[i] == 4: float_targets[i] = 0.9
      else: print("Fatal logic error ")

    float_targets = np.reshape(float_targets, (-1,1))  # 2D

    self.x_data = \
      T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = \
      T.tensor(float_targets, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]  # all predictor cols
    price = self.y_data[idx]  # the float target
    return (preds, price)     # tuple of two tensors

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(13, 10)  # 13-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)  # glorot
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.relu(self.hid1(x))  # or T.nn.Tanh()
    z = T.relu(self.hid2(z))
    z = T.sigmoid(self.oupt(z))
    return z

# -----------------------------------------------------------

def class_to_target(c, k):
  if c == 0: return 0.1
  elif c == 1: return 0.3
  elif c == 2: return 0.5
  elif c == 3: return 0.7
  elif c == 4: return 0.9

def class_to_target_program(c, k):
  # mildly inefficient to compute targets every time
  targets = np.zeros(k, dtype=np.float32)
  start = 1.0 / (2 * k)
  delta = 1.0 / k
  for i in range(k):
    targets[i] = start + (i * delta)
  return targets[c]

# ----------------------------------------------------------

def oupt_to_class(oupt, k):
  if oupt >= 0.0 and oupt < 0.2: return 0
  elif oupt >= 0.2 and oupt < 0.4: return 1
  elif oupt >= 0.4 and oupt < 0.6: return 2
  elif oupt >= 0.6 and oupt < 0.8: return 3
  elif oupt >= 0.8 and oupt <= 1.0: return 4

def oupt_to_class_program(oupt, k):
  # mildly inefficient to compute end_pts every time
  end_pts = np.zeros(k+1, dtype=np.float32)
  delta = 1.0 / k
  for i in range(k):
    end_pts[i] = i * delta
  end_pts[k] = 1.0
  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

  for i in range(k):
    if oupt >= end_pts[i] and oupt <= end_pts[i+1]:
      return i
  return -1  # fatal error

# -----------------------------------------------------------

def accuracy(model, ds, k):
  n_correct = 0; n_wrong = 0
  delta = (1.0 / k) / 2

  for i in range(len(ds)):    # each input
    (X, y) = ds[i]            # (predictors, target)
    with T.no_grad():         # y target is like 0.3
      oupt = model(X)         # oupt is in [0.0, 1.0]

    if T.abs(oupt - y) <= delta:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def train(net, ds, bs, lr, me, le):
  # network, dataset, batch_size, learn_rate,
  # max_epochs, log_every
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  loss_func = T.nn.MSELoss()
  opt = T.optim.Adam(net.parameters(), lr=lr)  # Adam assumed; SGD works too

  for epoch in range(0, me):
    # T.manual_seed(1+epoch)  # recovery reproducibility
    epoch_loss = 0  # for one full epoch

    for (b_idx, batch) in enumerate(train_ldr):
      (X, y) = batch           # (predictors, targets)
      oupt = net(X)            # predicted prices

      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      opt.zero_grad()      # reset gradients
      loss_val.backward()  # compute gradients
      opt.step()           # update weights

    if epoch % le == 0:
      print("epoch = %4d   loss = %0.4f" % \
        (epoch, epoch_loss))
      # TODO: save checkpoint

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin predict Boston ordinal regression (price) ")
  print("Simplified version computes float targets on fly ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset object
  print("Creating Boston Dataset object ")
  train_file = ".\\Data\\boston_ordinal.txt"
  train_ds = BostonDataset(train_file, k=5)  # 5 classes

  # 2. create network
  net = Net().to(device)
  net.train()  # set mode

  # 3. train model
  bat_size = 10
  lrn_rate = 0.010
  max_epochs = 500
  log_every = 100

  print("\nbat_size = %3d " % bat_size)
  print("lrn_rate = %0.3f " % lrn_rate)
  print("loss = MSELoss ")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training ")
  train(net, train_ds, bat_size, lrn_rate,
    max_epochs, log_every)
  print("Training complete ")

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds, k=5)
  print("Accuracy on train data = %0.4f" % \
    acc_train)

  # 5. use model to make a prediction
  np.set_printoptions(precision=6,
    suppress=True, sign=" ")
  x = np.array([[0.000063, 0.18, 0.0231, -1,
    0.538, 0.6575, 0.652, 0.0409,
    0.0100, 0.296, 0.153, 0.3969,
    0.0498]], dtype=np.float32)  # a class '2' item
  print("\nPredicting house price for: ")
  print(x)
  x = T.tensor(x, dtype=T.float32)
  with T.no_grad():
    oupt = net(x)  # a 1x1 tensor in [0.0, 1.0]
  pred_class = oupt_to_class(oupt.item(), k=5)
  print("\nPredicted price class: %d " % pred_class)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()
```