Ordinal Classification for the Boston Housing Dataset Using PyTorch

Ordinal classification, also called ordinal regression, is a multi-class classification problem where the class labels to predict are ordered, for example, 0 = “poor”, 1 = “average”, 2 = “good”. You could just do normal classification, but then you don’t take advantage of the ordering information that’s contained in the training data.

There are dozens of classical machine learning techniques for ordinal classification, most of them quite complicated and based on logistic regression. A neural network approach is simpler and, in my experience, more effective. I wrote a demo program using PyTorch to demonstrate.

My problem data is the Boston Housing dataset. The goal is to predict the median house price of one of 506 towns near Boston. There are 13 predictor variables — crime rate in town, tax rate in town, proportion of Black residents in town, and so on. The original Boston dataset contains the median price of a house in each town, divided by $1,000 — like 35.00 for $35,000 (the data is from the 1970s when house prices were low). To convert the data to an ordinal regression problem, I mapped the house prices like so:

       price          class  count
[$0      to $10,000)    0      24
[$10,000 to $20,000)    1     191
[$20,000 to $30,000)    2     207
[$30,000 to $40,000)    3      53
[$40,000 to $50,000]    4      31
                              ---
                              506
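
For example, a tiny hypothetical helper (my own illustration, not part of the demo program below) that does the mapping might look like:

# hypothetical price-to-class helper; med_val is the raw
# median price in $1,000s units
def price_to_class(med_val):
  if med_val < 10.0: return 0
  elif med_val < 20.0: return 1
  elif med_val < 30.0: return 2
  elif med_val < 40.0: return 3
  else: return 4  # $40,000 to $50,000 inclusive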

The technique I use for ordinal classification is something I invented myself, at least as far as I know. I’ve never seen the technique I used anywhere else, but it’s not too complicated and so it could exist under an obscure name of some sort.

For the modified Boston Housing dataset there are k = 5 classes. The class target values in the training data are (0, 1, 2, 3, 4). My neural network system outputs a single numeric value between 0.0 and 1.0 — for example 0.2345. The class target values of (0, 1, 2, 3, 4) generate associated sub-targets of (0.1, 0.3, 0.5, 0.7, 0.9).

One approach is to manually preprocess the training data class labels. You'd replace class = 0 with 0.1, class = 1 with 0.3, class = 2 with 0.5, class = 3 with 0.7, class = 4 with 0.9. The 0.1, 0.3, 0.5, 0.7, 0.9 values are the midpoints of k = 5 equal-width intervals of [0.0, 1.0]: the sub-target for class i is (2 * i + 1) / (2 * k). You'd define a neural network that outputs a single value between 0.0 and 1.0, and then you'd use standard MSELoss() (mean squared error loss).
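
A minimal sketch of this first approach, using hypothetical toy labels (not code from the demo program below):

import torch as T
k = 5
labels = T.tensor([[0],[2],[4]], dtype=T.int64)  # raw class labels
sub_targets = (2 * labels + 1) / (2.0 * k)       # 0.1, 0.5, 0.9
loss_func = T.nn.MSELoss()
# during training: loss_val = loss_func(net(X), sub_targets)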

Another approach is to leave the training data classes (0, 1, 2, 3, 4) alone and programmatically compute the appropriate sub-targets (0.1, 0.3, 0.5, 0.7, 0.9). The key to the programmatic approach is defining a custom loss function. If a computed output is 0.44 and the target label is 2, the label maps to sub-target 0.5 and so the error is (0.44 - 0.5)^2.
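
A quick sanity check of that arithmetic with toy tensors:

import torch as T
output = T.tensor([[0.44]])                        # model output
sub_targets = T.tensor([0.1, 0.3, 0.5, 0.7, 0.9])  # k = 5 sub-targets
label = T.tensor([[2]])                            # target class
loss = T.mean((output - sub_targets[label])**2)
print(loss)  # tensor(0.0036)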

Update: I discovered a third approach that seems better: leave the data file class labels as 0, 1, 2, 3, 4 then when reading the data into a PyTorch Dataset, programmatically convert 0 to 0.1, 1 to 0.3, etc. This approach doesn’t require a custom loss function — standard MSELoss() works now. I will post a demo of this simplified technique soon.
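
In the meantime, here's a minimal sketch of the idea, assuming the labels are loaded the same way as in the Dataset __init__ of the demo below:

# inside the Dataset __init__:
tmp_y = all_xy[:,13].reshape(-1,1)     # class labels 0-4 as float32
k = 5
tmp_y = (2 * tmp_y + 1) / (2.0 * k)    # 0.1, 0.3, 0.5, 0.7, 0.9
self.y_data = T.tensor(tmp_y, dtype=T.float32).to(device)
# then standard T.nn.MSELoss() can be used during training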

Ordinal classification is conceptually easy, but the implementation details are surprisingly complex. I consider myself an expert at spinning up basic PyTorch classifiers, but even so, my ordinal classification demo took me about four hours of concentrated effort.

Anyway, I guess the moral of the story is that my exploration of ordinal classification was a lot of work, but it was very interesting and I learned quite a few new tricks that will likely be useful in the future.



Beauty contests are a form of ordinal classification. Here are three job-specific beauty contests from around the world. Left: Contestants from the Miss Philippines National Policewoman pageant. Center: Contestants from the Miss Russia Penal System pageant for women prison officers. Right: Contestants from the Miss Kazakhstan Army pageant.

I don’t think I could ever judge a beauty pageant. Physical attractiveness is too subjective and too tied to its time. I suspect that if you looked at photos of old beauty pageants, you’d find that the template for beauty was quite different from what it is now. But I can identify what I believe are beautiful algorithms and software systems.


Code below. Very long.

# boston_ordinal.py
# ordinal regression on Boston Housing (modified) dataset

# PyTorch 1.9.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10

import numpy as np
import torch as T
device = T.device("cpu")

# -----------------------------------------------------------
# crime zoning indus river nox rooms oldness dist access
#  0     1      2     3     4   5     6       7    8      
# tax pup_tch black low_stat med_val
#  9   10      11    12       13

class BostonDataset(T.utils.data.Dataset):
  # features are in cols [0,12], median price in [13]

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,14),
      delimiter="\t", comments="#", dtype=np.float32)

    tmp_x = all_xy[:,[0,1,2,3,4,5,6,7,8,9,10,11,12]]
    tmp_y = all_xy[:,13].reshape(-1,1)    # 2-D required

    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device) 
    self.y_data = T.tensor(tmp_y, dtype=T.int64).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]  # all 13 predictor cols
    price = self.y_data[idx]  # class label from col [13]
    return (preds, price)     # tuple of two tensors

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(13, 10)  # 13-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)  # glorot
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.relu(self.hid1(x))  # or T.nn.Tanh()
    z = T.relu(self.hid2(z))
    z = T.sigmoid(self.oupt(z)) 
    return z

# -----------------------------------------------------------

def make_ord_targets(k):
  result = np.zeros(k, dtype=np.float32)
  start = 1.0 / (2 * k)
  delta = 1.0 / k
  for i in range(k):
    result[i] = start + (i * delta)
  return result

def ordinal_loss(output, ord_targets, target):
  # output is a [bs,1] tensor of values in (0.0, 1.0)
  # target is a [bs,1] int64 tensor of labels like [[3],[1], ...]
  # ord_targets is a np array like [0.1, 0.3, 0.5, 0.7, 0.9]
  specific_targets = T.tensor(ord_targets[target.numpy()],
    dtype=T.float32).to(device)
  loss = T.mean( (output - specific_targets)**2 )
  return loss

# -----------------------------------------------------------

def ordinal_loss_old(output, target, k):   # somewhat inefficient
  # loss = torch.mean((output - target)**2)  # MSE
  loss = T.mean( (output - ( (2 * target + 1) / (2 * k) ) )**2 )
  return loss
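
# note: (2 * target + 1) / (2 * k) computes exactly the same
# sub-targets that make_ord_targets() produces, just inline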

# -----------------------------------------------------------

def make_endpoints(k):
  result = np.zeros(k+1, dtype=np.float32)
  delta = 1.0 / k
  for i in range(k):
    result[i] = i * delta
  result[k] = 1.0
  return result

def accuracy(model, ds, k):
  n_correct = 0; n_wrong = 0
  end_pts = make_endpoints(k)  # like [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

  for i in range(len(ds)):    # each input
    (X, y) = ds[i]            # (predictors, target)
    with T.no_grad():         # y is like [2]
      oupt = model(X)         # oupt is in [0.0, 1.0]

    if oupt "gte" end_pts[y.item()] and oupt "lt" end_pts[y.item()+1]:
      n_correct += 1
    else:
      n_wrong += 1 

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

def oupt_to_class(oupt, k):
  # map a model output in [0.0, 1.0] to a class label 0 to k-1
  end_pts = make_endpoints(k)  # like [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
  for i in range(k):
    if oupt >= end_pts[i] and oupt < end_pts[i+1]:
      return i
  return k-1  # edge case: oupt == 1.0 exactly
  
# -----------------------------------------------------------

def train(net, ds, k, bs, lr, me, le):
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  opt = T.optim.Adam(net.parameters(), lr=lr)

  ord_targets = make_ord_targets(k)  # like [0.1, 0.3, 0.5, 0.7, 0.9]
  for epoch in range(0, me):
    # T.manual_seed(1+epoch)  # recovery reproducibility
    epoch_loss = 0  # for one full epoch

    for (b_idx, batch) in enumerate(train_ldr):
      (X, y) = batch           # (predictors, targets)
      opt.zero_grad()          # prepare gradients
      oupt = net(X)            # predicted prices

      # loss_val = ordinal_loss_old(oupt, y, k=5)
      loss_val = ordinal_loss(oupt, ord_targets, y)

      epoch_loss += loss_val.item()  # accumulate avgs
      loss_val.backward()            # compute gradients
      opt.step()                     # update wts

    if epoch % le == 0:
      print("epoch = %4d   loss = %0.4f" % \
       (epoch, epoch_loss))
      # TODO: save checkpoint

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin predict Boston ordinal regression (price)  ")
  T.manual_seed(1)   
  np.random.seed(1)
  
  # 1. create DataLoader object
  print("Creating Boston Dataset object ")
  train_file = ".\\Data\\boston_ordinal.txt"
  train_ds = BostonDataset(train_file)  # 506 rows

  # 2. create network
  net = Net().to(device)
  net.train()  # set mode

  # 3. train model
  k = 5  # price: very lo, lo, med, hi, very hi
  bat_size = 10
  lrn_rate = 0.010
  max_epochs = 500
  log_every = 100

  print("\nbat_size = %3d " % bat_size)
  print("lrn_rate = %0.3f " % lrn_rate)
  print("loss = custom ordinal loss")
  print("optimizer = Adam")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training ")
  train(net, train_ds, k, bat_size, lrn_rate, max_epochs, log_every)
  print("Training complete ")

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds, k) 
  print("Accuracy on train data = %0.4f" % \
    acc_train)

  # 5. use model to make a prediction
  np.set_printoptions(precision=6, suppress=True, sign=" ")
  x = np.array([[0.000063, 0.18, 0.0231, -1, 0.538, 0.6575,
                0.652, 0.0409, 0.0100, 0.296, 0.153, 0.3969,
                0.0498]], dtype=np.float32)  # expected class = 2
  print("\nPredicting house price for: ")
  print(x)
  x = T.tensor(x, dtype=T.float32)
  with T.no_grad():
    oupt = net(x)
  print("\npredicted price logit = %0.4f " % oupt)
  c = oupt_to_class(oupt, k)
  print("predicted class label = %d " % c)

  print("\nEnd Boston ordinal price demo")

if __name__ == "__main__":
  main()