Ordinal classification, also called ordinal regression, is a multi-class classification problem where the class labels to predict are ordered, for example, 0 = “poor”, 1 = “average”, 2 = “good”. You could just do normal classification, but then you don’t take advantage of the ordering information that’s contained in the training data.

There are dozens of very complicated old machine learning techniques for ordinal classification that are based on logistic regression. But using a neural network approach is more effective. I wrote a demo program using PyTorch to demonstrate.

My problem data is the Boston Housing dataset. The goal is to predict the median house price in each of 506 towns near Boston. There are 13 predictor variables: crime rate in town, tax rate in town, proportion of Black residents in town, and so on. The original Boston dataset contains the median price of a house in each town, expressed in units of $1,000, like 35.00 for $35,000 (the data is from the 1970s when house prices were low). To convert the data to an ordinal regression problem, I mapped the house prices like so:

  price                   class   count
  [$0 to $10,000)           0       24
  [$10,000 to $20,000)      1      191
  [$20,000 to $30,000)      2      207
  [$30,000 to $40,000)      3       53
  [$40,000 to $50,000]      4       31
                                   ---
                                   506
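The bucketing in the table above can be sketched as a small helper function. This is a hypothetical illustration (the name price_to_class is my own, not from the demo program); prices are in units of $1,000, as in the raw Boston data.

```python
# Hypothetical helper showing the price-to-class mapping in the table.
# Prices are in units of $1,000, as in the raw Boston data.
def price_to_class(price):
    if price < 10.0:   return 0  # [$0, $10,000)
    elif price < 20.0: return 1  # [$10,000, $20,000)
    elif price < 30.0: return 2  # [$20,000, $30,000)
    elif price < 40.0: return 3  # [$30,000, $40,000)
    else:              return 4  # [$40,000, $50,000]

print(price_to_class(35.00))  # 3
```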

The technique I use for ordinal classification is, as far as I know, something I invented myself. I've never seen it anywhere else, but it's not too complicated, so it could exist under an obscure name of some sort.

For the modified Boston Housing dataset there are k = 5 classes. The class target values in the training data are (0, 1, 2, 3, 4). My neural network system outputs a single numeric value between 0.0 and 1.0 — for example 0.2345. The class target values of (0, 1, 2, 3, 4) generate associated sub-targets of (0.1, 0.3, 0.5, 0.7, 0.9).

One approach is to manually process the training data class labels. You'd replace class = 0 with 0.1, class = 1 with 0.3, class = 2 with 0.5, class = 3 with 0.7, class = 4 with 0.9. The values 0.1, 0.3, 0.5, 0.7, 0.9 come (indirectly) from 1.0 / 5, where 5 is the number of classes: each is the midpoint of one of five equal-width intervals in [0.0, 1.0]. You'd define a neural network that outputs a single value between 0.0 and 1.0. And then you'd use MSELoss() (mean squared error loss).
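The sub-target for class c works out to (2c + 1) / (2k), which is the midpoint of the c-th of k equal intervals. A minimal sketch (the function name make_sub_targets is my own, not from the demo program):

```python
# Sketch: generate the k sub-target values (2*c + 1) / (2*k)
# for classes c = 0 .. k-1. Each is an interval midpoint.
def make_sub_targets(k):
    return [(2 * c + 1) / (2 * k) for c in range(k)]

print(make_sub_targets(5))  # [0.1, 0.3, 0.5, 0.7, 0.9]
```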

Another approach is to leave the training data classes (0, 1, 2, 3, 4) alone and programmatically compute the appropriate (0.1, 0.3, 0.5, 0.7, 0.9) sub-targets during training.

The key to the programmatic approach is defining a custom loss function. For example, if a computed output is 0.44 and the target label is 2, the sub-target is 0.5 and the error is (0.44 - 0.5)^2.
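A minimal sketch of such a custom loss, assuming the (2c + 1) / (2k) formula rather than a precomputed lookup table (the name ordinal_mse_loss is my own; the full demo code below uses a lookup-table variant):

```python
import torch as T

# Sketch of a custom ordinal loss: map each integer class label c
# to its sub-target (2*c + 1) / (2*k), then take mean squared error.
def ordinal_mse_loss(output, target, k):
    sub_targets = (2.0 * target.float() + 1.0) / (2.0 * k)
    return T.mean((output - sub_targets) ** 2)

oupt = T.tensor([[0.44]])  # a computed network output
y = T.tensor([[2]])        # integer class label
loss = ordinal_mse_loss(oupt, y, k=5)
print("%0.4f" % loss.item())  # (0.44 - 0.5)^2 = 0.0036
```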

*Update: I discovered a third approach that seems better: leave the data file class labels as 0, 1, 2, 3, 4 then when reading the data into a PyTorch Dataset, programmatically convert 0 to 0.1, 1 to 0.3, etc. This approach doesn’t require a custom loss function — standard MSELoss() works now. I will post a demo of this simplified technique soon.*
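A sketch of that third approach, assuming the data file stores integer labels 0 through 4 (the variable names here are my own, not from the forthcoming demo):

```python
import numpy as np
import torch as T

# Third approach: convert integer class labels to sub-target values
# at data-loading time, then train with the standard MSELoss().
k = 5
raw_labels = np.array([0, 1, 2, 3, 4], dtype=np.float32)  # as read from file
sub_targets = (2.0 * raw_labels + 1.0) / (2.0 * k)        # 0.1, 0.3, 0.5, ...
y_data = T.tensor(sub_targets, dtype=T.float32).reshape(-1, 1)

loss_func = T.nn.MSELoss()   # standard loss; no custom function needed
oupt = T.tensor([[0.44]])    # a computed network output
loss = loss_func(oupt, y_data[2].reshape(1, 1))  # target sub-value is 0.5
print("%0.4f" % loss.item())  # (0.44 - 0.5)^2 = 0.0036
```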

Ordinal classification is conceptually easy, but the implementation details are surprisingly complex. I consider myself an expert at spinning up basic PyTorch classifiers, but even so, my ordinal classification demo took me about four hours of concentrated effort.

Anyway, I guess the moral of the story is that my exploration of ordinal classification was a lot of work, but it was very interesting and I learned quite a few new tricks that will likely be useful in the future.

*Beauty contests are a form of ordinal classification. Here are three job-specific beauty contests from around the world. Left: Contestants from the Miss Philippines National Policewoman pageant. Center: Contestants from the Miss Russia Penal System pageant for women prison officers. Right: Contestants from the Miss Kazakhstan Army pageant.*

*I don’t think I could ever judge a beauty pageant. Physical attractiveness is too subjective and temporal. I suspect that if anyone looked at photos of old beauty pageants, they’d find that the template for beauty was probably different than it is now. But I can identify what I believe are beautiful algorithms and software systems.*

Code below. Very long.

# boston_ordinal.py
# ordinal regression on Boston Housing (modified) dataset
# PyTorch 1.9.0-CPU  Anaconda3-2020.02  Python 3.7.6
# Windows 10

import numpy as np
import torch as T
device = T.device("cpu")

# -----------------------------------------------------------

# crime zoning indus river nox rooms oldness dist access
#   0     1      2     3    4    5      6     7     8
# tax pup_tch black low_stat med_val
#  9    10     11      12      13

class BostonDataset(T.utils.data.Dataset):
  # features are in cols [0,12], median price in [13]
  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,14),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,[0,1,2,3,4,5,6,7,8,9,10,11,12]]
    tmp_y = all_xy[:,13].reshape(-1,1)  # 2-D required
    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.int64).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]  # all cols
    price = self.y_data[idx]
    return (preds, price)  # tuple of two matrices

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(13, 10)  # 13-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)  # glorot
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.relu(self.hid1(x))  # or T.nn.Tanh()
    z = T.relu(self.hid2(z))
    z = T.sigmoid(self.oupt(z))
    return z

# -----------------------------------------------------------

def make_ord_targets(k):
  result = np.zeros(k, dtype=np.float32)
  start = 1.0 / (2 * k)
  delta = 1.0 / k
  for i in range(k):
    result[i] = start + (i * delta)
  return result

def ordinal_loss(output, ord_targets, target):
  # target is like [[3],[1], . . ]
  # ord_targets is an array of float like [0.2, 0.4, 0.6, 0.8]
  specific_targets = T.tensor(ord_targets[target],
    dtype=T.float32)
  loss = T.mean((output - specific_targets)**2)
  return loss

# -----------------------------------------------------------

def ordinal_loss_old(output, target, k):
  # somewhat inefficient
  # loss = torch.mean((output - target)**2)  # MSE
  loss = T.mean((output - ((2 * target + 1) / (2 * k)))**2)
  return loss

# -----------------------------------------------------------

def make_endpoints(k):
  result = np.zeros(k+1, dtype=np.float32)
  delta = 1.0 / k
  for i in range(k):
    result[i] = i * delta
  result[k] = 1.0
  return result

def accuracy(model, ds, k):
  n_correct = 0; n_wrong = 0
  end_pts = make_endpoints(k)  # like [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
  for i in range(len(ds)):  # each input
    (X, y) = ds[i]  # (predictors, target)
    with T.no_grad():  # y is like [2]
      oupt = model(X)  # oupt is in [0.0, 1.0]

    if oupt >= end_pts[y.item()] and \
       oupt < end_pts[y.item()+1]:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

def oupt_to_class(oupt, k):
  end_pts = make_endpoints(k)  # like [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
  for i in range(k):  # note: range(k-1) would miss the last class
    if oupt >= end_pts[i] and oupt < end_pts[i+1]:
      return i
  return -1  # error

# -----------------------------------------------------------

def train(net, ds, k, bs, lr, me, le):
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  opt = T.optim.Adam(net.parameters(), lr=lr)
  ord_targets = make_ord_targets(k)  # like [0.1, 0.3, 0.5, 0.7, 0.9]

  for epoch in range(0, me):
    # T.manual_seed(1+epoch)  # recovery reproducibility
    epoch_loss = 0  # for one full epoch
    for (b_idx, batch) in enumerate(train_ldr):
      (X, y) = batch  # (predictors, targets)
      opt.zero_grad()  # prepare gradients
      oupt = net(X)  # predicted prices
      # loss_val = ordinal_loss_old(oupt, y, k=5)
      loss_val = ordinal_loss(oupt, ord_targets, y)
      epoch_loss += loss_val.item()  # accumulate avgs
      loss_val.backward()  # compute gradients
      opt.step()  # update wts

    if epoch % le == 0:
      print("epoch = %4d   loss = %0.4f" % \
        (epoch, epoch_loss))
      # TODO: save checkpoint

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin predict Boston ordinal regression (price) ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. create Dataset object
  print("Creating Boston Dataset object ")
  train_file = ".\\Data\\boston_ordinal.txt"
  train_ds = BostonDataset(train_file)  # 506 rows

  # 2. create network
  net = Net().to(device)
  net.train()  # set mode

  # 3. train model
  k = 5  # price: very lo, lo, med, hi, very hi
  bat_size = 10
  lrn_rate = 0.010
  max_epochs = 500
  log_every = 100

  print("\nbat_size = %3d " % bat_size)
  print("lrn_rate = %0.3f " % lrn_rate)
  print("loss = custom ordinal loss")
  print("optimizer = Adam")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training ")
  train(net, train_ds, k, bat_size, lrn_rate,
    max_epochs, log_every)
  print("Training complete ")

  # 4. evaluate model accuracy
  print("\nComputing model accuracy")
  net.eval()
  acc_train = accuracy(net, train_ds, k)
  print("Accuracy on train data = %0.4f" % \
    acc_train)

  # 5. use model to make a prediction
  np.set_printoptions(precision=6, suppress=True, sign=" ")
  x = np.array([[0.000063, 0.18, 0.0231, -1, 0.538, 0.6575,
    0.652, 0.0409, 0.0100, 0.296, 0.153, 0.3969, 0.0498]],
    dtype=np.float32)  # expected class = 2

  print("\nPredicting house price for: ")
  print(x)
  x = T.tensor(x, dtype=T.float32)
  with T.no_grad():
    oupt = net(x)

  print("\npredicted price (normalized) = %0.4f " % oupt)
  c = oupt_to_class(oupt, k)
  print("predicted class label = %d " % c)

  print("\nEnd Boston ordinal price demo")

if __name__ == "__main__":
  main()
