Simple Ordinal Classification Using PyTorch

I was chatting with some of my colleagues at work about the topic of ordinal classification, also known as ordinal regression. An ordinal classification problem is a multi-class classification problem where the class labels to predict are ordered, for example, “poor”, “average”, “good”.

The problem scenario is best explained by example. Suppose you want to predict the price of a house, where a house price is an ordinal value (0 = low, 1 = medium, 2 = high, 3 = very high) rather than a numeric value such as $525,000. There are dozens of rather complicated old machine learning techniques for ordinal classification, most based on logistic regression. But a neural network approach is easy and effective, so I wrote a PyTorch demo program to demonstrate.

Continuing with the ordinal house price example, you define a neural network that has a single output node. You apply logistic sigmoid activation to the output node so that the computed output value is between 0.0 and 1.0 (a minimal network sketch follows the list below). Then:

output between 0.00 and 0.25 corresponds to class 0 (low price)
output between 0.25 and 0.50 corresponds to class 1 (medium)
output between 0.50 and 0.75 corresponds to class 2 (high)
output between 0.75 and 1.00 corresponds to class 3 (very_high)
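
To make this concrete, here is a minimal sketch of such a network in PyTorch. The layer sizes are my own guesses (the demo data described below has 8 predictor values; the 10 hidden nodes are arbitrary); the only essential part of the technique is the single sigmoid output node:

import torch as T

class Net(T.nn.Module):
  def __init__(self):
    super().__init__()
    self.hid = T.nn.Linear(8, 10)   # 8 predictors to 10 hidden nodes
    self.oupt = T.nn.Linear(10, 1)  # single output node

  def forward(self, x):
    z = T.tanh(self.hid(x))
    z = T.sigmoid(self.oupt(z))  # computed output is between 0.0 and 1.0
    return z

To recover a class label from a computed output, you can multiply by the number of classes and truncate: int(output * 4) gives 0, 1, 2, or 3 (clamping the edge case where the output is exactly 1.0).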

Now, to train the network, if a training item is class 0, you want to define a loss function so that the network adjusts its weights to push the computed output toward the center of the class 0 range, which is halfway between 0.00 and 0.25, or 0.125. Similarly:

training label    output node target
     0              0.125
     1              0.375
     2              0.625
     3              0.875


If there are k=4 ordinal classes, and if a training item has class 0 as a target, the computed output of the neural network should be 0.125. And so on.

Therefore, the neural network loss function compares the computed output with the target value implied by the table above. I used mean squared error. For this example, the number of ordinal classes is k = 4. If t is the training label (0 to 3), the “output node target” values are computed as (2 * t + 1) / (2 * k). For example, if t = 3, then (2 * t + 1) / (2 * k) = (2 * 3 + 1) / (2 * 4) = 7/8 = 0.875, as shown.

I implemented this idea for ordinal classification loss like so:

import torch as T

def ordinal_loss(output, target, k):
  # map each integer class label t to the center of its output
  # range, (2t + 1) / (2k), then compute mean squared error
  loss = T.mean((output - ((2 * target + 1) / (2 * k)))**2)
  return loss
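
Here is a hedged sketch of how the loss function might be called during training, assuming the Net class sketched earlier and a DataLoader named train_loader that yields batches of (predictors, integer label) pairs; these names are illustrative, not from the demo:

import torch as T

k = 4        # number of ordinal classes
net = Net()  # network with one sigmoid output node
optimizer = T.optim.Adam(net.parameters(), lr=0.01)

for xb, yb in train_loader:     # yb holds integer labels 0 to 3
  optimizer.zero_grad()
  output = net(xb).squeeze(1)   # shape [batch] of values in (0, 1)
  loss = ordinal_loss(output, yb, k)
  loss.backward()
  optimizer.step()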

For a specific problem, the number of class labels will be fixed, so you could just hard-code the target values in an array in the loss function, such as targets = np.array([0.125, 0.375, 0.625, 0.875]).
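
If you take that approach, the loss function might look like this sketch. I use a PyTorch tensor rather than a NumPy array so the integer class labels can index it directly; the function name is my own:

import torch as T

# hard-coded targets for k = 4: (2t + 1) / (2 * 4) for t = 0, 1, 2, 3
targets = T.tensor([0.125, 0.375, 0.625, 0.875])

def ordinal_loss_hc(output, target):
  # target holds integer class labels (dtype int64), used as indices
  return T.mean((output - targets[target])**2)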

My demo program used 200 synthetic data items for training. The data looks like:

AC    sq. feet  style     price   school
-1    0.3075    1  0  0    3      0  1  0
-1    0.2700    1  0  0    2      0  0  1
 1    0.1700    0  1  0    1      0  0  1
-1    0.1475    1  0  0    1      1  0  0
 1    0.2000    1  0  0    2      1  0  0
-1    0.1100    0  0  1    0      1  0  0
. . .

The predictors are air conditioning (-1 = no, 1 = yes), area in square feet (normalized), style (art_deco, bungalow, colonial), and local elementary school (johnson, kennedy, lincoln).
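
One possible way to load data in this format is sketched below, assuming a space-delimited text file where column 5 (zero-based) holds the price label and the remaining eight columns hold the predictors; the function and file name are hypothetical:

import numpy as np
import torch as T

def load_data(fn):
  # columns 0-4 are AC, sq. feet, style; column 5 is the price label;
  # columns 6-8 are the school
  raw = np.loadtxt(fn, dtype=np.float32)
  x = np.concatenate([raw[:, 0:5], raw[:, 6:9]], axis=1)
  y = raw[:, 5].astype(np.int64)
  return T.tensor(x), T.tensor(y)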

My results were very good.

However, I still have some questions. I spent quite a bit of time searching the Internet for “ordinal regression” and “ordinal classification” and found all kinds of very complicated techniques, but I didn’t find the idea I used for my demo. This idea was the very first thing that popped into my head, and it’s very obvious. I don’t know why I didn’t find any information about this technique; surely someone must have investigated the idea.

So, there are three possibilities. First, my idea for ordinal classification could have some fatal logic flaw I’m not seeing, and I just got very lucky with my demo. Second, maybe nobody has tried the idea before because it requires creating a custom neural network loss function, which sounds scary (but isn’t). Third, perhaps the technique has been tried and is well known, but goes by some special name that I didn’t hit during my Internet research.

I’ll continue exploring ordinal classification to see if I can solve the mysterious situation.



There were a lot of “Mysterious” movies in the 1930s and ’40s.

Left: “Mysterious Mr. Moto” (1938) features a clever Japanese detective played by actor Peter Lorre. Moto infiltrates a gang of assassins to stop an evil plot.

Left Center: “The Mysterious Miss X” (1939) is a story about two out-of-work actors who are mistaken for detectives. They solve the murder of a rich businessman (it was the lawyer) and one finds romance with the dead man’s daughter.

Right Center: “The Mysterious Dr. Fu Manchu” (1929) tells the origin story of the evil Chinese mastermind (played by Warner Oland, who later played detective Charlie Chan throughout the 1930s). Fu Manchu attempts to murder the people he believes are responsible for his wife’s death. He is thwarted by Scotland Yard Inspector Nayland Smith and his assistant Dr. Jack Petrie.

Right: “The Mysterious Mr. M” (1946) takes place after WWII when criminal Anthony Waldron has developed a mind-control drug and he intends to use it to steal plans for a submarine engine. A mysterious villain named Mr. M appears and muscles in on the action. Agent Grant Farrell eventually stops both evil Waldron and evil Mr. M, who turns out to be Waldron’s sister.
