I’ve been using the PyTorch neural code library since it was first released, just over three years ago. Recently, I’ve been refactoring a lot of my demo programs to update them to new PyTorch features and best practices.

During model training, it’s not too difficult to compute model error. But it’s surprisingly tricky to compute model classification accuracy. Classification accuracy is just the percentage of correct predictions.

There are many different approaches for computing PyTorch model accuracy, but all the techniques fall into one of two categories: analyze the model one data item at a time, or analyze the model using all the data at once as a single batch.

The one-item-at-a-time approach is more flexible and allows you to investigate exactly which data items were incorrectly predicted. The all-items-at-once approach has far fewer lines of code but isn’t as flexible.

In pseudo-code, the one-item-at-a-time approach is:

```
loop each data item
  X = input          # like [2.5, 1.5, 3.0, 4.5]
  Y = target class   # like 2
  oupt = model(X)    # computed like [0.3, 0.1, 0.6]
  pc = argmax(oupt)  # predicted class
  # print X, Y, oupt, pc to see what happened
  if pc == Y
    num_correct += 1
  else
    num_wrong += 1
end-loop
return num_correct / (num_correct + num_wrong)
```

Translating this simple pseudo-code to working PyTorch code is difficult. Here’s an example I use when working with the well-known Iris Dataset:

```python
import torch as T

def accuracy(model, dataset):
  # assumes dataset items are dicts with 'predictors' and 'species' keys
  model.eval()
  dataldr = T.utils.data.DataLoader(dataset, batch_size=1, shuffle=False)
  n_correct = 0; n_wrong = 0
  for (_, batch) in enumerate(dataldr):
    X = batch['predictors']
    Y = T.flatten(batch['species'])
    with T.no_grad():
      oupt = model(X)  # logits form
    (big_val, big_idx) = T.max(oupt, dim=1)
    # print here if necessary
    if big_idx.item() == Y.item():
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 100.0) / (n_correct + n_wrong)
  return acc
```

Because using Dataset and DataLoader objects is now the standard way to process training and test data, I use a Dataset object as the input parameter. I set the mode to eval(). This is a complex topic, but briefly, you use train() mode when training and eval() mode at all other times; eval() mode switches off training-only behavior such as dropout and the updating of batch normalization statistics.
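To see why the mode matters, here is a minimal sketch using a lone nn.Dropout layer (chosen only for illustration; it is not part of the Iris model). In train() mode dropout randomly zeroes elements and scales the survivors by 1/(1-p); in eval() mode it passes its input through unchanged, so predictions are deterministic.

```python
import torch as T

# a tiny module containing Dropout, whose behavior depends on the mode
net = T.nn.Sequential(T.nn.Dropout(p=0.5))
x = T.ones(1, 8)

net.train()          # training mode: dropout randomly zeroes elements
out_train = net(x)   # surviving values are scaled by 1/(1-p) = 2.0

net.eval()           # evaluation mode: dropout does nothing
out_eval = net(x)

print(out_train)     # a random mix of 0.0 and 2.0 values
print(out_eval)      # all ones, identical to x
print(net.training)  # False after calling eval()
```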

The DataLoader is created with batch_size=1 so that it serves one data item at a time. An alternative is to iterate through the input Dataset directly, without using a DataLoader.
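Iterating the Dataset directly means each item arrives without the batch dimension a DataLoader adds, so the predictors are reshaped from [4] to [1, 4] to keep the same shapes as the DataLoader version (in particular, so that argmax(dim=1) still applies). A minimal sketch, assuming a hypothetical TinyDataset whose items are dicts with the same 'predictors' and 'species' keys, and a fixed linear layer standing in for a trained Iris model:

```python
import torch as T

class TinyDataset(T.utils.data.Dataset):
  # hypothetical stand-in for an Iris Dataset: 4 predictors, class label 0-2
  def __init__(self, x, y):
    self.x = x  # float32 tensor, shape [n, 4]
    self.y = y  # int64 tensor, shape [n, 1]
  def __len__(self):
    return len(self.x)
  def __getitem__(self, idx):
    return {'predictors': self.x[idx], 'species': self.y[idx]}

def accuracy_direct(model, dataset):
  # one item at a time, indexing the Dataset directly (no DataLoader)
  model.eval()
  n_correct = 0; n_wrong = 0
  for i in range(len(dataset)):
    item = dataset[i]
    X = item['predictors'].unsqueeze(0)  # add batch dim: [4] -> [1, 4]
    Y = item['species']                  # shape [1]
    with T.no_grad():
      oupt = model(X)                    # shape [1, 3]
    pc = T.argmax(oupt, dim=1)           # predicted class, shape [1]
    if pc.item() == Y.item():
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 100.0) / (n_correct + n_wrong)

# usage: a fixed linear "model" whose logits are the first three predictors
model = T.nn.Linear(4, 3, bias=False)
with T.no_grad():
  model.weight.copy_(T.eye(3, 4))
x = T.tensor([[5.0, 1.0, 1.0, 0.0],   # predicted class 0
              [1.0, 5.0, 1.0, 0.0],   # predicted class 1
              [1.0, 1.0, 5.0, 0.0],   # predicted class 2
              [5.0, 1.0, 1.0, 0.0]])  # predicted class 0
y = T.tensor([[0], [1], [2], [1]])    # last target disagrees with prediction
ds = TinyDataset(x, y)
print(accuracy_direct(model, ds))     # 75.0
```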

Notice that the DataLoader returns a dictionary object, so you need to know the keys, which are 'predictors' and 'species'. The implication is that you can’t really write a general-purpose accuracy function; you need to craft a new accuracy() function for each problem scenario.
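The dictionary originates in the Dataset’s __getitem__() method; the DataLoader’s default collation keeps the same keys and stacks each value along a new leading batch dimension. That extra dimension is why the 'species' value is passed through T.flatten(). A small sketch, assuming a hypothetical DictDataset with made-up values:

```python
import torch as T

class DictDataset(T.utils.data.Dataset):
  # hypothetical Dataset whose items are dicts, like the Iris example
  def __init__(self):
    self.x = T.arange(12, dtype=T.float32).reshape(3, 4)  # 3 items, 4 predictors
    self.y = T.tensor([[0], [2], [1]])                    # 3 labels, shape [3, 1]
  def __len__(self):
    return len(self.x)
  def __getitem__(self, idx):
    return {'predictors': self.x[idx], 'species': self.y[idx]}

ldr = T.utils.data.DataLoader(DictDataset(), batch_size=1, shuffle=False)
batch = next(iter(ldr))
print(type(batch))                  # <class 'dict'>
print(batch.keys())                 # dict_keys(['predictors', 'species'])
print(batch['predictors'].shape)    # torch.Size([1, 4])
print(batch['species'].shape)       # torch.Size([1, 1])
print(T.flatten(batch['species']))  # tensor([0])
```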

The T.max() function is like NumPy’s argmax(), but T.max() returns both the largest value and the index of the largest value. PyTorch recently (I’m not sure when, but it’s within the last few versions) added a T.argmax() function. Notice the dim argument to T.max(): dim=1 means find the largest value in each row, that is, across the class outputs for each data item. Dealing with tensor shapes and dimensions is a real nightmare when developing models.
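The difference between the two functions, and the role of dim, can be seen on a made-up output tensor for two items and three classes. Without a dim argument, T.argmax() indexes into the flattened tensor, which is rarely what a classifier needs.

```python
import torch as T

oupt = T.tensor([[0.3, 0.1, 0.6],    # made-up logits: 2 items, 3 classes
                 [0.8, 0.1, 0.1]])

(big_val, big_idx) = T.max(oupt, dim=1)  # largest value in each row
print(big_val)   # tensor([0.6000, 0.8000])
print(big_idx)   # tensor([2, 0])

arg_maxs = T.argmax(oupt, dim=1)         # indices only
print(arg_maxs)  # tensor([2, 0])

# no dim: index into the flattened 6-element tensor
print(T.argmax(oupt))  # tensor(3)
```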

The target class is accessed with Y.item() because Y is a tensor holding just one value rather than a plain Python integer, and item() extracts that value. This is another weird quirk of PyTorch that drives beginners crazy.
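A short illustration with made-up values: item() turns a one-element tensor into an ordinary Python number, so the comparison yields a plain bool. Comparing the tensors themselves yields a tensor of booleans instead. Note that item() works only on tensors with exactly one element and raises an error otherwise.

```python
import torch as T

Y = T.flatten(T.tensor([[2]]))  # target as it comes from the DataLoader: shape [1]
big_idx = T.tensor([2])         # predicted class index, shape [1]

print(Y)               # tensor([2]), still a tensor
print(Y.item())        # 2, a plain Python int
print(type(Y.item()))  # <class 'int'>

# comparing two Python ints gives a plain bool
print(big_idx.item() == Y.item())  # True
# comparing two tensors gives a tensor of bools instead
print(big_idx == Y)                # tensor([True])
```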

The batch all-items-at-once version is:

```python
import torch as T

def accuracy_b(model, dataset):
  # assumes the Dataset accepts a slice index and returns a dict of tensors
  model.eval()
  X = dataset[0:len(dataset)]['predictors']
  Y = T.flatten(dataset[0:len(dataset)]['species'])
  with T.no_grad():
    oupt = model(X)
  # (_, arg_maxs) = T.max(oupt, dim=1)
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 100.0 / len(dataset))
  return acc.item()
```
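The dataset[0:len(dataset)] expressions work only if the Dataset’s __getitem__() accepts a slice as well as an integer index, which it does when, as is common, __getitem__() just indexes into in-memory tensors. If it doesn’t, one possible variant (not the approach above) is to let a DataLoader fetch everything as a single batch of size len(dataset). A sketch, assuming a hypothetical TinyDataset and a fixed linear layer in place of a trained model:

```python
import torch as T

class TinyDataset(T.utils.data.Dataset):
  # hypothetical Iris-like Dataset: each item is a dict of tensors
  def __init__(self, x, y):
    self.x = x  # float32 predictors, shape [n, 4]
    self.y = y  # int64 class labels, shape [n, 1]
  def __len__(self):
    return len(self.x)
  def __getitem__(self, idx):
    return {'predictors': self.x[idx], 'species': self.y[idx]}

def accuracy_ldr(model, dataset):
  # all items at once, fetched as one DataLoader batch of size len(dataset),
  # so the Dataset never has to handle a slice index
  model.eval()
  ldr = T.utils.data.DataLoader(dataset, batch_size=len(dataset), shuffle=False)
  batch = next(iter(ldr))
  X = batch['predictors']          # shape [n, 4]
  Y = T.flatten(batch['species'])  # shape [n, 1] -> [n]
  with T.no_grad():
    oupt = model(X)                # shape [n, 3]
  arg_maxs = T.argmax(oupt, dim=1) # shape [n]
  num_correct = T.sum(Y == arg_maxs)
  return (num_correct * 100.0 / len(dataset)).item()

# usage: a fixed linear "model" whose logits are the first three predictors
model = T.nn.Linear(4, 3, bias=False)
with T.no_grad():
  model.weight.copy_(T.eye(3, 4))
x = T.tensor([[5.0, 1.0, 1.0, 0.0],
              [1.0, 5.0, 1.0, 0.0],
              [1.0, 1.0, 5.0, 0.0],
              [5.0, 1.0, 1.0, 0.0]])
y = T.tensor([[0], [1], [2], [1]])  # the last item will be mispredicted
print(accuracy_ldr(model, TinyDataset(x, y)))  # 75.0
```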

The key concept is the statement:

```python
num_correct = T.sum(Y==arg_maxs)
```

The == comparison compares all target Y values (no item() needed) element by element with all arg_maxs values, producing a tensor of True/False values, and T.sum() counts the True values, which is the number of items where the target Y equals the predicted class. Working with aggregates like this is something that’s quite difficult for many developers, including me, to get used to.
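A small worked example with made-up targets and predictions shows the intermediate boolean tensor. It also shows why both operands must have the same shape (the reason for T.flatten() on the targets): if Y is left as a column of shape [n, 1], == broadcasts to an [n, n] grid and T.sum() silently returns a meaningless count.

```python
import torch as T

Y = T.tensor([0, 1, 2, 1])         # made-up targets, shape [4]
arg_maxs = T.tensor([0, 1, 2, 0])  # made-up predicted classes, shape [4]

matches = (Y == arg_maxs)          # element-wise: tensor([True, True, True, False])
num_correct = T.sum(matches)       # each True counts as 1: tensor(3)
acc = num_correct * 100.0 / len(Y) # tensor(75.)
print(matches, num_correct, acc.item())

# shape mismatch: broadcasting instead of an error
Y_col = Y.reshape(-1, 1)           # shape [4, 1], as if T.flatten() were skipped
print((Y_col == arg_maxs).shape)   # torch.Size([4, 4]): 16 comparisons, not 4
print(T.sum(Y_col == arg_maxs))    # tensor(5), not a correct-prediction count
```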

*Not so accurate school signs.*