The bottom line of this post is: If you use dropout in PyTorch, then you must explicitly set your model into evaluation mode by calling the eval() function when computing model output values.
Bear with me here; this is a bit tricky to explain. By default, a PyTorch neural network model is in train() mode. As long as there’s no dropout layer (or batch normalization) in the network, you don’t need to worry about train() mode vs. eval() mode.
But if your network has a dropout layer, then before you use the network to compute output values, you must explicitly set the network into eval() mode. The reason is that during training a dropout layer randomly sets some of its inputs to zero, which effectively erases them from the network, and this makes the final trained network more robust and less prone to overfitting. But that random zeroing should happen only during training: if you leave the network in train() mode, dropout keeps firing when you compute output values, so the same input can produce different (and incorrect) output each time.
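You can see the two behaviors directly with a tiny experiment. This snippet isn’t part of the Iris demo; it just pushes a tensor of all 1s through a standalone Dropout layer in each mode:

import torch as T

drop = T.nn.Dropout(p=0.5)
x = T.ones(1, 6)

drop.train()    # training mode (the default)
print(drop(x))  # roughly half the values are zeroed; the survivors
                # are scaled up by 1/(1-p) = 2.0 so the expected
                # sum is unchanged

drop.eval()     # evaluation mode
print(drop(x))  # dropout is a pass-through: all values are 1.0

In eval() mode the dropout layer does nothing at all, which is exactly what you want when computing output values from a trained network.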
For example, suppose a 4-20-3 network for the Iris Dataset is defined like so:
class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(4, 20)
    self.drop = T.nn.Dropout(p=0.5)  # NOTE
    self.oupt = T.nn.Linear(20, 3)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.drop(z)
    z = self.oupt(z)  # CrossEntropyLoss adds softmax
    return z
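By the way, every PyTorch Module carries a Boolean training attribute, so a quick sanity check (this snippet isn’t part of the demo program) confirms that a freshly created network really is in train() mode:

net = Net()
print(net.training)  # True (train() mode is the default)

net = net.eval()
print(net.training)  # False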
After the network/model has been trained, to use the model to make a prediction you must explicitly set eval() mode like so:
# 6. make a prediction
net = net.eval()  # NOTE: important!
unknown = np.array([[6.1, 3.1, 5.1, 1.1]], dtype=np.float32)
unknown = T.tensor(unknown)
logits = net(unknown)  # values do not sum to 1.0
probs_t = T.softmax(logits, dim=1)  # as Tensor
probs = probs_t.detach().numpy()  # to numpy array

print("\nFor inputs equal to:")
for x in unknown[0]:
  print("%0.1f " % x, end="")
print("\nPredicted: (setosa, versicolor, virginica)")
for p in probs[0]:
  print("%0.4f " % p, end="")
Here’s a demo run where I correctly set eval() mode before calling accuracy() during training, and also before computing output values after training. In another demo run, I commented out the code that sets the network model into eval() mode and I got different/incorrect results.
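If you want to reproduce the incorrect behavior yourself, a sketch along these lines (using the net and unknown objects from the prediction code above) makes the problem obvious. In train() mode, two calls on the same input give different logits because dropout zeroes a different random subset of hidden values each time:

net = net.train()    # wrong mode for making predictions
print(net(unknown))  # some logits
print(net(unknown))  # different logits

net = net.eval()     # correct mode
print(net(unknown))  # deterministic logits
print(net(unknown))  # identical to the previous call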
The eval() function returns a reference to self, so the code could have been written as just net.eval() instead of net = net.eval(). Also, when using dropout in PyTorch, I believe it’s good style to explicitly set train() mode even though that’s the default mode:
# 3. train model
net = net.train()  # explicitly set
lrn_rate = 0.01
bat_size = 16
loss_func = T.nn.CrossEntropyLoss()
# . . . etc., etc.
And, if you write a program-defined accuracy() function, you need to remember to set eval() mode before computing output values, and then restore train() mode before returning. For example:
def accuracy(model, data_x, data_y):
  model = model.eval()  # NOTE
  X = T.Tensor(data_x)
  Y = T.LongTensor(data_y)
  oupt = model(X)
  (max_vals, arg_maxs) = T.max(oupt.data, dim=1)
  # arg_maxs is tensor of indices [0, 1, 0, 2, 1, 1 . . ]
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 100.0 / len(data_y))
  model = model.train()  # NOTE
  return acc.item()  # percentage format
Another scenario where you must remember to set eval() mode is when you save a trained model that has dropout, and then load the model from a different program.
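A typical save-then-load sequence looks something like the sketch below. The file name iris_model.pt is just a placeholder I made up, and the loading program must have the Net class definition available:

# in the training program
T.save(net.state_dict(), "iris_model.pt")

# in a different program
net = Net()
net.load_state_dict(T.load("iris_model.pt"))
net.eval()  # NOTE: loading the weights does not set eval() mode for you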
I like PyTorch because it operates at a low level of abstraction, which gives me a lot of control. But there are quite a few pitfalls like this train() vs. eval() issue that can bite you.
Four paintings by artist Viktor Sheleg. I like the level of abstraction of his work – about halfway between photo-realistic and completely abstract.