One of the dirty little secrets of machine learning research is that in order to get a paper published, it’s almost always necessary to demonstrate improved results of some sort. And by setting the global random number seed to many different values, researchers can significantly adjust experimental results.
Setting the random seed typically has two major effects. First, the seed controls the initial values of a network's weights and biases. Second, the seed controls the order in which training data is processed by a DataLoader object.
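As a quick illustration (this snippet is mine, separate from the demo program below), the following shows both effects. Running it twice with the same seed reproduces both the initial weights and the batch order exactly; changing the seed changes both:

import torch as T

T.manual_seed(0)  # try changing the seed value

lin = T.nn.Linear(4, 2)  # initial weights come from the seeded RNG
print(lin.weight)        # identical on every run with the same seed

data = T.arange(12, dtype=T.float32).reshape(6, 2)  # tiny dummy dataset
ldr = T.utils.data.DataLoader(data, batch_size=2, shuffle=True)
for batch in ldr:  # shuffle order also comes from the seeded RNG
  print(batch)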
I put together a short demo experiment to illustrate the point. I used one of my standard multi-class classification demos. I used six different seed values (0, 3, 5, 363, 366, 999) to create and train a neural classifier. In pseudo-code:
loop many times
  set random seed value
  reload datasets
  create net (seed controls initial wts)
  train net (seed controls processing order)
  compute overall accuracy, error
  log results
end-loop
Even with a tiny demo dataset of just 200 training items, classification accuracy ranged from 68.50% to 86.00% — a very wide range.
For complex neural systems, such as convolutional NNs for image classification or transformer architectures for natural language processing, the effect of the random number seed can be very large. See the paper "Torch.manual_seed(3407) is All You Need: On the Influence of Random Seeds in Deep Learning Architectures for Computer Vision" by D. Picard.
In research, the correct way to deal with the effect of the random seed is to run an experiment several times using different seed values and then average the results.
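A minimal sketch of that idea, where train_and_eval() is a hypothetical stand-in for a full train-and-score pipeline (the returned accuracy here is just a placeholder value):

import numpy as np
import torch as T

def train_and_eval(seed):
  # hypothetical stand-in: seed everything, then build, train,
  # and score a model; replace the placeholder with a real pipeline
  T.manual_seed(seed)
  np.random.seed(seed)
  acc = 0.70 + (0.16 * T.rand(1).item())  # placeholder accuracy
  return acc

seeds = [0, 3, 5, 363, 366, 999]
accs = [train_and_eval(s) for s in seeds]
print("mean acc = %0.4f  std = %0.4f" % (np.mean(accs), np.std(accs)))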
I used to play a lot of golf with my pal Paul Ruiz when I lived in California and a.) had a lot of time, b.) had a lot of sunny weather. Now that I'm a.) older and have zero free time, b.) live in rainy Washington, my golf is limited to arcade games like this Williams Mini Golf from the mid 1960s. When I played real golf, my putting was pretty good but my driving was more like a random seed process.
Demo code. The data can be found at https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_seed_effect.py
# predict politics type from sex, age, state, income
# effect of different random seed values
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11

import numpy as np
import torch as T

device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  # -1   0.27   0  1  0   0.7610   2
  # +1   0.19   0  0  1   0.6550   0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D

    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, 10)  # 6-(10-10)-3
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 3)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss()
    return z

# -----------------------------------------------------------

def accuracy(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  # [200,3] log-probs
  arg_maxs = T.argmax(oupt, dim=1)  # predicted class per row
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def overall_loss(model, ds, n_class):
  # MSE all-at-once version
  X = ds[0:len(ds)][0]  # all X values
  Y = ds[0:len(ds)][1]  # all targets, ordinal form
  with T.no_grad():
    oupt = T.exp(model(X))  # pseudo-probs form
  YY = T.nn.functional.one_hot(Y, num_classes=n_class)
  delta = YY - oupt
  delta_sq = T.multiply(delta, delta)  # not dot()
  sum_sq = T.sum(delta_sq, dim=1)  # process rows
  mse = T.mean(sum_sq)
  return mse

# -----------------------------------------------------------

def train(net, ds, opt, lr, bs, me):
  train_ldr = T.utils.data.DataLoader(ds,
    batch_size=bs, shuffle=True)
  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  # else error

  for epoch in range(0, me):
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      loss_val.backward()
      optimizer.step()
  return net

# -----------------------------------------------------------

def main():
  print("\nBegin demo effects of random seed \n")

  seeds = np.array([0, 3, 5, 363, 366, 999], dtype=np.int64)
  for i in range(len(seeds)):
    print("======================== ")

    # 0. set random seed
    seed = seeds[i]
    T.manual_seed(seed)
    np.random.seed(seed)

    # 1. create training Dataset
    print("Creating training Dataset ")
    train_file = ".\\Data\\people_train.txt"
    train_ds = PeopleDataset(train_file)  # 200 rows

    # 2. create network
    print("Creating 6-(10-10)-3 neural network ")
    net = Net().to(device)
    net.train()

    # 3. train model
    bat_size = 10
    max_epochs = 1000
    lrn_rate = 0.01
    print("Starting training . . . ", end="")
    train(net, train_ds, 'sgd', lrn_rate, bat_size, max_epochs)
    print("Done ")

    # 4. evaluate model loss and accuracy
    net.eval()
    acc_train = accuracy(net, train_ds)
    loss_train = overall_loss(net, train_ds, n_class=3)

    # 5. log results
    print("seed = %4d | acc = %0.4f | loss = %0.4f " % \
      (seed, acc_train, loss_train))
  # end-loop each seed

  print("\nEnd demo")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()
An extremely insightful blog post that demonstrates the challenges of working with seeds on a concrete example. This example shows the problem more starkly than any example I have seen before; thank you. I would gladly read more about "the dirty little secrets of machine learning."
The next escalation of this problem could be the use of multiple cores.
Let's assume that after this test, 10% of the weights are affected by floating point non-determinism, which occurs all the time but becomes more problematic during batch training. This will lead to non-reproducible results when using more than one core.
If your test was expensive and I had the chance to use a GPU with thousands of cores, this problem might become even more common, causing me to question why my performance differs so much from yours.
Floating point issues occur all the time, but on a single core they happen consistently, allowing us to achieve the same results. We can then find a sweet spot of optimal performance. However, when using multiple cores, it seems that we may obtain more inaccurate results, particularly when trying to reach top performance.
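For reference, PyTorch does expose switches that trade speed for run-to-run reproducibility on parallel hardware. A minimal sketch (these are real PyTorch settings, but whether they fully remove the variation depends on which operations a model uses):

import os
import torch as T

# must be set before CUDA libraries initialize; required by some
# cuBLAS ops when deterministic algorithms are requested
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# raise an error if an op has no deterministic implementation
T.use_deterministic_algorithms(True)

# cuDNN-specific flags for GPU runs
T.backends.cudnn.deterministic = True
T.backends.cudnn.benchmark = False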
Presumably, the bad seed results would then be better, and the better seeds would probably be worse, moving toward what the average accuracy would be. My solution is more training, even though it may not be the best.