A regular neural network has a set of numeric constants called weights which determine the network output. If you feed the same input to a regular trained neural network, you will get the same output every time.

In a Bayesian neural network, each weight is a probability distribution instead of a fixed value. Each time you feed an input to a Bayesian network, concrete weight values are sampled from those distributions, so you get a slightly different output each time, even for the same input.
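The sampling idea can be sketched without any neural library. This is a toy, NumPy-only illustration (the function name sample_bayes_linear and the shapes are my own, not part of torchbnn): each weight has a mean and a standard deviation, a concrete weight matrix is drawn per call, and the same input produces different outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bayes_linear(x, w_mu, w_sigma, rng):
    # Draw one concrete weight matrix from the per-weight
    # Gaussians, then apply an ordinary linear transform.
    w = rng.normal(w_mu, w_sigma)
    return x @ w

x = np.array([1.0, 2.0])
w_mu = np.zeros((2, 3))          # distribution means, one per weight
w_sigma = 0.1 * np.ones((2, 3))  # distribution std devs

out1 = sample_bayes_linear(x, w_mu, w_sigma, rng)
out2 = sample_bayes_linear(x, w_mu, w_sigma, rng)
# out1 and out2 differ slightly: fresh weights were drawn each call
```

A regular linear layer is the special case where every sigma is zero, so every call draws the same weights.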

*A Bayesian neural network for the Iris dataset. The demo predicts the class probabilities three times for input = [5.0, 2.0, 3.0, 2.0] and gets three slightly different results because the weights are distributions instead of fixed values.*

At first thought this doesn’t seem useful at all, but Bayesian neural networks have two main advantages. First, the weight variability greatly deters model overfitting. Second, if you feed the same input multiple times and get very different outputs, the network is not sure of its prediction, and you can detect and deal with such “I don’t know” predictions.
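The “I don’t know” idea can be sketched by repeating a stochastic prediction and measuring the spread of the resulting class probabilities. This is a hypothetical stand-in for a Bayesian net’s forward pass (the logits and noise level are made up for illustration), not torchbnn code:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_proba(rng):
    # Stand-in for one stochastic forward pass of a Bayesian
    # classifier: sampled weights perturb the class logits.
    logits = np.array([2.0, 0.5, -1.0]) + rng.normal(0.0, 0.3, size=3)
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()

preds = np.stack([predict_proba(rng) for _ in range(30)])
mean_probs = preds.mean(axis=0)   # averaged prediction
spread = preds.std(axis=0).max()  # large spread -> "I don't know"
```

A small spread means the repeated predictions agree; a large spread is the signal that the network is unsure and the prediction should be flagged.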

The two main disadvantages of Bayesian neural networks are 1.) they are extremely complicated to implement, and 2.) they are more difficult to train.

The most common approach for creating a Bayesian neural network is to use a standard neural library, such as PyTorch or Keras, plus a Bayesian library such as Pyro. These Bayesian libraries are complex and have a steep learning curve. I recently stumbled across a lightweight Bayesian network library for PyTorch that allowed me to explore Bayesian neural networks. The library, called torchbnn, was created by a single developer, “Harry24k”, and is very, very impressive. It is at: https://github.com/Harry24k/bayesian-neural-network-pytorch.

I installed the torchbnn library via pip without trouble. The torchbnn GitHub repository had a nice, simple example in the documentation that worked the first time — a minor miracle when working with complex Python libraries. No, I take that back — it’s a major miracle.

I refactored the simple documentation example because that’s how I learn best. The example creates a classifier for the Iris dataset. The key code for the neural network definition is:

import numpy as np
import torch as T
import torchbnn as bnn
device = T.device("cpu")

class BayesianNet(T.nn.Module):
  def __init__(self):  # 4-100-3
    super(BayesianNet, self).__init__()
    self.hid1 = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=4, out_features=100)
    self.oupt = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=100, out_features=3)

  def forward(self, x):
    z = T.relu(self.hid1(x))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss()
    return z

The network is 4-100-3: four inputs (sepal length and width, and petal length and width), 100 hidden units, and three outputs (setosa, versicolor, virginica). Instead of using standard torch.nn.Linear() layers, you use torchbnn.BayesLinear() layers. This gives you weights and biases that are distributions instead of regular tensors. You must specify the initial distribution mean (mu) and standard deviation (sigma).

When training the Bayesian neural network, the key code is:

X = batch['predictors']  # inputs
Y = batch['species']     # targets
optimizer.zero_grad()
oupt = net(X)            # outputs
cel = ce_loss(oupt, Y)   # regular loss
kll = kl_loss(net)       # distribution loss
tot_loss = cel + (0.10 * kll)
tot_loss.backward()      # compute gradients
optimizer.step()         # update wt distributions
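The kl_loss term pulls each weight distribution back toward its prior, which is what regularizes training. For a single Gaussian weight there is a closed-form KL divergence; the toy function below (my own, not torchbnn’s internal code — BKLLoss’s exact reduction over layers is an assumption here) shows the kind of quantity being averaged:

```python
import numpy as np

def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL(q || p) between two univariate Gaussians:
    # q = posterior weight distribution, p = prior.
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
            - 0.5)

# KL is zero when the weight distribution matches the prior,
# and grows as the learned mean drifts away from the prior mean.
same = kl_gauss(0.0, 0.1, 0.0, 0.1)
drifted = kl_gauss(0.5, 0.1, 0.0, 0.1)
```

The 0.10 multiplier in tot_loss = cel + (0.10 * kll) balances fitting the data (cross entropy) against staying close to the prior (KL); it’s a tunable hyperparameter.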

Bayesian neural networks have been around for a long time. But they aren’t used very often in practice. I strongly suspect the main reason why they’re not used often is that they’re just too difficult to work with. But if relatively simple libraries like the torchbnn one I found were more common, I think that Bayesian neural networks might gain greater popularity.

*Loosely speaking, the term Bayesian means “based on probability”. The entire city of Las Vegas is based on probability. Left: Western Airlines (1926-1987). Center: Bonanza Airlines (1945-1968). Right: National Airlines (1934-1980). All three were major, successful airlines, but are gone now. A cautionary note to all major, successful companies.*

Code below. Very long.

# iris_bayesian_01b.py
# uses Bayesian library from:
# https://github.com/Harry24k/bayesian-
# neural-network-pytorch/blob/master/demos/
# Bayesian%20Neural%20Network%20Classification.ipynb
# pip install torchbnn

import numpy as np
import torch as T
import torchbnn as bnn
device = T.device("cpu")

# -----------------------------------------------------------

class IrisDataset(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # like 5.0, 3.5, 1.3, 0.3, 0
    tmp_x = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,4), delimiter=",", skiprows=0,
      dtype=np.float32)
    tmp_y = np.loadtxt(src_file, max_rows=num_rows,
      usecols=4, delimiter=",", skiprows=0,
      dtype=np.int64)
    self.x_data = T.tensor(tmp_x, dtype=T.float32)
    self.y_data = T.tensor(tmp_y, dtype=T.int64)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    if T.is_tensor(idx):
      idx = idx.tolist()
    preds = self.x_data[idx]
    spcs = self.y_data[idx]
    sample = { 'predictors' : preds, 'species' : spcs }
    return sample

# -----------------------------------------------------------

class BayesianNet(T.nn.Module):
  def __init__(self):  # 4-100-3
    super(BayesianNet, self).__init__()
    self.hid1 = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=4, out_features=100)
    self.oupt = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=100, out_features=3)

  def forward(self, x):
    z = T.relu(self.hid1(x))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss()
    return z

# -----------------------------------------------------------

def accuracy(model, dataset):
  # assumes model.eval(); item-by-item version
  dataldr = T.utils.data.DataLoader(dataset, batch_size=1,
    shuffle=False)
  n_correct = 0; n_wrong = 0
  for (_, batch) in enumerate(dataldr):
    X = batch['predictors']
    Y = batch['species']  # already flattened by Dataset
    with T.no_grad():
      oupt = model(X)  # logits form
    big_idx = T.argmax(oupt)
    if big_idx == Y:
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  n = len(dataset)
  X = dataset[0:n]['predictors']  # all X
  Y = T.flatten(dataset[0:n]['species'])  # 1-D
  with T.no_grad():
    oupt = model(X)
  arg_maxs = T.argmax(oupt, dim=1)  # collapse cols
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def main():
  print("\nBegin Bayesian neural network Iris demo ")

  # 0. prepare
  np.random.seed(1)
  T.manual_seed(1)
  np.set_printoptions(precision=4, suppress=True, sign=" ")
  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

  # 1. load training data
  print("\nCreating Iris train Dataset and DataLoader ")
  train_file = ".\\Data\\iris_train.txt"
  train_ds = IrisDataset(train_file, num_rows=120)
  bat_size = 4
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  net = BayesianNet().to(device)

  # 3. train model (could put this into a train() function)
  max_epochs = 100
  ep_log_interval = 10
  ce_loss = T.nn.CrossEntropyLoss()  # applies softmax()
  kl_loss = bnn.BKLLoss(reduction='mean',
    last_layer_only=False)
  optimizer = T.optim.Adam(net.parameters(), lr=0.01)

  print("\nbat_size = %3d " % bat_size)
  print("loss = highly customized ")
  print("optimizer = Adam 0.01")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # accumulated over one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch['predictors']  # [bat_size,4]
      Y = batch['species']     # already flattened
      optimizer.zero_grad()
      oupt = net(X)
      cel = ce_loss(oupt, Y)
      kll = kl_loss(net)
      tot_loss = cel + (0.10 * kll)
      epoch_loss += tot_loss.item()  # accumulate
      tot_loss.backward()  # compute gradients
      optimizer.step()     # update wt distribs
    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %0.4f" % (epoch, epoch_loss))
  print("Training done ")

  # 4. evaluate model accuracy
  print("\nComputing Bayesian network model accuracy")
  net.eval()
  acc = accuracy_quick(net, train_ds)  # all-at-once version
  print("Accuracy on train data = %0.4f" % acc)

  # 5. make a prediction
  print("\nPredicting species for [5.0, 2.0, 3.0, 2.0]: ")
  x = np.array([[5.0, 2.0, 3.0, 2.0]], dtype=np.float32)
  x = T.tensor(x, dtype=T.float32).to(device)
  for i in range(3):
    with T.no_grad():
      logits = net(x).to(device)  # values do not sum to 1.0
      probs = T.softmax(logits, dim=1).to(device)
    print(probs.numpy())

  print("\nEnd Bayesian network demo ")

if __name__ == "__main__":
  main()
