An Example of a Bayesian Neural Network Using PyTorch

A regular neural network has a set of numeric constants called weights which determine the network output. If you feed the same input to a regular trained neural network, you will get the same output every time.

In a Bayesian neural network, each weight is a probability distribution instead of a fixed value. Each time you feed an input to a Bayesian network, concrete weight values are sampled from those distributions, so the weights are slightly different and you get a slightly different output each time, even for the same input.


A Bayesian neural network for the Iris dataset. The demo predicts the class probabilities three times for input = [5.0, 2.0, 3.0, 2.0] and gets three slightly different results because the weights are distributions instead of fixed values.
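
The core idea can be shown with a tiny NumPy sketch (the mean and standard deviation here are made-up values for illustration, not taken from the demo). Every time a weight is needed, a concrete value is drawn from its distribution:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.75, 0.10  # a learned weight distribution (made-up values)

for _ in range(3):
  w = rng.normal(mu, sigma)  # draw a concrete weight for this pass
  print(w * 5.0)  # same input 5.0, slightly different output each time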

At first thought this doesn’t seem useful at all. But there are two advantages to a Bayesian neural network. First, the variability of the weights greatly deters model overfitting. Second, if you feed one input several times and the outputs vary wildly, the network is not sure of its prediction, and you can deal with such “I don’t know” predictions explicitly.
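
For example, you can feed the same input several times and examine the spread of the predicted probabilities. Here is a minimal sketch (my own, not part of the demo), assuming a trained torchbnn network named net and the import torch as T alias used below:

# run the same input through the network k times and measure
# the spread of the class probabilities
k = 20
x = T.tensor([[5.0, 2.0, 3.0, 2.0]], dtype=T.float32)
with T.no_grad():
  probs = T.stack([T.softmax(net(x), dim=1) for _ in range(k)])
print(probs.mean(dim=0))  # average prediction
print(probs.std(dim=0))   # large std = "I don't know"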

The two main disadvantages of Bayesian neural networks are 1.) they are extremely complicated to implement, and 2.) they are more difficult to train.

The most common approach for creating a Bayesian neural network is to use a standard neural library, such as PyTorch or Keras, plus a Bayesian library such as Pyro. These Bayesian libraries are complex and have a steep learning curve. I recently stumbled across a lightweight Bayesian network library for PyTorch that allowed me to explore Bayesian neural networks. The library was created by a single guy, “Harry24k”, and is very, very impressive. The library is called torchbnn and is at: https://github.com/Harry24k/bayesian-neural-network-pytorch.

I installed the torchbnn library via pip without trouble. The torchbnn GitHub repository had a nice, simple example in the documentation that worked first time — a minor miracle when working with complex Python libraries. No, I take that back — it’s a major miracle.

I refactored the simple documentation example because that’s how I learn best. The example creates a classifier for the Iris dataset. The key code for the neural network definition is:

import numpy as np
import torch as T
import torchbnn as bnn
device = T.device("cpu")

class BayesianNet(T.nn.Module):
  def __init__(self):            # 4-100-3
    super(BayesianNet, self).__init__()
    self.hid1 = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=4, out_features=100)
    self.oupt = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=100, out_features=3)

  def forward(self, x):
    z = T.relu(self.hid1(x))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss() 
    return z

The network is 4-100-3: four inputs (sepal length and width, petal length and width), 100 hidden units, and three outputs (setosa, versicolor, virginica). Instead of using standard torch.nn.Linear() layers, you use torchbnn.BayesLinear() layers. This gives you weights and biases that are distributions instead of regular tensors. You must specify the prior distribution mean (prior_mu) and standard deviation (prior_sigma).
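
A quick sanity check (my own snippet, not part of the demo) is to feed the same input twice. The logits differ slightly on each call because new weight values are sampled:

net = BayesianNet().to(device)
x = T.tensor([[5.0, 3.5, 1.3, 0.3]], dtype=T.float32)
with T.no_grad():
  print(net(x))  # one sample of the weight distributions
  print(net(x))  # slightly different logits for the same input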

When training the Bayesian neural network, the key code is:

X = batch['predictors']  # inputs
Y = batch['species']     # targets
optimizer.zero_grad()
oupt = net(X)            # outputs

cel = ce_loss(oupt, Y)   # regular loss
kll = kl_loss(net)       # distribution loss
tot_loss = cel + (0.10 * kll)

tot_loss.backward()      # compute gradients
optimizer.step()         # update wt distributions
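
The 0.10 multiplier on the KL term is a hyperparameter that balances fitting the training data against keeping the weight distributions close to the prior. The demo uses a fixed 0.10; one common variation (not in the demo) is to ramp the KL weight up gradually, roughly like this:

# hypothetical helper: ramp the KL weight from 0.0 to its final
# value over the first half of training, so the network can fit
# the data before the prior term pushes back
def kl_weight(epoch, max_epochs, final_wt=0.10):
  return final_wt * min(1.0, epoch / (max_epochs / 2.0))

# inside the training loop:
#   tot_loss = cel + (kl_weight(epoch, max_epochs) * kll)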

Bayesian neural networks have been around for a long time, but they aren’t used very often in practice. I strongly suspect the main reason is that they’re just too difficult to work with. If relatively simple libraries like torchbnn were more common, I think Bayesian neural networks might gain greater popularity.



Loosely speaking, the term Bayesian means “based on probability”. The entire city of Las Vegas is based on probability. Left: Western Airlines (1926-1987). Center: Bonanza Airlines (1945-1968). Right: National Airlines (1934-1980). All three were major, successful airlines, but are gone now. A cautionary note to all major, successful companies.


Code below. Very long.

# iris_bayesian_01b.py

# uses Bayesian library from:
# https://github.com/Harry24k/bayesian-
# neural-network-pytorch/blob/master/demos/
# Bayesian%20Neural%20Network%20Classification.ipynb
# pip install torchbnn

import numpy as np
import torch as T
import torchbnn as bnn

device = T.device("cpu")

# -----------------------------------------------------------

class IrisDataset(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # like 5.0, 3.5, 1.3, 0.3, 0
    tmp_x = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0,4), delimiter=",", skiprows=0,
      dtype=np.float32)
    tmp_y = np.loadtxt(src_file, max_rows=num_rows,
      usecols=4, delimiter=",", skiprows=0,
      dtype=np.int64)

    self.x_data = T.tensor(tmp_x, dtype=T.float32)
    self.y_data = T.tensor(tmp_y, dtype=T.int64)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    if T.is_tensor(idx):
      idx = idx.tolist()
    preds = self.x_data[idx]
    spcs = self.y_data[idx] 
    sample = { 'predictors' : preds, 'species' : spcs }
    return sample

# -----------------------------------------------------------

class BayesianNet(T.nn.Module):
  def __init__(self):            # 4-100-3
    super(BayesianNet, self).__init__()
    self.hid1 = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=4, out_features=100)
    self.oupt = bnn.BayesLinear(prior_mu=0, prior_sigma=0.1,
      in_features=100, out_features=3)

  def forward(self, x):
    z = T.relu(self.hid1(x))
    z = self.oupt(z)  # no softmax: CrossEntropyLoss() 
    return z

# -----------------------------------------------------------

def accuracy(model, dataset):
  # assumes model.eval()
  dataldr = T.utils.data.DataLoader(dataset, batch_size=1,
    shuffle=False)
  n_correct = 0; n_wrong = 0
  for (_, batch) in enumerate(dataldr):
    X = batch['predictors'] 
    Y = batch['species']  # already flattened by Dataset
    with T.no_grad():
      oupt = model(X)  # logits form

    big_idx = T.argmax(oupt)
    if big_idx == Y:
      n_correct += 1
    else:
      n_wrong += 1

  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  n = len(dataset)
  X = dataset[0:n]['predictors']  # all X 
  Y = T.flatten(dataset[0:n]['species'])  # 1-D

  with T.no_grad():
    oupt = model(X)
  arg_maxs = T.argmax(oupt, dim=1)  # collapse cols
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def main():
  print("\nBegin Bayesian neural network Iris demo ")
  # 0. prepare
  np.random.seed(1)
  T.manual_seed(1)
  np.set_printoptions(precision=4, suppress=True, sign=" ")
  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

  # 1. load training data
  print("\nCreating Iris train Dataset and DataLoader ")
  train_file = ".\\Data\\iris_train.txt"
  train_ds = IrisDataset(train_file, num_rows=120)

  bat_size = 4
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  net = BayesianNet().to(device)

  # 3. train model (could put this into a train() function)
  max_epochs = 100
  ep_log_interval = 10

  ce_loss = T.nn.CrossEntropyLoss()   # applies softmax()
  kl_loss = bnn.BKLLoss(reduction='mean', last_layer_only=False)
  optimizer = T.optim.Adam(net.parameters(), lr=0.01)

  print("\nbat_size = %3d " % bat_size)
  print("loss = highly customized ")
  print("optimizer = Adam 0.01")
  print("max_epochs = %3d " % max_epochs)

  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # accumulated over one full epoch

    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch['predictors']  # [bat_size, 4]
      Y = batch['species']  # already flattened
      optimizer.zero_grad()
      oupt = net(X)

      cel = ce_loss(oupt, Y)
      kll = kl_loss(net)
      tot_loss = cel + (0.10 * kll)

      epoch_loss += tot_loss.item()  # accumulate
      tot_loss.backward()  # update wt distribs
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %4d   loss = %0.4f" % (epoch, epoch_loss))
  print("Training done ")

  # 4. evaluate model accuracy
  print("\nComputing Bayesian network model accuracy")
  net.eval()
  acc = accuracy_quick(net, train_ds)  # all-at-once version
  print("Accuracy on train data = %0.4f" % acc)

  # 5. make a prediction
  print("\nPredicting species for [5.0, 2.0, 3.0, 2.0]: ")
  x = np.array([[5.0, 2.0, 3.0, 2.0]], dtype=np.float32)
  x = T.tensor(x, dtype=T.float32).to(device) 

  for i in range(3):
    with T.no_grad():
      logits = net(x).to(device)  # values do not sum to 1.0
    probs = T.softmax(logits, dim=1).to(device)
    print(probs.numpy())

  print("\nEnd Bayesian network demo ")

if __name__ == "__main__":
  main()
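
The demo expects a 120-line comma-delimited training file where each line holds four predictor values followed by a 0, 1, or 2 class label. If you don't have the file, here is a sketch that builds one using scikit-learn (the exact file layout is my reading of the IrisDataset loader above; move the output file to the .\Data\ directory the demo reads from):

# make_iris_data.py -- helper sketch to create iris_train.txt
# format: sepal_len,sepal_wid,petal_len,petal_wid,class(0/1/2)
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
data = np.column_stack([iris.data, iris.target]).astype(np.float32)
rng = np.random.default_rng(1)
rng.shuffle(data)  # mix the three classes before taking a subset

with open("iris_train.txt", "w") as f:
  for row in data[:120]:  # 120 rows for training
    f.write("%0.1f,%0.1f,%0.1f,%0.1f,%d\n" %
      (row[0], row[1], row[2], row[3], int(row[4])))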