I contributed to an article titled “Researchers Explore Bayesian Neural Networks” on the Pure AI web site. See https://pureai.com/articles/2021/09/07/bayesian-neural-networks.aspx.
The agenda of the recently completed 2021 International Conference on Machine Learning (ICML) listed over 30 presentations related to the topic of Bayesian neural networks. The article explains what Bayesian neural networks are and why there is such great interest in them.
The term “Bayesian” loosely means “based on probability”. A Bayesian neural network (BNN) has weights and biases that are probability distributions instead of single fixed values. Each time a Bayesian neural network computes output, the values of the weights and biases will change slightly, and so the computed output will be slightly different every time. To make a prediction using a BNN, one approach is to feed the input to the BNN several times and average the results.
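The idea can be sketched in a few lines of code. The toy layer below is my own illustration, not code from the article: every weight and bias is a (mean, standard deviation) pair, a fresh set of values is sampled on each forward pass, and a prediction is the average over several passes. All the numeric values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bayesian linear layer: each weight/bias is a Gaussian distribution
# (mean, standard deviation) rather than a single fixed value.
# These means and sigmas are arbitrary illustrative numbers.
w_mu = np.array([[0.5, -0.3], [0.2, 0.8]])    # weight means (2 inputs, 2 outputs)
w_sigma = np.array([[0.1, 0.1], [0.1, 0.1]])  # weight standard deviations
b_mu = np.array([0.1, -0.1])                  # bias means
b_sigma = np.array([0.05, 0.05])              # bias standard deviations

def forward(x):
    # Sample a fresh set of weights and biases on every call,
    # so the computed output differs slightly each time.
    w = rng.normal(w_mu, w_sigma)
    b = rng.normal(b_mu, b_sigma)
    return x @ w + b

x = np.array([1.0, 2.0])
# One common prediction scheme: feed the input several times and average.
samples = np.array([forward(x) for _ in range(10)])
prediction = samples.mean(axis=0)
```

Two successive calls to forward() give slightly different outputs, which is exactly the behavior described above.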
At first thought, Bayesian neural networks don’t seem to make much sense. However, BNNs have two advantages over standard neural networks. First, the built-in variability in BNNs makes them resistant to model overfitting. Model overfitting occurs when a neural network is trained too well. Even though the trained model predicts with high accuracy on the training data, when presented with new previously unseen data, the overfitted model predicts poorly. A second advantage of Bayesian neural networks over standard neural networks is that you can identify inputs where the model is uncertain of its prediction. For example, if you feed an input to a Bayesian neural network five times and you get five very different prediction results, you can treat the prediction as an “I’m not sure” result.
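The second advantage can be made concrete with a small sketch. This is my own hypothetical illustration: noisy_predict() stands in for one stochastic forward pass of a trained BNN, and the caller flags a prediction as "I'm not sure" when repeated passes disagree too much. The logits, noise level, and threshold are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_predict(x, noise):
    # Stand-in for one stochastic forward pass of a trained BNN;
    # `noise` mimics how much the sampled weights perturb the output.
    logits = np.array([0.2, 2.0, 0.5]) + rng.normal(0.0, noise, 3)
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax probabilities for 3 classes

def predict_with_uncertainty(x, n_runs=5, noise=0.1, threshold=0.2):
    # Run the input several times and measure how much the
    # predicted probabilities disagree across runs.
    probs = np.array([noisy_predict(x, noise) for _ in range(n_runs)])
    spread = probs.std(axis=0).max()  # largest per-class disagreement
    if spread > threshold:
        return None                   # "I'm not sure"
    return int(probs.mean(axis=0).argmax())
```

With no noise the five runs agree exactly and the function returns the winning class; with large noise the runs scatter and the function declines to predict.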
The screenshot shows an example of a Bayesian neural network in action on the well-known Iris Dataset. The goal is to predict the species (0 = setosa, 1 = versicolor, 2 = virginica) of an iris flower based on sepal length and width, and petal length and width. A sepal is a leaf-like structure. After the Bayesian neural network was trained, it was fed an input of [5.0, 2.0, 3.0, 2.0] three times. The first output was [0.0073, 0.8768, 0.1159]. These are probabilities of each class. Because the largest probability value is 0.8768 at index [1], the prediction is class 1 = versicolor.
Even though I didn’t say so in the article, I’m mildly skeptical about Bayesian neural networks. The idea has the feel of a solution in search of a problem — something that’s very common in research. But this isn’t completely bad. Research needs to work in two ways: 1.) start with a problem and then find a way to solve it, and 2.) start with an idea and then find a problem that can be solved with it.
Gambling is Bayesian. I always enjoy gambling scenes in science fiction. Left: A scene from the “Star Trek: The Next Generation” TV show (1987-1994). Center: A scene in the casino town of Canto Bight from “Star Wars: The Last Jedi” (2017). Right: Actor Justin Timberlake plays poker for his life in “In Time” (2011).