Researchers Explore Bayesian Neural Networks on Pure AI

I contributed to an article titled “Researchers Explore Bayesian Neural Networks” on the Pure AI web site. See https://pureai.com/articles/2021/09/07/bayesian-neural-networks.aspx.

The agenda of the recently completed 2021 International Conference on Machine Learning (ICML) listed over 30 presentations related to the topic of Bayesian neural networks. The article explains what Bayesian neural networks are and why there is such great interest in them.

The term “Bayesian” loosely means “based on probability”. A Bayesian neural network (BNN) has weights and biases that are probability distributions instead of single fixed values. Each time a Bayesian neural network computes output, the values of the weights and biases will change slightly, and so the computed output will be slightly different every time. To make a prediction using a BNN, one approach is to feed the input to the BNN several times and average the results.
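To make that idea concrete, here is a minimal sketch (my own illustration, not the code from the article) of a single Bayesian linear layer where each weight and bias is a Normal(mean, std) distribution, and a prediction is made by averaging several stochastic forward passes:

# minimal NumPy sketch of a Bayesian linear layer; all names and parameter
# values here are illustrative, not taken from the article
import numpy as np

rng = np.random.default_rng(0)

class BayesianLinear:
    def __init__(self, n_in, n_out):
        # each weight and bias is described by a mean and a standard deviation
        self.w_mu = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.w_sigma = np.full((n_in, n_out), 0.05)
        self.b_mu = np.zeros(n_out)
        self.b_sigma = np.full(n_out, 0.05)

    def forward(self, x):
        # sample concrete weight and bias values on every call, so repeated
        # calls with the same input give slightly different outputs
        w = rng.normal(self.w_mu, self.w_sigma)
        b = rng.normal(self.b_mu, self.b_sigma)
        return x @ w + b

layer = BayesianLinear(4, 3)
x = np.array([5.0, 2.0, 3.0, 2.0])

# one common way to predict: feed the input several times and average
outputs = np.stack([layer.forward(x) for _ in range(10)])
print("averaged output:", outputs.mean(axis=0))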

At first glance, Bayesian neural networks don’t seem to make much sense. However, BNNs have two advantages over standard neural networks. First, the built-in variability in BNNs makes them resistant to model overfitting. Model overfitting occurs when a neural network is trained too well: even though the trained model predicts with high accuracy on the training data, when presented with new, previously unseen data, the overfitted model predicts poorly. A second advantage of Bayesian neural networks over standard neural networks is that you can identify inputs where the model is uncertain of its prediction. For example, if you feed an input to a Bayesian neural network five times and get five very different prediction results, you can treat the prediction as an “I’m not sure” result.
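Here is a minimal sketch of that second idea: flag an input as uncertain when repeated passes disagree too much. The noisy_predict function is a stand-in for one BNN forward pass, not the article’s actual model, and the threshold value is arbitrary.

# hypothetical example: detecting "I'm not sure" inputs by the spread of
# repeated stochastic predictions
import numpy as np

rng = np.random.default_rng(1)

def noisy_predict(x):
    # stand-in for one BNN forward pass: returns a slightly different
    # class-probability vector on every call
    logits = np.array([0.2, 1.5, 0.4]) + rng.normal(0.0, 0.8, size=3)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def is_uncertain(predict, x, n_passes=5, threshold=0.15):
    probs = np.stack([predict(x) for _ in range(n_passes)])
    # a large spread across the passes means the model is unsure of this input
    return probs.std(axis=0).max() > threshold

x = np.array([5.0, 2.0, 3.0, 2.0])
print("uncertain?", is_uncertain(noisy_predict, x))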

The screenshot shows an example of a Bayesian neural network in action on the well-known Iris dataset. The goal is to predict the species of an iris flower (0 = setosa, 1 = versicolor, 2 = virginica) from its sepal length and width and its petal length and width. A sepal is a leaf-like structure. After the Bayesian neural network was trained, it was fed the input [5.0, 2.0, 3.0, 2.0] three times. The first output was [0.0073, 0.8768, 0.1159]. These are the probabilities of each of the three classes. Because the largest probability value is 0.8768, at index [1], the prediction is class 1 = versicolor.
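For completeness, here is how the three outputs might be combined into a single prediction. The first probability vector is the one quoted above; the other two are made-up placeholder values for illustration only, not the outputs from the screenshot.

import numpy as np

species = ["setosa", "versicolor", "virginica"]

p1 = np.array([0.0073, 0.8768, 0.1159])  # first pass, quoted above
p2 = np.array([0.0100, 0.8600, 0.1300])  # placeholder values, not from the article
p3 = np.array([0.0065, 0.8850, 0.1085])  # placeholder values, not from the article

avg = np.mean([p1, p2, p3], axis=0)      # average the three probability vectors
print("prediction:", species[int(np.argmax(avg))])  # versicolor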

Even though I didn’t say so in the article, I’m mildly skeptical about Bayesian neural networks. The idea has the feel of a solution in search of a problem, something that’s very common in research. But this isn’t completely bad. Research needs to work in two ways: 1.) start with a problem and then find a way to solve it, and 2.) start with an idea and then find a problem that can be solved with it.



Gambling is Bayesian. I always enjoy gambling scenes in science fiction. Left: A scene from the “Star Trek: The Next Generation” TV show (1987-1994). Center: A scene in the casino town of Canto Bight from “Star Wars: The Last Jedi” (2017). Right: Actor Justin Timberlake plays poker for his life in “In Time” (2011).


1 Response to Researchers Explore Bayesian Neural Networks on Pure AI

  1. Thorsten Kleppe says:

    Bayesian neural networks sound very interesting. Will you show us the way back?

    This all reminds me of an article I read recently, “Improving the Performance of a Neural Network”. I’m not sure if it is directly related to the topic, but the idea was as follows:

    Instead of better-trained networks, whose errors overlap so the ensemble gains nothing:
    Ground Truth: 1111111111
    Classifier 1: 1111111100 = 80% accuracy
    Classifier 2: 1111111100 = 80% accuracy
    Classifier 3: 1011111100 = 70% accuracy
    Ensemble Result: 1111111100 = 80% accuracy

    A series of networks that make more distributed (diverse) predictions reaches better accuracy:
    Ground Truth: 1111111111
    Classifier 1: 1111111100 = 80% accuracy
    Classifier 2: 0111011101 = 70% accuracy
    Classifier 3: 1000101111 = 60% accuracy
    Ensemble Result: 1111111101 = 90% accuracy

    Since BNNs work with probabilities, could we perhaps develop a much better model this way? Crazy stuff. On the other hand, I understand why you are skeptical; my guess is that a standard neural network with some added noise would produce a similar effect. Anyway, many thanks for one more topic with brilliant content.
