“Multinomial Naive Bayes Classification Using the scikit Library” in Visual Studio Magazine

I wrote an article titled “Multinomial Naive Bayes Classification Using the scikit Library” in the April 2023 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2023/04/17/multinomial-naive-bayes.aspx.

Naive Bayes classification is a classical machine learning technique to predict a discrete value. There are several variations of naive Bayes (NB) including Categorical NB, Bernoulli NB, Gaussian NB and Multinomial NB. The different NB variations are used for different types of predictor data.

My article explains multinomial naive Bayes where the predictor variables represent counts. Specifically, the data looks like:

5,7,12,6,4,math
1,6,10,3,0,math
0,9,12,2,1,math
8,8,10,3,2,psychology
7,14,8,0,0,psychology
5,12,9,1,3,psychology
2,16,7,0,2,psychology
3,11,5,4,4,history
5,9,7,4,2,history
8,6,8,0,1,history

Each line represents a college course. The first five values are the counts of A, B, C, D and F grades received by students in the course. The sixth value on each line is the course type: math, psychology or history. The goal is to predict course type from the grade counts. For example, if an unknown course has grade counts of (7,8,7,3,1), what type of course is it?
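Although the article walks through the underlying math, the scikit library does the mechanics for you. Here is a minimal sketch (not the article's exact demo program) that fits a scikit MultinomialNB model on the 10-item dataset above and classifies the unknown (7,8,7,3,1) course. The string class labels and the default smoothing value are my choices for illustration.

# A minimal sketch (not the article's exact demo): fit a scikit MultinomialNB
# model on the 10-item grade-count data and classify the (7,8,7,3,1) course.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# predictor counts: number of A, B, C, D, F grades per course
X = np.array([
    [5, 7, 12, 6, 4], [1, 6, 10, 3, 0], [0, 9, 12, 2, 1],   # math
    [8, 8, 10, 3, 2], [7, 14, 8, 0, 0], [5, 12, 9, 1, 3],
    [2, 16, 7, 0, 2],                                        # psychology
    [3, 11, 5, 4, 4], [5, 9, 7, 4, 2], [8, 6, 8, 0, 1]])     # history

# course types as strings; scikit accepts string class labels directly
y = np.array(["math"] * 3 + ["psychology"] * 4 + ["history"] * 3)

model = MultinomialNB(alpha=1.0)  # alpha=1.0 is the default smoothing
model.fit(X, y)

unknown = np.array([[7, 8, 7, 3, 1]])
print(model.predict(unknown))        # predicted course type
print(model.predict_proba(unknown))  # pseudo-probabilities, one per class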

For those grade counts of (7, 8, 7, 3, 1), multinomial NB examines each grade count using non-trivial Bayesian probability. It turns out that the 7 A grades in the unknown course suggest the course is history. The 8 B grades suggest math. The 7 C grades suggest history. The 3 D grades suggest history. The 1 F grade suggests psychology. Taken together, the counts suggest the unknown course is history.
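If you want to see where those per-grade suggestions come from, a fitted scikit model exposes the per-class log probabilities as feature_log_prob_. The following rough sketch (it assumes the model and unknown item from the snippet above, and it ignores the class priors) multiplies each observed count by the corresponding log probability to show which class each grade count favors.

# Rough sketch of the per-grade evidence for the (7,8,7,3,1) item. It assumes
# 'model' and 'unknown' from the snippet above. feature_log_prob_ holds
# log P(grade-slot | class); multiplying by the observed counts gives each
# grade's contribution to each class's score (class priors are ignored here).
import numpy as np

contrib = unknown[0] * model.feature_log_prob_   # shape (num classes, 5 grades)
for cls, row in zip(model.classes_, contrib):
    print(cls, np.round(row, 2))

# for each grade column, the class with the largest contribution is the class
# that grade count points toward
print(model.classes_[np.argmax(contrib, axis=0)])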

In my article I point out that if you search the internet for an example of Multinomial naive Bayes classification, you’ll find dozens of copies of essentially the same example, where naive Bayes is used for document classification. For example, the possible document types might be business = 0, crime = 1, finance = 2, politics = 3, science = 4, sports = 5.

The predictors are the counts of each word in the English language ("a," "an," "at," "and" and so on) in each document. Such a dataset is annoyingly huge, which obscures what is going on with Multinomial naive Bayes and might give you the impression that Multinomial NB can only be used for document classification, rather than for the many prediction problems where the predictor values are counts.
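For contrast, here is roughly what those ubiquitous document-classification examples boil down to: CountVectorizer turns documents into word-count vectors, and those counts feed the same MultinomialNB class as the grade-count example above. The tiny corpus and labels below are made up purely for illustration.

# Sketch of the ubiquitous document-classification setup, for contrast: the
# predictors are word counts produced by CountVectorizer, fed to the same
# MultinomialNB class. The tiny corpus and labels are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["stocks fell sharply today",
        "the team won the final game",
        "the senate passed the new bill"]
labels = ["finance", "sports", "politics"]

vec = CountVectorizer()
word_counts = vec.fit_transform(docs)            # sparse matrix of word counts
clf = MultinomialNB().fit(word_counts, labels)

test = vec.transform(["stocks rose after the bill passed"])
print(clf.predict(test))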



Many years ago, high schools and colleges published annual yearbooks where every student could display a message under their photo. Here are three clever high school yearbook photos and messages.

