Data Clustering Using Naive Bayes Inference

I wrote an article titled “Data Clustering Using Naive Bayes Inference” in the March 2013 issue of MSDN Magazine. Naive Bayes inference is used in several ways in machine learning. (In my mind, ML is a very general term that means exploring data to find patterns and make predictions.)

Naive Bayes is most often used for classification and prediction of categorical data; for an example, see my February 2013 article. But as my March article describes, naive Bayes can also be used for clustering categorical data. Clustering is an unsupervised ML technique (meaning no so-called training data is necessary). The image below should make the ideas of clustering using naive Bayes inference clear.
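
To make the idea concrete in code, here is a minimal Python sketch of one greedy way to cluster categorical tuples with naive Bayes scoring. It is an illustration under stated assumptions and not necessarily the algorithm from the MSDN article: the function name `nb_cluster`, the seeding of each cluster by row index, the add-one (Laplace) smoothing, and the toy data are all hypothetical choices. Each unassigned row is placed in the cluster with the highest score P(cluster) × Π P(value | cluster), where the probabilities are estimated from the rows assigned so far.

```python
from collections import Counter

def nb_cluster(rows, seeds):
    """Greedily cluster categorical rows using naive Bayes scores.

    rows  : list of equal-length tuples of categorical values
    seeds : row indices used to start the clusters (one index per cluster)
    """
    k = len(seeds)
    n_attrs = len(rows[0])
    # Number of distinct values per attribute, needed for Laplace smoothing.
    n_values = [len({r[j] for r in rows}) for j in range(n_attrs)]

    assignment = [None] * len(rows)
    sizes = [0] * k
    # counts[c][j] maps a value of attribute j to its frequency in cluster c.
    counts = [[Counter() for _ in range(n_attrs)] for _ in range(k)]

    def add(i, c):
        assignment[i] = c
        sizes[c] += 1
        for j, v in enumerate(rows[i]):
            counts[c][j][v] += 1

    # Assumption: the caller supplies one starting row per cluster.
    for c, i in enumerate(seeds):
        add(i, c)

    for i, row in enumerate(rows):
        if assignment[i] is not None:
            continue
        assigned = sum(sizes)
        scores = []
        for c in range(k):
            # Smoothed prior P(cluster).
            score = (sizes[c] + 1) / (assigned + k)
            for j, v in enumerate(row):
                # Smoothed P(value | cluster); "naive" means the attributes
                # are treated as independent given the cluster.
                score *= (counts[c][j][v] + 1) / (sizes[c] + n_values[j])
            scores.append(score)
        add(i, scores.index(max(scores)))
    return assignment

# Toy data (made up for illustration): location, height, income.
rows = [
    ("urban", "tall", "high"),
    ("rural", "short", "low"),
    ("urban", "tall", "low"),
    ("rural", "short", "high"),
    ("urban", "medium", "high"),
    ("rural", "medium", "low"),
]
print(nb_cluster(rows, seeds=[0, 1]))  # → [0, 1, 0, 1, 0, 1]
```

A single greedy pass like this depends on the seed rows and on the order in which rows are visited, so in practice you would likely try several seedings and keep the clustering that scores best on some quality measure.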

In the article, I gloss over the process where you start with numeric data (such as people’s heights in inches) and convert that data into categorical data (for example, ‘short’, ‘medium’, ‘tall’). This binning process (usually called discretization of continuous data) is surprisingly tricky and interesting, and I’ll address it in a future article.
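
As a rough illustration of the simplest form of binning, here is a Python sketch of equal-width discretization. The function name `equal_width_bins` and the sample heights are hypothetical, and equal-width intervals are just one of several possible strategies, not necessarily the approach a fuller treatment would recommend.

```python
def equal_width_bins(values, labels):
    """Map each numeric value to a label using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [labels[0] for _ in values]
    result = []
    for v in values:
        # Index of the interval containing v; the maximum value would land
        # one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), len(labels) - 1)
        result.append(labels[idx])
    return result

# Heights in inches (made-up sample data).
heights = [60.0, 63.0, 66.0, 69.0, 72.0, 75.0, 78.0]
print(equal_width_bins(heights, ["short", "medium", "tall"]))
# → ['short', 'short', 'medium', 'medium', 'tall', 'tall', 'tall']
```

Part of what makes binning tricky shows up even here: a single outlier (say, one height of 96 inches) stretches the intervals and pushes most people into the lowest bins, which is one reason equal-frequency or domain-specific boundaries are often preferred.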

