How to Do Naive Bayes with Numeric Data Using C#

I wrote an article titled, “How to Do Naive Bayes with Numeric Data Using C#” in the November 2019 edition of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2019/11/12/naive-bayes-csharp.aspx.

The Naive Bayes technique can be used for binary classification (for example, predicting if a person is male or female based on predictors such as age, height, weight, and so on) or for multiclass classification (for example, predicting if a person is politically conservative, moderate, or liberal based on predictors such as annual income, sex, and so on). Naive Bayes classification can be used with numeric predictor values, such as a height of 5.75 feet, or with categorical predictor values, such as a color of “red”.

In the article I explain how to create a naive Bayes classification system when the predictor values are numeric, using the C# language without any special code libraries. In particular, the goal of the demo program was to predict the gender of a person (male = 0, female = 1) based on their height, weight, and foot length. After creating a prediction model, the demo set up a new data item to classify, with predictor values of height = 5.60, weight = 150, foot = 8.

The probability that the unknown person is male was 0.62 and the probability of female was 0.38, so the conclusion was that the unknown person is most likely male.

The naive Bayes classification technique has “naive” in its name because it assumes that the predictor variables are all mathematically independent of each other. Naive Bayes classification with numeric data makes the additional assumption that each predictor variable is Gaussian distributed. This assumption is sometimes not true. For example, the ages of people in a particular profession could be significantly skewed or even bimodal. In spite of these assumptions, naive Bayes classification often works quite well.
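The demo code isn't reproduced here, but the key ingredient of numeric naive Bayes is the Gaussian probability density function evaluated for each predictor value. Here is a minimal C# sketch; the function name and the example mean and variance are my own illustration, not values from the article's demo program.

using System;

class GaussianPdfDemo
{
  // Gaussian probability density function: the per-predictor
  // likelihood used by naive Bayes with numeric data
  static double GaussianPdf(double x, double mean, double variance)
  {
    double coef = 1.0 / Math.Sqrt(2.0 * Math.PI * variance);
    return coef * Math.Exp(-((x - mean) * (x - mean)) / (2.0 * variance));
  }

  static void Main()
  {
    // hypothetical class mean and variance for the height predictor
    double p = GaussianPdf(5.60, 5.85, 0.035);
    Console.WriteLine(p.ToString("F4"));
  }
}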



From an Internet search for naive characters in film. Princess Giselle in “Enchanted” (2007), Lorelei in “Gentlemen Prefer Blondes” (1953), Jade in “The Hangover” (2009), Cher in “Clueless” (1995).

Posted in Machine Learning

NFL 2019 Week 11 Predictions – Zoltar Likes Four Underdogs

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #11 of the 2019 NFL season:

Zoltar:    steelers  by    0  dog =      browns    Vegas:      browns  by  2.5
Zoltar:    panthers  by    6  dog =     falcons    Vegas:    panthers  by    6
Zoltar:       colts  by    6  dog =     jaguars    Vegas:       colts  by    3
Zoltar:     cowboys  by    0  dog =       lions    Vegas:     cowboys  by    4
Zoltar:       bills  by    0  dog =    dolphins    Vegas:       bills  by  5.5
Zoltar:     vikings  by    9  dog =     broncos    Vegas:     vikings  by   10
Zoltar:      ravens  by    4  dog =      texans    Vegas:      ravens  by    4
Zoltar:      saints  by    7  dog =  buccaneers    Vegas:      saints  by    5
Zoltar:    redskins  by    5  dog =        jets    Vegas:    redskins  by  1.5
Zoltar: fortyniners  by   10  dog =   cardinals    Vegas: fortyniners  by 13.5
Zoltar:    patriots  by    0  dog =      eagles    Vegas:    patriots  by  3.5
Zoltar:     raiders  by    9  dog =     bengals    Vegas:     raiders  by 10.5
Zoltar:        rams  by    6  dog =       bears    Vegas:        rams  by  6.5
Zoltar:      chiefs  by    3  dog =    chargers    Vegas:      chiefs  by  3.5

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #11 Zoltar has five hypothetical suggestions.

1. Zoltar likes the Vegas underdog Lions against the Cowboys. Zoltar thinks the two teams are evenly matched but Vegas has the Cowboys favored by 4.0 points. A bet on the Lions will pay off if the Lions win by any score or if the Cowboys win but by less than 4 points (i.e., 3 points or less). If the Cowboys win by exactly 4 points the bet is a push.

2. Zoltar likes the Vegas underdog Dolphins against the Bills. Zoltar thinks the two teams are evenly matched but Vegas has the Bills favored by 5.5 points.

3. Zoltar likes the Vegas underdog Cardinals against the 49ers. Zoltar thinks the 49ers are a big 10 points better than the Cardinals but Vegas thinks the 49ers are a very big 13.5 points better.

4. Zoltar likes the Vegas underdog Eagles against the Patriots. Zoltar thinks the two teams are evenly matched but Vegas has the Patriots favored by 3.5 points.

5. Zoltar likes the Vegas favorite Redskins against the Jets. Zoltar thinks the Redskins are 5 points better than the Jets but Vegas thinks the Redskins are only 1.5 points better. A bet on the Redskins will pay off only if the Redskins win by more than 1.5 points.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Zoltar did OK in week #10. Against the Vegas point spread, Zoltar was a reasonable 2-1. Zoltar correctly liked Vegas underdogs Steelers and Seahawks, both of whom won outright. Zoltar missed by predicting the Saints would cover the spread but the Saints lost badly to the Falcons.

For the 2019 season, through week #10, Zoltar is 33-20 (62% accuracy) against the Vegas spread.

Just for fun, I track how well Zoltar and Las Vegas do when just trying to predict which team will win (but not by how much). This isn’t useful except for parlay betting.

Just predicting winners, Zoltar was a weak 8-5, but Vegas was even worse at 5-8.

For the season Zoltar is a pretty decent 97-50 (66%) just picking winners and Vegas is at 92-52 (64%).

Note: Vegas has had three pick’em games so far and there has been one tie game.



My system is named after the Zoltar fortune telling machine you can find in arcades. Here are Zoltar and three nice art nouveau style paintings of gypsy fortune tellers.

Posted in Zoltar

Traversing a Tree Data Structure Implemented as a List

I don’t use tree data structures very often. But when I do need a tree (typically a decision tree classifier), I avoid recursion and use a List data structure instead. Recursive data structures are cool and mysterious, but in a production environment simplicity is always better than coolness.

With a List data structure it’s easy to know exactly where any child or parent node is. Suppose you have a tree with seven nodes:

      0
  1       2
3   4   5   6

If each node has an ID i, where root = 0, left child of root = 1, right child of root = 2, and so on, then:

The left child of i is located at index [2i + 1]
The right child of i is located at index [2i + 2]

If i is an odd number (i % 2 != 0), the node is a left child.
If i is even (and i > 0), the node is a right child.

A left child’s parent is at index [(i-1) / 2]
A right child’s parent is at index [(i-2) / 2]

Simple, easy, and efficient.
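Here is what those index computations look like as a small C# sketch, assuming a complete tree with no missing nodes (the helper names are mine):

static class TreeIndex
{
  // index arithmetic for a binary tree stored level by level in a List
  public static int LeftChild(int i)  { return 2 * i + 1; }
  public static int RightChild(int i) { return 2 * i + 2; }
  public static bool IsLeftChild(int i)  { return i > 0 && i % 2 != 0; }
  public static bool IsRightChild(int i) { return i > 0 && i % 2 == 0; }
  // integer division gives the parent for both left and right children
  public static int Parent(int i) { return (i - 1) / 2; }
}

Note that with integer division, (i - 1) / 2 works for both left and right children, so a single Parent helper is enough.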

To traverse a tree implemented as a List, you just walk through the List in order:

for i = 0 to numNodes-1
  display(tree[i])
end-for

This will display the tree level by level: (0, 1, 2, 3, 4, 5, 6). Traversing level-by-level is perfectly fine for most problem scenarios. But suppose you want to traverse/display the tree in what’s called an inorder manner. This is a common ordering because it’s easy to do for a recursive tree:

display(root)
  if root != null
    display(root->left)
    print(root)
    display(root->right)
  end-if
end-display

For the seven-node tree above, the nodes would be printed as 3 1 4 0 5 2 6. To print a tree implemented as a List, you need to use a Stack and do a little work. I hadn’t looked at this problem in a long time so I decided to code up a demo to see if I remembered the algorithm. I did.


Displaying a tree implemented using a List in an inorder manner.
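The demo code isn't shown here, but the general idea of an inorder traversal with an explicit Stack looks roughly like this sketch (the names are mine; it assumes a complete tree stored level by level in a List):

using System;
using System.Collections.Generic;

class InorderDemo
{
  // inorder traversal of a complete binary tree stored in a List,
  // using an explicit Stack of node indices instead of recursion
  static void DisplayInorder(List<int> tree)
  {
    Stack<int> stack = new Stack<int>();
    int i = 0;  // start at the root
    while (stack.Count > 0 || i < tree.Count)
    {
      while (i < tree.Count)   // walk as far left as possible
      {
        stack.Push(i);
        i = 2 * i + 1;         // left child
      }
      i = stack.Pop();
      Console.Write(tree[i] + " ");  // visit the node
      i = 2 * i + 2;           // then go right
    }
    Console.WriteLine();
  }

  static void Main()
  {
    // the seven-node tree 0..6 prints as: 3 1 4 0 5 2 6
    DisplayInorder(new List<int> { 0, 1, 2, 3, 4, 5, 6 });
  }
}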

When I was a college professor I used to enjoy teaching students how to implement a tree data structure using recursion because the technique is fascinating. But I always told my students that knowing how to use recursion is fine, but in a production environment you should avoid recursion if possible — as a rule of thumb, recursive functions are tricky, error-prone, and difficult to maintain or modify.



Left: Some trees in November on the street where I live — naturally beautiful. Center: An old-style paint-by-numbers painting of trees — oddly attractive. Right: A huge alien tree on an alien world (unknown artist) — very creative.

Posted in Machine Learning

Naive Bayes Classification for Numeric Data Using C#

I’ve been thinking about naive Bayes classification recently, in part because it’s going to be one of the topics I explain in a hands-on workshop at the upcoming Azure + AI Conference (see https://azureaiconf.com).

Naive Bayes classification can be used for numeric data, such as predicting the sex of a person who has height = 6.00′, weight = 185 lbs, foot = 9 inches. Naive Bayes can also be used for categorical data, such as predicting the sex of a person who has height = tall, weight = medium, foot = normal. The underlying theory is the same for the numeric data and categorical data scenarios, but the details are quite a bit different.


My demo program uses the data from the Wikipedia page on naive Bayes. There are 8 items. Each item is the height, weight, and foot size of a male or female. The goal is to predict the sex of a person who is 6.00 feet tall, weighs 130 lbs and has foot size 8 inches. The result is P(female) = 0.9999884.

A few days ago I reviewed an example with numeric data on the Wikipedia page on naive Bayes. I verified the Wikipedia calculations by performing the calculations myself, using Excel.

Just for fun I decided to perform the calculations using a C# program. It was an interesting exercise. I didn’t have any major problems because I’m quite familiar with naive Bayes for numeric data. The technique assumes that all the predictor data is Gaussian distributed, and it uses the Gaussian probability density function, which I’m also very familiar with.
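The heart of such a program is just the Gaussian probability density function plus the "naive" multiplication of the per-predictor likelihoods and the class priors. Here is a rough C# sketch of that computation. The per-class means, variances, and priors below are made-up placeholder values, not the statistics from the Wikipedia example, and all the names are my own rather than from my demo.

using System;

class NaiveBayesSketch
{
  static double GaussianPdf(double x, double mean, double variance)
  {
    return Math.Exp(-((x - mean) * (x - mean)) / (2.0 * variance)) /
           Math.Sqrt(2.0 * Math.PI * variance);
  }

  static void Main()
  {
    // hypothetical (mean, variance) pairs for height, weight, foot
    double[][] maleStats   = { new[] { 5.90, 0.04 }, new[] { 176.0, 120.0 }, new[] { 11.3, 0.9 } };
    double[][] femaleStats = { new[] { 5.40, 0.10 }, new[] { 132.0, 560.0 }, new[] {  7.5, 1.7 } };
    double[] x = { 6.00, 130.0, 8.0 };        // item to classify
    double evidMale = 0.5, evidFemale = 0.5;  // start with the class priors

    // the "naive" step: multiply the independent per-predictor likelihoods
    for (int j = 0; j < x.Length; ++j)
    {
      evidMale   *= GaussianPdf(x[j], maleStats[j][0], maleStats[j][1]);
      evidFemale *= GaussianPdf(x[j], femaleStats[j][0], femaleStats[j][1]);
    }

    // normalize so the two values sum to 1.0
    double sum = evidMale + evidFemale;
    Console.WriteLine("P(male)   = " + (evidMale / sum).ToString("F6"));
    Console.WriteLine("P(female) = " + (evidFemale / sum).ToString("F6"));
  }
}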


Here is the same problem, solved using Excel.

While I was reviewing the details of how naive Bayes classification works, I came across a technique called Bayes point machine classification. I spent a couple of hours trying to make sense of the little information I found on the Internet, including the source research paper. As far as I can tell, the Bayes point machine is yet another example of research output that is an overly complex solution in search of a problem.

The fact that almost nobody uses the Bayes point machine classification technique suggests that it has no advantages over much simpler techniques, such as a shallow neural network. I could be wrong, however. The source research paper is poorly written, in the sense that it wasn’t written so that someone could actually implement the technique. So I’ll need to probe a bit deeper before I’m satisfied that Bayes point machine classification is in fact a dead end.



Robert K. Abbett (1926-2015) was a prolific artist who did the covers of many paperback novels in the 1960s. I like his style of art a lot. I’ve read “Thuvia, Maid of Mars”, by Edgar Rice Burroughs — an excellent novel. I haven’t read the other two books, but I suspect the cover art for them is better than the content. “When she crashed into his house, about all she wore was a guilty look.” Brilliant — modern day Shakespeare.

Posted in Machine Learning

Understanding Shannon Entropy for Creating a Decision Tree Classifier

Suppose you have this data:

 1.0,  2.0,  3.0,  0
 4.0,  5.0,  6.0,  0
 7.0,  8.0,  9.0,  1
10.0, 11.0, 12.0,  1
13.0, 14.0, 15.0,  1
16.0, 17.0, 18.0,  2
19.0, 20.0, 21.0,  3
22.0, 23.0, 24.0,  3

Each row represents a person. There are 4 classes of people indicated by the last value in each row. There are three predictor variables. The data is artificial but you can imagine the four classes represent job type (engineering, sales, management, operations) and the three predictor variables are sick-days, personal-days-off, and vacation-days. The goal is to predict job type from the predictor values.

There are many machine learning techniques you can use to create a prediction model, including numeric naive Bayes, k-NN, neural network classifiers, etc. One of the most basic techniques is to use a decision tree. The final form of a decision tree will be a set of rules like, “if sick < 15.0 and vacation ≥ 12.0 then job-type = 2”.

Creating a decision tree classifier is not too difficult conceptually, but the implementation details are very tricky. (This is the opposite of neural networks, which are conceptually quite deep but not very difficult to implement.)

One of the key ideas when creating a decision tree is repeatedly splitting the data into two groups so that the items in each group mostly have the same class.

The two most common approaches when splitting data for a decision tree are using Shannon entropy and using Gini impurity. Both are measures of disorder in a set of items. I usually prefer to use Gini impurity but sometimes entropy works slightly better. Suppose you have four classes, 0 to 3, and a set of eight items: (0, 0, 1, 1, 1, 1, 1, 3). The Shannon entropy of a set of items is defined as -1 * Sum[p * log2(p)] where p is the probability of each class. So P(0) = 2/8 = 0.25, P(1) = 5/8 = 0.625, P(2) = 0/8 = 0.00, P(3) = 1/8 = 0.125. The sum of the products of each probability times the log to the base 2 of the probability is:

sum = 0.25  * log2(0.25) +
      0.625 * log2(0.625) +
      0.00  * log2(0.00) +
      0.125 * log2(0.125)

    = 0.25  * -2.0000 +
      0.625 * -0.6781 +
      0.00  *  (na)   +
      0.125 * -3.0000

    = -1.2988

entropy = -1 * sum = 1.2988

You have to avoid trying to compute log2(0) because the log of zero is undefined (it tends to negative infinity). The standard convention is to treat a 0 * log2(0) term as 0.

Lower values of entropy mean the data items in a set are mostly the same. Higher values of entropy indicate more disorder (the data items aren’t the same). In the extreme, the entropy for a set of items that are all identical is 0.00 — for decision trees lower entropy is better. The largest possible value of Shannon entropy is log2 of the number of classes, which occurs when every class is equally represented. For example, if you had 10 items, one from each of 10 classes, the Shannon entropy is log2(10) = 3.3219.

When creating a decision tree you want to split a set of items into two subsets so that the entropy of the class values is low. You could just try different splits, compute the entropies of the items in each of the two partitions, then take the average. This is OK but has the downside that partitions of different sizes are weighted the same. Therefore, it’s usual to weight the two entropy values by the number of items in each partition.

For the dummy data above, suppose you decide to split the eight items into the first three items (0, 0, 1) and the last five items (1, 1, 2, 3, 3). The entropy of the first set is 0.9183. The entropy of the second set is 1.5219. The weighted average is (3/8) * 0.9183 + (5/8) * 1.5219 = 1.2956.
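Here is a short C# sketch that computes the two entropies and the weighted average above; the function names are mine and this isn't code from a particular demo program:

using System;
using System.Collections.Generic;
using System.Linq;

class EntropyDemo
{
  // Shannon entropy of a list of class labels: -Sum(p * log2(p)),
  // skipping classes with zero probability
  static double Entropy(List<int> classes, int numClasses)
  {
    double sum = 0.0;
    for (int c = 0; c < numClasses; ++c)
    {
      double p = classes.Count(v => v == c) / (double)classes.Count;
      if (p > 0.0) sum += p * Math.Log(p, 2.0);
    }
    return -1.0 * sum;
  }

  static void Main()
  {
    var left  = new List<int> { 0, 0, 1 };        // first three items
    var right = new List<int> { 1, 1, 2, 3, 3 };  // last five items
    double eLeft = Entropy(left, 4);    // 0.9183
    double eRight = Entropy(right, 4);  // 1.5219
    int n = left.Count + right.Count;
    double weighted = (left.Count / (double)n) * eLeft +
                      (right.Count / (double)n) * eRight;  // 1.2956
    Console.WriteLine(weighted.ToString("F4"));
  }
}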

Understanding entropy and disorder is the first step in gaining the knowledge you need to implement a decision tree classifier. Next, you need to understand how to search through your data to find a good split. You can’t try all possible splits because of the combinatorial explosion problem, so you have to use a different approach. I’ll explain in a future post.



I wonder if people tend to favor one hand over the other when making the OK sign. I always use my left hand for an OK.

Posted in Machine Learning

Another Look at Braess’s Paradox

Several years ago I wrote an article in Microsoft MSDN Magazine about paradoxes related to software testing. Alas, that article disappeared when MSDN Magazine hosed up the storage of archived articles, and I didn’t bother to keep a copy of my article.

One of the topics in my old article was Braess’s Paradox. Briefly, if you have a road network, adding a new road can actually make travel times worse. The same principle applies to computer networks.

There are several common examples used to illustrate Braess’s Paradox. The image below is one:

Cars must travel from A to D. In this example, the travel time on road A-B and on road C-D is 10 times the number of cars using the road, and the travel time on road B-D and on road A-C is the number of cars using the road plus 50 minutes. Suppose there are N = 6 cars. Before the new road addition, 3 cars will take the A-B-D route and 3 cars will take the A-C-D route. The time for a car on the upper A-B-D route will be (10 * 3) + (3 + 50) = 83 minutes. The time for a car on the lower A-C-D route will be (3 + 50) + (10 * 3) = 83 minutes.

Notice that no car will switch routes because it will take longer. For example, suppose one of the cars that takes the upper route decides to take the lower route instead. His travel time will be (4 + 50) + (10 * 4) = 94 minutes. When a system like this is stable, it’s said to be in Nash equilibrium.

Now suppose a new road between B and C is added, with travel time equal to the number of cars using it plus 10 minutes. Weirdly, equilibrium will be reached when 2 cars use route A-B-D, 2 cars use A-B-C-D, and 2 cars use A-C-D.

The travel time for A-B-D is (10 * 4) + (2 + 50) = 92 minutes.

The travel time for A-B-C-D is (10 * 4) + (2 + 10) + (10 * 4) = 92 minutes.

The travel time for A-C-D is (2 + 50) + (10 * 4) = 92 minutes.

And, although it’s not obvious, if any driver changes routes, he will take longer than 92 minutes. So, the effect of adding a new road is to increase the travel time for every car from 83 minutes to 92 minutes. Note that the situation could be avoided if all the drivers cooperated.
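To double-check the arithmetic, here is a tiny C# sketch that computes the three route times after the new road is added, using the link travel times implied by the example (10 * n for A-B and C-D, n + 50 for B-D and A-C, and n + 10 for the new B-C road):

using System;

class BraessDemo
{
  static void Main()
  {
    // cars per road after the B-C road is added:
    // 2 on A-B-D, 2 on A-B-C-D, 2 on A-C-D
    int nAB = 4, nBD = 2, nAC = 2, nCD = 4, nBC = 2;

    int abd  = (10 * nAB) + (nBD + 50);               // 40 + 52 = 92
    int abcd = (10 * nAB) + (nBC + 10) + (10 * nCD);  // 40 + 12 + 40 = 92
    int acd  = (nAC + 50) + (10 * nCD);               // 52 + 40 = 92

    Console.WriteLine($"A-B-D: {abd}  A-B-C-D: {abcd}  A-C-D: {acd}");
  }
}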

Very cool. Very strange!



Three colorful sea slugs. Kind of cool. Very strange. But kind of creepy and scary. Left: Nembrotha kubaryana (“neon slug”). Center: Flabellinopsis iodinea (“Spanish shawl”). Right: Hypselodoris apolegma (“purple sea slug”).

Posted in Miscellaneous

NFL 2019 Week 10 Predictions – Zoltar Loves the Saints

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #10 of the 2019 NFL season:

Zoltar:    chargers  by    0  dog =     raiders    Vegas:    chargers  by    1
Zoltar:       bears  by    6  dog =       lions    Vegas:       bears  by    3
Zoltar:      ravens  by   10  dog =     bengals    Vegas:      ravens  by   10
Zoltar:       bills  by    0  dog =      browns    Vegas:      browns  by    3
Zoltar:     packers  by    6  dog =    panthers    Vegas:     packers  by    5
Zoltar:      saints  by   18  dog =     falcons    Vegas:      saints  by 12.5
Zoltar:        jets  by    1  dog =      giants    Vegas:      giants  by    2
Zoltar:      chiefs  by    1  dog =      titans    Vegas:      chiefs  by    4
Zoltar:  buccaneers  by    4  dog =   cardinals    Vegas:  buccaneers  by  4.5
Zoltar:       colts  by   10  dog =    dolphins    Vegas:       colts  by 10.5
Zoltar:        rams  by    0  dog =    steelers    Vegas:        rams  by    4
Zoltar:     cowboys  by    4  dog =     vikings    Vegas:     cowboys  by    3
Zoltar:    seahawks  by    0  dog = fortyniners    Vegas: fortyniners  by    6 

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #10 Zoltar has three hypothetical suggestions.

1. Zoltar likes the Vegas favorite Saints against the Falcons. Zoltar thinks the Saints are a massive 18 points better than the Falcons but Vegas thinks the Saints are only 12.5 points better. A bet on the Saints will pay off only if the Saints win by more than 12.5 points, in other words 13 points or more.

2. Zoltar likes the Vegas underdog Steelers against the Rams. Zoltar thinks the two teams are evenly matched (taking home field advantage into account) but Vegas believes the Rams are 4.0 points better than the Steelers. A bet on the Steelers will pay off if the Steelers win by any score or if the Rams win but by less than 4.0 points (if the Rams win by exactly 4 points the bet is a push).

3. Zoltar likes the Vegas underdog Seahawks against the 49ers. Zoltar inexplicably thinks the two teams are evenly matched even though the 49ers are undefeated and the game is being played in San Francisco. I might have a bug in the system – I need to double check this.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Zoltar did very well in week #9. Against the Vegas point spread, Zoltar was a good 5-2. Zoltar correctly liked Vegas underdogs Cardinals, Dolphins, Ravens and Vegas favorites Texans, Seahawks. Zoltar missed with recommendations on underdogs Redskins and Bears. (Zoltar got incredibly lucky on the Seahawks game — a late point spread move, plus a missed short field goal, plus an overtime touchdown.)

For the 2019 season, through week #9, Zoltar is 31-19 (62% accuracy) against the Vegas spread.

Just for fun, I track how well Zoltar and Las Vegas do when just trying to predict which team will win (but not by how much). This isn’t useful except for parlay betting.

Just predicting winners, Zoltar was an excellent 14-0. Vegas was a so-so 9-5 just predicting winners.

For the season Zoltar is a pretty decent 89-45 (66%) just picking winners and Vegas is almost identical at 87-44 (66%).

Note: Vegas has had three pick’em games so far and there has been one tie game. Just picking winners, Vegas is significantly more accurate this year than in any of the previous 20 years.



My system is named after the Zoltar fortune teller machine you can find in arcades. That machine is named after the machine from the 1988 movie “Big” starring Tom Hanks. And the Zoltar machine in the movie is named after the Zoltan arcade machine from the 1960s.

Posted in Zoltar