Binary Classification with Logistic Regression Using ML.NET

I’ve been poking around the ML.NET code library. ML.NET is a C# library that can do classical machine learning (but not neural systems). ML.NET is a very large library and just like most things, it can only be learned by practice.

I tackled a simple binary classification problem where the goal is to predict if a person is Male or not based on their Age, Job (mgmt, tech, sale), Income, and job Satisfaction (low, medium, high). I created a small synthetic set of training data with 40 items. I used Visual Studio to create a C# console application that calls into the ML.NET library’s L-BFGS logistic regression functionality.

Logistic regression is one of the simplest techniques for binary classification. The L-BFGS algorithm is one of several techniques that can be used to train a logistic regression model.
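
For readers who want to see what logistic regression actually computes, here is a minimal plain-Python sketch. It is not the ML.NET code and it does not use L-BFGS. It trains with simple stochastic gradient updates on a made-up four-row dataset with two hypothetical scaled predictors (age and income). The names `train` and `predict_prob` and all the toy values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(weights, bias, x):
    # logistic regression: sigmoid of a weighted sum of the predictors
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def train(data, lr=0.5, epochs=2000):
    # plain stochastic gradient updates (an assumption; NOT L-BFGS)
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict_prob(weights, bias, x)
            for j in range(n):
                weights[j] += lr * err * x[j]
            bias += lr * err
    return weights, bias

# made-up toy rows: ([age / 100, income / 100000], label)
toy = [
    ([0.25, 0.30], 0),
    ([0.30, 0.35], 0),
    ([0.45, 0.60], 1),
    ([0.50, 0.70], 1),
]

w, b = train(toy)
p = predict_prob(w, b, [0.48, 0.65])
print("predicted class 1" if p > 0.5 else "predicted class 0")
```

A model trained by a library such as ML.NET produces the same kind of weights and bias. Only the optimization algorithm used to find them differs.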

After training the model, I made a prediction for a person with Age = 35, Job = tech, Income = $49,000.00, Satisfaction = medium. The prediction is that isMale = True.

One thing that stands out in my mind is that using the ML.NET library has a very different feel to it than using alternatives such as raw Python, scikit-learn, or PyTorch. Anyway, good fun.

Four images by artist Bill Presing. Presing worked on several well-known animated films including “Ratatouille” and “Up”.

Posted in Machine Learning

NFL 2019 Week 3 Predictions – Key Player Injuries Confuse Zoltar

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #3 of the 2019 NFL season:

Zoltar:      titans  by    1  dog =     jaguars    Vegas:      titans  by  1.5
Zoltar:       bills  by    6  dog =     bengals    Vegas:       bills  by    6
Zoltar:       colts  by    6  dog =     falcons    Vegas:       colts  by  2.5
Zoltar:     cowboys  by    8  dog =    dolphins    Vegas:     cowboys  by 21.5
Zoltar:     packers  by    6  dog =     broncos    Vegas:     packers  by    8
Zoltar:      chiefs  by    6  dog =      ravens    Vegas:      chiefs  by  6.5
Zoltar:     vikings  by    6  dog =     raiders    Vegas:     vikings  by    8
Zoltar:    patriots  by   11  dog =        jets    Vegas:    patriots  by   22
Zoltar:      eagles  by    6  dog =       lions    Vegas:      eagles  by    7
Zoltar:    panthers  by    0  dog =   cardinals    Vegas:    panthers  by  2.5
Zoltar:  buccaneers  by    6  dog =      giants    Vegas:  buccaneers  by  6.5
Zoltar:    chargers  by    5  dog =      texans    Vegas:    chargers  by    3
Zoltar:      saints  by    0  dog =    seahawks    Vegas:    seahawks  by    5
Zoltar:    steelers  by    0  dog = fortyniners    Vegas: fortyniners  by    7
Zoltar:        rams  by    4  dog =      browns    Vegas:        rams  by  2.5
Zoltar:       bears  by    2  dog =    redskins    Vegas:       bears  by    4

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #3 Zoltar has one hypothetical suggestion and four comments.

1. Zoltar likes the Vegas favorite Colts against the Falcons. Zoltar thinks the Colts are 6 points better than the Falcons but Vegas has the Colts favored by just 2.5 points. So, Zoltar believes that the Colts will win and cover the spread (win by 3 or more points).

There are four other games where my basic version of Zoltar recommends a wager, but in three of these games there is a key injury to a starting quarterback (Jets, Saints, Steelers), and in one game (Dolphins) there is team chaos and near mutiny.

I’ll have to run advanced Zoltar, which takes injuries into account, to figure out what the advice is for the three injury games.
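
The "more than 3.0 points different" rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not Zoltar's actual code. The function name `suggest` and the tuple layout are assumptions, and the rows are a subset of the week #3 table.

```python
def suggest(fav_z, margin_z, dog_z, fav_v, margin_v, threshold=3.0):
    # Express the Vegas line as a margin for Zoltar's favorite
    # (negative when Vegas favors the other team).
    vegas = margin_v if fav_v == fav_z else -margin_v
    diff = margin_z - vegas
    if abs(diff) <= threshold:
        return None  # lines are too close: no suggestion
    # Zoltar rates its favorite higher than Vegas does -> take the favorite,
    # otherwise take the other team against the spread.
    return fav_z if diff > 0 else dog_z

# (Zoltar favorite, Zoltar margin, Zoltar underdog, Vegas favorite, Vegas margin)
games = [
    ("titans", 1, "jaguars", "titans", 1.5),
    ("colts", 6, "falcons", "colts", 2.5),
    ("cowboys", 8, "dolphins", "cowboys", 21.5),
    ("patriots", 11, "jets", "patriots", 22),
    ("saints", 0, "seahawks", "seahawks", 5),
    ("steelers", 0, "fortyniners", "fortyniners", 7),
    ("bears", 2, "redskins", "bears", 4),
]

picks = [p for p in (suggest(*g) for g in games) if p is not None]
print(picks)  # ['colts', 'dolphins', 'jets', 'saints', 'steelers']
```

The sketch flags the Colts plus the four games affected by injuries or team chaos. It does not model injuries, which is why those four need a separate look.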

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
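
The 53% figure comes from simple arithmetic: with $110 risked to win $100, the break-even accuracy is 110 / (110 + 100), which is about 52.4%. A small Python sketch of that calculation follows. The function names are mine, for illustration only.

```python
def break_even_accuracy(risk, win):
    # accuracy p where expected profit p*win - (1-p)*risk equals zero
    return risk / (risk + win)

def expected_profit(accuracy, risk=110.0, win=100.0):
    # average profit per bet at a given prediction accuracy
    return accuracy * win - (1.0 - accuracy) * risk

p = break_even_accuracy(110, 100)
print(f"{p:.4f}")  # 0.5238
```

At 53% accuracy the expected profit per bet is slightly positive, and at 52% it is slightly negative.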

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting.


Zoltar did so-so in week #2. Against the Vegas point spread, Zoltar was 3-2. Zoltar correctly liked Vegas favorite Patriots, and Vegas underdogs Seahawks and Cardinals. But Zoltar missed on Vegas underdogs Redskins and Raiders.

Just predicting winners, Zoltar was a decent 10-6 and Vegas was a good 10-5. (Vegas had one so-called pick’em game, Eagles at Falcons).

My system is named after the Zoltar fortune teller machine. Here, courtesy of an Internet image search for “Zoltar’s girlfriend” are Zoltara, Zoltana, and Esmeralda.

Posted in Zoltar

Why Machine Learning One-Versus-All is Not a Good Technique

In machine learning, using the one-versus-all technique is almost never a good idea. One-versus-all (OvA) is also called one-versus-rest (OvR), and several other similar terms.

A binary classification problem is one where the goal is to predict something that can be one of just two possible values. For example, predicting if a person is male or female based on predictor variables such as age, annual income, and so on. A multiclass classification problem is one where the thing to predict can be one of three or more possible values. For example, predicting if a person’s political leaning is conservative, moderate or liberal.

Note: In the discussion that follows there are dozens of exceptions, but diving into them would slow my discussion to a crawl.

There are many binary classification algorithms: logistic regression, probit models, winnow classification, support vector machines, etc. But there aren’t nearly as many algorithms that can easily handle multiclass classification problems: neural networks and naive Bayes are the main two.

But neural networks usually require lots of training data and naive Bayes works best with non-numeric predictor values.

The idea of one-versus-all is to attack a multiclass classification problem by using a collection of binary classifiers. The technique is best explained by example. Suppose you want to predict the political leaning of a person (conservative, moderate, liberal). You can create three binary classifiers, using logistic regression for example:

Model 1 predicts conservative or not-conservative
Model 2 predicts moderate or not-moderate
Model 3 predicts liberal or not-liberal

Now suppose you want to make a prediction for a new, previously unseen person. The likelihoods (L) for each class might look like:

Model 1 : L(conservative) = 0.35  L(not-conservative) = 0.65
Model 2 : L(moderate) = 0.55  L(not-moderate) = 0.45
Model 3 : L(liberal) = 0.60  L(not-liberal) = 0.40

You would conclude the person is a political liberal because 0.60 is the largest of the three positive-class likelihoods. However, one-versus-all has at least four major problems. First, the technique isn’t feasible if there are many classes to predict. Second, you can only use a binary classifier that emits a numeric likelihood score. To demonstrate that idea, suppose you use a technique that just emits a class. Results could look like:

Model 1 : predicts not-conservative
Model 2 : predicts moderate
Model 3 : predicts liberal

The results are ambiguous because both Model 2 and Model 3 claim the person. A third problem with OvA is that even if the class labels in your data are evenly distributed, relabeling the data as one class versus all the others almost always produces significantly unbalanced data. For example, with three balanced classes, each binary problem has one-third positive items and two-thirds negative items. A fourth problem is that because there’s a probability of an error on each classifier, when using multiple classifiers you often get a higher overall probability of error than using just one classifier. (This is similar to the idea in classical statistics of why using multiple t-tests rather than ANOVA is not a good approach.)
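
The two prediction scenarios above, plus the error-compounding idea, can be sketched in plain Python. The function names are hypothetical, and the last function assumes the classifiers err independently, which is a simplification.

```python
def ova_predict(scores):
    # scores: class name -> likelihood from that class's binary model
    return max(scores, key=scores.get)

def ova_predict_hard(votes):
    # votes: class name -> True if that binary model predicted its class.
    # Returns None when zero or several models claim the item (ambiguous).
    winners = [c for c, v in votes.items() if v]
    return winners[0] if len(winners) == 1 else None

def prob_at_least_one_error(error_rate, num_classifiers):
    # assumes independent errors across classifiers (a simplification)
    return 1.0 - (1.0 - error_rate) ** num_classifiers

scores = {"conservative": 0.35, "moderate": 0.55, "liberal": 0.60}
print(ova_predict(scores))  # liberal

votes = {"conservative": False, "moderate": True, "liberal": True}
print(ova_predict_hard(votes))  # None

print(round(prob_at_least_one_error(0.05, 3), 4))  # 0.1426
```

With likelihood scores there is always a largest value to pick, but with hard class labels there is nothing to break the tie between the moderate and liberal models.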

Many machine learning beginners accept “common knowledge”, such as using one-versus-all, uncritically. The moral of this blog post is: don’t believe everything written about ML, and only use the one-versus-all technique for ML multiclass classification when absolutely necessary.

Weekly World News (WWN) was a black-and-white tabloid sold in supermarkets from the 1970s through the early 2000s. Quite a few readers accepted many of the stories uncritically and believed the stories to be true. Left: This issue included the paradoxical “How to Tell if You Are Psychic”. Center: The story “Mini-Mermaid Found in Tuna Sandwich” seems a bit fishy to me. Right: The “Severed Leg Hops to Hospital” story is part of the WWN legacy.

Posted in Machine Learning

PyTorch Tanh and a Downside to Open Source Software

I try to write a little bit of code each day. Writing code is a skill that can only be learned by practice, and furthermore, if you don’t practice you will lose your existing skill. I don’t speak any foreign language fluently, but I suspect the same use-it-or-lose-it factor is true there too.

The PyTorch neural network code library is very complex. But the fact that PyTorch is open source makes learning PyTorch even more difficult than it should be. I’ll explain using the tanh function as an example.

The tanh function (hyperbolic tangent) is the most common function used for hidden layer activation for shallow neural networks. The PyTorch library has at least four different ways to use tanh. This is not good and contributes to confusion. Why are there so many ways to call tanh? How are they different? When should each be used?

Here’s the simplest way to call tanh:

import torch as T
import numpy as np

def forward(self, x):
  z = self.hid1(x)
  z = T.tanh(z)  # 1. ordinary function
  z = T.nn.Softmax(0)(self.oupt(z))
  return z

When PyTorch was first released, this was the way to call tanh — simple and effective. But with open source, there’s no penalty to the maintainers of the library when they make arbitrary changes. So they make many unnecessary changes.

So at some point the preferred technique became:

  z = T.nn.functional.tanh(z) 

But then the preferred technique changed again to a class:

  z = T.nn.Tanh()(z)

On top of this, some of the very early documentation used a low-level approach:

  for j in range(4):
    z[j] = np.tanh(z[j].item())

This has led to documentation and examples that differ significantly even on something as fundamental as a simple hyperbolic function.
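
Whichever of the calls above is used, the value computed for each element is the same function, tanh(x) = (e^x - e^-x) / (e^x + e^-x). Here is a dependency-free sketch of that definition, checked against Python's built-in `math.tanh`. The naive formula is an illustration only and will overflow for very large inputs.

```python
import math

def tanh(x):
    # textbook definition; numerically naive but fine for moderate x
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # agrees with the standard library to within rounding error
    assert abs(tanh(x) - math.tanh(x)) < 1e-12
    print(f"{x:5.1f} -> {tanh(x): .6f}")
```

The output values always lie between -1 and +1, which is what makes the function useful for hidden layer activation.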

In code libraries that are maintained and curated by a company or central authority, there are releases only about once or twice a year, and the releases are usually well thought out. Open source software spews releases continuously, which sometimes leads to bad decisions, poorly thought out design, many revisions, and mountains of irrelevant documentation.

Left: A boating bad decision. Center: A girl like this trying to ride a motorcycle like that — a bad decision that did not end well a few seconds after the photo was taken. Right: Amazon founder Jeff Bezos has set an epic standard for Bad Decision.

Posted in PyTorch

A Recap of the 2019 CEC Conference

I spoke at the 2019 CEC (Casino/Cloud eSports Conference) event. The conference ran September 4-5 and was in Las Vegas. The conference was small (maybe about 150 people) but was a very good event from my perspective because both the speakers and attendees were very knowledgeable about eSports (professional video game players competing against each other).

If I had to summarize the event in a sentence, I’d say it was an event with forward-looking people thinking about how eSports can be monetized. Most of the talks I listened to focused on the huge obstacles facing eSports. There are the obvious legal and regulatory challenges. In this area, several speakers from the University of Nevada and various Nevada government agencies had lots of interesting and useful information.

The CEC Conference was held at the Luxor Hotel. The Luxor has one of the largest eSports facilities in the United States.

Another significant challenge is that the young people who play and watch eSports are just very, very different from the traditional audience that wagers on sports such as NFL football. According to anecdotal evidence, eSports participants have no need or desire to go to a casino or anywhere else other than their home computer in order to play their games.

I sat on a panel with four other members plus a speaker/moderator. All the panelists had quite different backgrounds, and all had interesting things to say. But, in short, there are far more questions than answers when it comes to monetizing eSports in brick-and-mortar environments.

Left: I explain deep reinforcement learning. Center: The theme of the panel I sat on was innovation. Right: I show a Zoltar run.

As someone who works in research, I was clearly an outsider. In my time slot, I educated attendees about the connection between deep reinforcement learning and gaming. In particular I described AlphaZero (the RL chess program that revolutionized computer chess) and AlphaStar (the first program to defeat expert-level human StarCraft players).

By coincidence, my talk was on Thursday, September 5, just a few hours before the first game of the 2019 NFL professional football season, featuring the Chicago Bears against the Green Bay Packers. I showed attendees a sample run of my Zoltar prediction program which predicted that the Bears would beat the Packers by more than the Las Vegas point spread of 3.0 points. (The Bears went on to lose by a score of 10-3, playing one of the worst games I’ve seen in a long time.)

When I was invited to speak at the CEC conference, my first thought was to decline because I’m not familiar with the eSports world. But my friends and colleagues who work at Microsoft Xbox encouraged me to go to CEC because they feel that eSports has tremendous potential for growth and monetization. But as I discovered, there are no clear paths and it’s my hunch that eSports will grow organically and unpredictably.

The 2019 CEC event was definitely a good use of my time and I will attend next year’s event if I can. If you are interested in eSports in any way, I recommend that you check out the CEC event.

Posted in Conferences

Feature Engineering and Machine Learning

Suppose you want to predict a person’s annual income based on their years of experience, age, years of education, and so on. In classical statistics it’s common to spend a lot of time on feature engineering: deciding which predictors to use and which to not use, and creating derived predictors from raw predictors. One example might be creating an “age-education” variable which is the square root of the age times the years of education.

But in neural prediction systems it’s quite rare to perform lots of feature engineering. The idea is that during training, the neural system will figure out which predictors aren’t important and assign very small weights, and because of the neural activation function, non-linear combinations of predictor values are being created.

This morning (as I write this post) I decided to do some feature engineering on the airline passenger dataset to verify that it doesn’t work well. This is a time series regression problem where the goal is to predict the number of airline passengers. A data setup for a straightforward approach looks like:

|curr 1.12 1.18 1.32 1.29 |next 1.21
|curr 1.18 1.32 1.29 1.21 |next 1.35
|curr 1.32 1.29 1.21 1.35 |next 1.48
. . .

The first line means there were (112,000 118,000 132,000 129,000) passengers in months 1-4 and 121,000 passengers in month 5. Using this approach gives pretty good results with a standard neural network, and not-as-good results using a more sophisticated LSTM recurrent network. I created a feature engineering derived dataset:

|curr 1.12 1.18 1.32 1.29 |next_pct 1.0804 |next_raw 1.21
|curr 1.18 1.32 1.29 1.21 |next_pct 1.1441 |next_raw 1.35
|curr 1.32 1.29 1.21 1.35 |next_pct 1.1212 |next_raw 1.48
. . .

Instead of predicting the raw passenger count, I predicted the relative increase: the ratio of the next value to the first value in the sequence. The first line means that in month 5, the passenger count was 1.0804 times 1.12, which is 1.21.
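
A derived dataset like the one above can be generated with a short sliding-window routine. The sketch below is plain Python using only the seven values that appear in the listings. The function name `make_windows` is an assumption, and this is not the code I used for the experiment.

```python
def make_windows(series, width=4):
    # each row: (current window, next / first-in-window ratio, raw next value)
    rows = []
    for i in range(len(series) - width):
        curr = series[i:i + width]
        nxt = series[i + width]
        ratio = round(nxt / curr[0], 4)
        rows.append((curr, ratio, nxt))
    return rows

# passenger counts in units of 100,000 (the values shown above)
passengers = [1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48]

rows = make_windows(passengers)
for curr, pct, raw in rows:
    vals = " ".join(f"{v:.2f}" for v in curr)
    print(f"|curr {vals} |next_pct {pct:.4f} |next_raw {raw:.2f}")
```

To recover a raw prediction from a predicted ratio, multiply the ratio by the first value in the window.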

Anyway, after thrashing around a bit with a PyTorch LSTM network I got some results. The results are a bit difficult to interpret, but overall the feature engineering approach I tried doesn’t look promising, as expected.

Things like this happen all the time. In the field of machine learning, you spend a lot of time creating systems that just don’t work well. An important mindset for success is dealing with the failures that are much more common than the successes.

Dealing with failure is common across many fields. My friends who are in sales have the ability to let their form of failure (not making a sale) not affect them. Good baseball players fail more than half the time when batting but don’t dwell on the failures. And so on.

There is a lot of research evidence that indicates that women fear failure much more than men. For example, see “Gender Differences in Fear of Failure amongst Engineering Students” by Nelson, Newman, McDaniel, Buboltz. This fear causes women to quickly drop out of computer science and engineering classes. On the other hand, fashion models seem to have little fear of fashion failure.

Posted in Machine Learning

NFL 2019 Week 2 Predictions – Zoltar Has Five Dubious Suggestions

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #2 of the 2019 NFL season:

Zoltar:    panthers  by    6  dog =  buccaneers    Vegas:    panthers  by  6.5
Zoltar:     bengals  by    4  dog = fortyniners    Vegas:     bengals  by    1
Zoltar:    chargers  by    4  dog =       lions    Vegas:    chargers  by  2.5
Zoltar:     vikings  by    0  dog =     packers    Vegas:     packers  by  2.5
Zoltar:      texans  by    9  dog =     jaguars    Vegas:      texans  by    9
Zoltar:    patriots  by    2  dog =    dolphins    Vegas:    patriots  by 18.5
Zoltar:       bills  by    0  dog =      giants    Vegas:       bills  by    2
Zoltar:      titans  by    4  dog =       colts    Vegas:      titans  by    3
Zoltar:    seahawks  by    0  dog =    steelers    Vegas:    steelers  by  3.5
Zoltar:      ravens  by   10  dog =   cardinals    Vegas:      ravens  by 13.5
Zoltar:     cowboys  by    0  dog =    redskins    Vegas:     cowboys  by  4.5
Zoltar:      chiefs  by    4  dog =     raiders    Vegas:      chiefs  by  9.5
Zoltar:       bears  by    3  dog =     broncos    Vegas:       bears  by    1
Zoltar:        rams  by    2  dog =      saints    Vegas:        rams  by    3
Zoltar:      eagles  by    0  dog =     falcons    Vegas:      eagles  by    0
Zoltar:      browns  by    0  dog =        jets    Vegas:      browns  by  2.5

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #2 Zoltar has five hypothetical suggestions. All of them are highly questionable because Zoltar doesn’t have much data yet.

1. Zoltar likes the Vegas underdog Dolphins against the Patriots. Zoltar thinks the Patriots are just 2 points better than the Dolphins but Vegas has the Patriots favored by 18.5 points. So, Zoltar believes that the Patriots will not cover the spread. No human would bet this way — in week #1 the Dolphins were obliterated by Baltimore 59 – 10 while the Patriots destroyed a good Steelers team 33 – 3. Update: The version of Zoltar that deals with big mismatches using transitivity recommends betting the farm on the Patriots.

2. Zoltar likes the Vegas underdog Seahawks against the Steelers. Zoltar believes the two teams are evenly matched but Vegas has the Steelers better by 3.5 points. A bet on the Seahawks will pay if the Seahawks win by any score, or if the Steelers win by 3 points or less.

3. Zoltar likes the Vegas underdog Cardinals against the Ravens. Zoltar thinks the Ravens are 10 points better than the Cardinals but Vegas thinks the Ravens are 13.5 points better.

4. Zoltar likes the Vegas underdog Redskins against the Cowboys. Zoltar thinks the teams are evenly matched but Vegas thinks the Cowboys are 4.5 points better than the Redskins.

5. Zoltar likes the Vegas underdog Raiders against the Chiefs. Zoltar thinks the Chiefs are 4 points better than the Raiders but Vegas thinks the Chiefs are 9.5 points better.
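
Whether a bet against the spread pays can be checked mechanically. In the sketch below, `bet_pays` is a hypothetical helper where the line is the number of points added to the picked team's score: positive for an underdog, negative for a favorite. The game scores are made up for illustration.

```python
def bet_pays(picked_score, other_score, picked_line):
    # picked_line: +3.5 for an underdog getting 3.5 points,
    #              -2.5 for a favorite giving 2.5 points
    return picked_score + picked_line > other_score

# Seahawks as 3.5-point underdogs against the Steelers (made-up scores)
print(bet_pays(24, 20, +3.5))  # Seahawks win outright -> True
print(bet_pays(20, 23, +3.5))  # Steelers win by 3 -> True
print(bet_pays(20, 24, +3.5))  # Steelers win by 4 -> False
```

Half-point lines such as 3.5 mean the adjusted scores can never be equal, so a bet cannot end in a push.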

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are six such games in week #2:

Vikings at Packers
Bills at Giants
Seahawks at Steelers
Cowboys at Redskins
Eagles at Falcons
Browns at Jets

In these situations, just to pick a winner so I can track raw number of correct predictions, in the first four weeks of the season, Zoltar picks the home team to win. After week 4, Zoltar uses historical data for the current season (which usually, but not always, ends up in a prediction that the home team will win). Update: The Jets starting quarterback will be out, which Zoltar values as 3 points, so the Browns are predicted to win.


Zoltar did so-so in week #1. Against the Vegas point spread, Zoltar was 3-3. Zoltar correctly liked underdogs Titans, Bengals, and Redskins but incorrectly picked favorites Bears and Buccaneers, and underdog Dolphins.

Just predicting winners, Zoltar was a decent 10-5, which is pretty good for the first week. (There was one tie game – Lions at Cardinals).

By coincidence, I was speaking at a conference in Las Vegas on September 5 for the first game of the 2019 season, featuring the Green Bay Packers at the Chicago Bears. I went to the sports book at the Mandalay Bay where my conference was. I picked up a sheet of proposition bets (side bets). The final score was Packers 10 – Bears 3. A proposition bet on Total Points = 11-17 paid off 18 to 1.

Posted in Zoltar