Selecting a Good Item Using Tournament Selection

Most neural machine learning techniques use calculus-based gradient descent to minimize error and find a good set of weights for the network being trained. There has been increasing interest in neuromorphic systems, which more closely resemble biological systems. In most situations, you cannot use gradient descent with neuromorphic systems.

An alternative to gradient descent is evolutionary optimization. Evolutionary optimization doesn't use gradients, but evolutionary techniques require much, much more processing power, which is why they're rarely used.



Evolutionary optimization maintains a population of possible solutions (good weights). In an iterative process, two "good" possible solutions are selected and then combined to create a new, presumably better possible solution. Therefore, one of the many sub-problems when using evolutionary optimization is selecting "good" possible solutions.

You don't want to always pick the two best solutions, because a non-best possible solution could still have good characteristics worth keeping in the population.

There are several techniques to choose good, but not necessarily best items. The most common techniques are roulette wheel selection and tournament selection.

Suppose you have 10 possible solutions, and their associated errors are [0.1, 0.2, . . . 1.0]. So the best solution is at [0], the second best is at [1], and so on. To use tournament selection, you select a random subset and then pick the best item from that subset. Suppose you set the size of the subset to 0.4 (40%) of the population. This is often called the tau value. Then suppose the 40% random subset of items is [5, 3, 6, 4], so the associated errors are [0.6, 0.4, 0.7, 0.5]. From this subset, you'd pick item [3] because it has the smallest error.

The tau value controls selection pressure. If tau = 1.0 you always examine all the items, so you'll always get the best item. If tau is small, say 20%, then you have a much greater chance of getting a non-best item.

Here is some C# code for the selection function:

static int Select(double[] errors, double tau, Random rnd)
{
  // pick the best (smallest-error) item from a random tau-percent of the population
  int popSize = errors.Length;
  int numItems = (int)(popSize * tau);
  int[] allIndices = new int[popSize];
  for (int i = 0; i < popSize; ++i)
    allIndices[i] = i;
  Shuffle(allIndices, rnd);  // first numItems indices are now a random subset

  int bestIdx = allIndices[0];
  double bestErr = errors[allIndices[0]];
  for (int i = 0; i < numItems; ++i) {
    int idx = allIndices[i];
    if (errors[idx] < bestErr) {
      bestIdx = idx;
      bestErr = errors[idx];
    }
  }
  return bestIdx;
}

The idea is to use a Shuffle() function to scramble the order of the indices so that the first numItems indices form a random subset. Shuffle() uses the Fisher-Yates mini-algorithm:

static void Shuffle(int[] vec, Random rnd)
{
  // Fisher-Yates shuffle: scramble vec in place
  int n = vec.Length;
  for (int i = 0; i < n; ++i) {
    int ri = rnd.Next(i, n);  // random index in [i, n)
    int tmp = vec[ri];
    vec[ri] = vec[i];
    vec[i] = tmp;
  }
}

For evolutionary optimization, you want two good items that are not the same:

static int[] SelectTwo(double[] errors, double tau, Random rnd)
{
  // pick two distinct good items
  int[] result = new int[2];
  int ct = 0;
  result[0] = Select(errors, tau, rnd);
  while ((result[1] = Select(errors, tau, rnd)) == result[0] &&
    ct < 100)
    ++ct;
  return result;
}

Here I just use brute force to repeatedly pick a second item until it’s not the same as the first. I set a sanity stop of 100 tries.
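Here is a minimal sketch of how the three functions might be called together. The Main() harness and the seed value are just for illustration, and the sketch assumes Select(), SelectTwo(), and Shuffle() are defined in the same class:

using System;

class TournamentSelectionDemo
{
  static void Main()
  {
    // errors for 10 possible solutions; [0] is best, [9] is worst
    double[] errors = new double[] { 0.1, 0.2, 0.3, 0.4, 0.5,
      0.6, 0.7, 0.8, 0.9, 1.0 };
    Random rnd = new Random(0);  // arbitrary seed

    int idx = Select(errors, 0.4, rnd);        // one good item
    int[] pair = SelectTwo(errors, 0.4, rnd);  // two distinct good items

    Console.WriteLine("selected item = " + idx);
    Console.WriteLine("selected pair = " + pair[0] + " " + pair[1]);
  }

  // Select(), SelectTwo(), Shuffle() as defined above go here
}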

Notice the SelectTwo() function calls Select() which calls Shuffle(). When writing complex software, it’s usually a good idea to mask complexity by refactoring into helper functions.



The Venice Carnival has featured beautiful masks and costumes since the 12th century.


Machine Learning Perceptron Classification Using C#

I wrote an article titled “Machine Learning Perceptron Classification Using C#” in the January 2020 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2020/01/07/perceptron-classification.aspx.

Perceptron classification is arguably the most rudimentary machine learning (ML) technique. The perceptron technique can be used for binary classification, for example predicting if a person is male or female based on numeric predictors such as age, height, weight, and so on.

From a practical point of view, perceptron classification is useful mostly to provide a baseline result for comparison with more powerful ML techniques such as logistic regression and k-nearest neighbors. Perceptron classification is also interesting from a historical point of view as a predecessor to neural networks.

Perceptron classification is quite simple to implement but the technique only works well with simple data that is completely, or nearly, linearly separable.

In my article, I show a demo with a 10-item subset of the well-known Banknote Authentication dataset. The goal is to predict if a banknote (think euro or dollar bill) is authentic (coded -1) or a forgery (coded +1) based on four predictor values (image variance, skewness, kurtosis, and entropy).
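To make the idea concrete, here is a minimal sketch of how a trained perceptron computes a prediction. The weights, bias, and input values are made up for illustration and are not the demo's actual values:

// minimal perceptron prediction sketch -- illustrative values only
static int Predict(double[] x, double[] wts, double bias)
{
  double z = bias;
  for (int i = 0; i < x.Length; ++i)
    z += wts[i] * x[i];          // weighted sum of the predictors
  return (z < 0.0) ? -1 : +1;    // -1 or +1, per the demo's class coding
}

// example call with made-up values:
// double[] x = new double[] { 3.6, 8.1, -1.6, 0.4 };          // variance, skewness, kurtosis, entropy
// double[] wts = new double[] { -0.15, -0.08, -0.05, 0.02 };  // hypothetical trained weights
// double bias = 0.12;
// int predicted = Predict(x, wts, bias);                      // -1 or +1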

My demo uses a variation of perceptron classification called averaged perceptron.

Although perceptron classification is simple and elegant, logistic regression is only slightly more complex and usually gives better results.

Some of my colleagues have asked me why averaged perceptron classification is part of the new ML.NET library. As it turns out, averaged perceptron was the first classifier algorithm implemented in the predecessor to the ML.NET library, an internal Microsoft library from Microsoft Research named TMSN, which was later renamed to TLC. The averaged perceptron classifier was implemented first because it is so simple. The averaged perceptron classifier was retained from version to version, not because of its practical value, but because removing it would require quite a bit of effort.



The word “perceptron” was derived from “perception”. Here are three random images from an Internet search for “perception art”.


NFL 2019 Week 21 (Super Bowl) Prediction – Zoltar Picks the Chiefs Over the 49ers

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #21 (the Super Bowl) of the 2019 NFL season:

Zoltar:      chiefs  by    3  dog = fortyniners    Vegas:      chiefs  by    1 

Both Zoltar and Las Vegas slightly favor the Kansas City Chiefs over the San Francisco 49ers.

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. Therefore, Zoltar doesn’t really have a recommendation for this game. But, if Zoltar was forced to pick, he’d say bet on the Chiefs. Such a bet would pay off if the Chiefs win by more than 1 point (in other words, 2 points or more). If the Chiefs win by exactly 1 point the bet is a push. Any other result would be a loss of the bet.

===

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
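The 53% figure comes from simple break-even arithmetic. If p is the fraction of bets you win, then risking $110 to win $100 breaks even when:

(p * 100) - ((1 - p) * 110) = 0
p = 110 / 210 = 0.524 (approximately)

So you need to win roughly 52.4% of your bets just to break even, which rounds up to the 53% rule of thumb.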

In week #20 Zoltar went 0-0 against the Vegas point spread because he had no hypothetical recommendations.

For the 2019 season, through week #20, Zoltar is an OK 54-34 (61% accuracy) against the Vegas spread.

Just for fun, I track how well Zoltar and Las Vegas do when trying to predict only which team will win (but not by how much). This isn’t useful except for parlay betting.

Just predicting winners, Zoltar was 2-0. Las Vegas was also 2-0 last week. Both Zoltar and Vegas correctly picked the Chiefs to beat the Titans, and the 49ers to beat the Packers.

For the season, just picking winners, Zoltar is a pretty decent 178-85 (67%) and Vegas is also pretty good at 170-90 (65%).

Note: Vegas has fewer games than Zoltar because Vegas had three pick’em games. Also there was one tie game in week #1 (Lions at Cardinals).



My system is named after the Zoltar fortune teller machine you can find in arcades. Arcade Zoltar is named after the magic machine in the 1988 fantasy movie “Big” starring Tom Hanks. And the movie machine was probably named after the 1960s era Zoltan fortune teller arcade machine.


SAT Math Scores By Race 2000 to 2019

Because I used to be a university professor, I’m interested in many aspects of education. The SAT college admission test results for 2019 were recently published so I thought I’d take a look at the math scores and compare them with previous years.

I instantly ran into an unexpected problem — I could not find summary data anywhere. After spending a lot of time looking, I got mildly irritated. I decided that I wouldn’t allow myself to be defeated in my quest for data.

The SAT organization publishes a summary report in PDF format every year. I found and opened each annual report from 2000 to 2019, manually extracted the SAT math scores, dropped the numbers into Excel, and made a graph. The process was quite time-consuming.


SAT math scores from 2000 – 2019 are remarkably stable.

The first thing I noticed is that SAT math scores are remarkably stable over time. The uptick in scores starting in 2017 was due to a change in the SAT test, not a sudden surge of math ability in high school seniors. Put another way, all efforts that have been aimed at reducing the achievement gap between groups have had virtually no effect whatsoever. Interestingly, a few years ago it was speculated that family income has a great effect on math achievement but that hypothesis/myth has been thoroughly debunked – the poorest-family majority race students score much higher on math than the richest-family minority students. What this means is anybody’s guess.

The second thing I noticed is that the SAT people stopped breaking down scores by race and gender. For all years before 2017, the annual reports were broken down so you could see the scores by race and gender, but from 2017 onward, gender was combined for each race. Why this change to reduce information occurred is beyond me, but changes in reporting like this are usually motivated by political factors rather than math factors. Perhaps the fact that Black females consistently score dramatically lower in math than other groups is not a politically happy result.

(wm = white male, wf = white female, bm = black male, bf = black female)
      wm   wf  white bm   bf  black
2000  549  514  530  436  419  426
2001  550  515  531  436  419  426
2002  552  517  533  438  419  427
2003  552  518  534  436  420  426
2004  550  514  531  438  420  427
2005  554  520  536  442  424  431
2006  555  520  536  438  423  429
2007  553  519  534  437  423  429
2008  555  521  537  434  420  426
2009  555  520  536  435  420  426
2010  555  519  536  436  422  428
2011  552  520  535  435  422  427
2012  554  520  536  436  422  428
2013  552  519  534  436  423  429
2014  552  519  534  435  423  429
2015  551  518  534  435  422  428
2016  550  518  533  430  422  425
2017  na   na   553  na   na   462
2018  na   na   557  na   na   463
2019  na   na   553  na   na   457

There’s no big moral to this story. The point is that even in a digital age, sometimes data is difficult to access. And in the end, numbers are just numbers; applying statistics that describe a group to an individual person is rarely a good idea.


Through 2016, SAT reported scores by race and gender (for example, the 2003 report is on the left) but starting in 2017 scores were combined by race (2017 report on right). Why the SAT people did this is a mystery to me.


The Relationship Between Logistic Sigmoid and Softmax for Logistic Regression

The logistic sigmoid and softmax functions are closely related mathematically, but the relationship is much more complex than most Internet sources imply.

I was looking at multiclass logistic regression recently. Regular logistic regression is a binary classification technique, for example, predicting if a person is male (0) or female (1) based on predictors/features such as height, shoe size, income, and so on.



This demo shows that it’s possible to use Softmax for binary logistic regression but you have to hack a bit by using a dummy set of 0-value weights and bias — not a good idea.


Multiclass logistic regression is an extension that can predict a variable that has one of three or more possible values, for example, predicting if a person is a political conservative, moderate, or liberal.

Note: The word “multiclass” is not a dictionary word so it should really be spelled as “multi-class” with a hyphen. But, as is often the case, machine learning terminology ignores convention and created a term on the fly. I find the habit of researchers and engineers creating words to be quite annoying.

For regular logistic regression, suppose you have four predictors (x0, x1, x2, x3). The output is computed like so:

z = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) + b
p = logsig(-z)

where w0, w1, w2, w3 are weights and b is the bias. The p value will be between 0 and 1. The generic logsig(a) function is:

logsig(a) = 1.0 / (1.0 + exp(-a))

Notice you have to be extremely careful with the minus signs.
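As a concrete sketch in C#, the computation might look like the snippet below. The weight, bias, and input values are made up purely for illustration:

// binary logistic regression output -- made-up illustrative values
double[] x = new double[] { 5.5, 3.0, 1.5, 0.2 };   // four predictors x0..x3
double[] w = new double[] { 0.4, -1.1, 0.8, 1.7 };  // hypothetical weights w0..w3
double b = -0.3;                                     // hypothetical bias

double z = b;
for (int i = 0; i < 4; ++i)
  z += w[i] * x[i];
double p = 1.0 / (1.0 + Math.Exp(z));  // logsig(-z), matching the equations above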

Suppose you have three classes. Multiclass logistic regression output is computed as:

z0 = (w00 * x0) + (w10 * x1) + (w20 * x2) + (w30 * x3) + b0
z1 = (w01 * x0) + (w11 * x1) + (w21 * x2) + (w31 * x3) + b1
z2 = (w02 * x0) + (w12 * x1) + (w22 * x2) + (w32 * x3) + b2
P = softmax(z0, z1, z2)

Here w is a weights matrix where the first index represents the predictor and the second index is the class. So w31 is the weight for predictor [3] and class [1]. The P result is a vector with three values that sum to 1 so that they can be interpreted as probabilities. The generic softmax function for three values is defined as:

sum = exp(z0) + exp(z1) + exp(z2)
P0 = exp(z0) / sum
P1 = exp(z1) / sum
P2 = exp(z2) / sum 

Notice there are no minus signs here.
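A minimal C# sketch of the multiclass computation followed by the softmax step is shown below. All weight, bias, and input values are made up, and production code would typically subtract the largest z value before exponentiating to avoid arithmetic overflow:

// multiclass logistic regression output -- made-up illustrative values
double[] x = new double[] { 5.5, 3.0, 1.5, 0.2 };  // four predictors x0..x3
double[][] w = new double[4][];                     // w[predictor][class]
w[0] = new double[] { 0.1, -0.2, 0.3 };
w[1] = new double[] { 0.4, 0.5, -0.6 };
w[2] = new double[] { -0.7, 0.8, 0.9 };
w[3] = new double[] { 1.0, -1.1, 1.2 };
double[] b = new double[] { 0.01, 0.02, 0.03 };     // hypothetical biases b0..b2

double[] z = new double[3];
for (int j = 0; j < 3; ++j) {
  z[j] = b[j];
  for (int i = 0; i < 4; ++i)
    z[j] += w[i][j] * x[i];
}

double sum = Math.Exp(z[0]) + Math.Exp(z[1]) + Math.Exp(z[2]);
double[] P = new double[3];
for (int j = 0; j < 3; ++j)
  P[j] = Math.Exp(z[j]) / sum;  // three probabilities that sum to 1.0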

Now, as it turns out, there is a very close but complex mathematical relationship between logistic sigmoid and softmax. (The Wikipedia article on logistic regression explains it quite well). It’s possible to use variations of logistic sigmoid or softmax for either binary or multiclass logistic regression, but from an engineering perspective, for binary logistic regression you should use logistic sigmoid and for multiclass logistic regression you should use softmax. Period.
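To see the simplest part of the connection, suppose you set up softmax for just two classes and force the second class's weights and bias to all be zero (the dummy hack mentioned in the demo above), so z1 = 0. Then:

P0 = exp(z0) / (exp(z0) + exp(0))
   = exp(z0) / (exp(z0) + 1)
   = 1 / (1 + exp(-z0))
   = logsig(z0)

In other words, two-class softmax with a dummy all-zero second class collapses to the logistic sigmoid applied to z0.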



Three illustrations by artist Klaus Burgle (1926 -2015). He did many German science fiction book and magazine covers in the 1950s and 1960s.


The 2020 Visual Studio Live is Coming to Las Vegas Soon

Visual Studio Live is one of my three favorite tech events. There are several VS Live events each year, in different cities such as Dallas, Austin, Atlanta, Orlando, and San Diego. My favorite is Las Vegas.

The 2020 Las Vegas event is just around the corner — March 1-6. See https://vslive.com/Home.aspx.



A couple of screenshots of the event Web site. I’m in the image on the right. I don’t look that good in real life.


Before I describe the details, let me cut to the chase and mention that early registration is good through tomorrow, Friday, Jan. 16, 2020. You can save $400.

VS Live is one of the longest-running technical conferences — I think this is the 27th consecutive year. The fact that VS Live has such longevity is a strong testament to its quality. When I talk to attendees at VS Live, it’s not uncommon for them to tell me they’ve attended many times.

As the name of the event suggests, VS Live is intended primarily for engineers and managers who work with the Microsoft technology stack. Unlike some conferences that have a lot of thinly veiled Marketing and Sales content, VS Live is primarily an educational event. I've learned a ton of useful and valuable information at every event.



Two photos from last year’s event in Las Vegas.


I mentally categorize the technical events I attend by size: small (200 to 500 attendees), medium (500 to 2000 attendees), large (more than 2000 attendees). Each size has strengths and weaknesses. VS Live falls into my small category and its primary advantage is that the size fosters impromptu conversations with other speakers and attendees, where some of the most interesting information is exchanged.

All good conferences like VS Live are expensive. But VS Live delivers good value for the money in my opinion. It’s usually not feasible to pay for such an event out of pocket, but many companies will fund your attendance as part of training. The conference Web site has a Sell Your Boss page at https://vslive.com/events/las-vegas-2020/information/sell-your-boss.aspx.

In the end, only you can decide if attending VS Live makes sense for you. So I encourage you to check out the Web site and look the agenda over.



Left: “The Hangover Bail Bonds” — “because last night was no movie” – best motto award. Center: “Jesus Christ Bail Bonds” – most optimistic award. Right: “It Wasn’t Me Bail Bonds” – best name award.


NFL 2019 Week 20 (Conference Championships) Predictions – Zoltar Likes the Underdog Packers

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar’s predictions for week #20 (Conference championships) of the 2019 NFL season:

Zoltar:      chiefs  by    6  dog =      titans    Vegas:      chiefs  by  7.5
Zoltar: fortyniners  by    1  dog =     packers    Vegas: fortyniners  by  7.5

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. Therefore, for week #20 Zoltar has one hypothetical suggestion.

Zoltar likes the Vegas underdog Green Bay Packers against the San Francisco 49ers. Zoltar thinks the 49ers are just a tiny 1 point better than the Packers, but Las Vegas thinks the 49ers are 7.5 points better than the Packers.

A bet on the Packers will pay off if the Packers win by any score or if the 49ers win but by less than 7.5 points (in other words, 7 points or fewer).

Update: Oops. I forgot to factor in the 49ers home field advantage. Zoltar retracts his recommendation on the Packers.

===

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

In week #19 Zoltar went 0-1 against the Vegas point spread. Zoltar incorrectly liked the Vegas underdog Texans against the Chiefs. The Texans jumped off to a huge 21-0 lead and then . . . the rest of the game was not pretty for anyone who bet on the Texans.

For the 2019 season, through week #19, Zoltar is an OK 54-34 (61% accuracy) against the Vegas spread.

Just for fun, I track how well Zoltar and Las Vegas do when trying to predict only which team will win (but not by how much). This isn’t useful except for parlay betting.

Just predicting winners, Zoltar was a good 3-1. Las Vegas was also 3-1 last week. Both Zoltar and Vegas thought the Ravens would beat the Titans but the Titans won handily.

For the season, just picking winners, Zoltar is a pretty decent 176-85 (67%) and Vegas is also pretty good at 168-90 (65%).

Note: Vegas has fewer games than Zoltar because Vegas had three pick’em games. Also there was one tie game in week #1 (Lions at Cardinals).



My system is named after the Zoltar fortune teller machine. Here are two anonymous fortune tellers plus Rita Repulsa who appeared in an Internet image search for “fortune teller”. I think maybe the crystal-like ball in Rita’s staff influenced the result.
