Traveling Salesman Problem Combinatorial Optimization Using an Evolutionary Algorithm with C#

A combinatorial optimization problem is one where the goal is to arrange items in a best order. The classic example is the Traveling Salesman Problem (TSP). Suppose you have n = 20 cities, numbered 0 to 19, with a known distance between each pair of cities. You want to find the shortest path that visits all cities. For example, one possible solution might be [19, 5, 0, 2, 4, . . 11].

Combinatorial optimization problems are very difficult. With n = 20 cities there are 20! = 2,432,902,008,176,640,000 possible solutions when distance(A, B) != distance(B, A). This is such a huge number that you can't possibly try every permutation.
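The count can be confirmed with one line of Python (a quick check, not part of the demo):

```python
import math

# number of orderings of 20 cities when distance(A,B) != distance(B,A)
print(math.factorial(20))  # 2432902008176640000
```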

The most common technique for combinatorial optimization is to use Simulated Annealing (SA). See https://jamesmccaffrey.wordpress.com/2021/12/01/the-traveling-salesman-problem-using-simulated-annealing-in-csharp/

I wondered if I could design a technique for combinatorial optimization that uses an evolutionary algorithm. After a lot of work, I got a demo up and running.

In very high level pseudo-code, an evolutionary algorithm looks like:

create a population of random solutions
loop many times
  pick two existing solutions
  make a child solution
  mutate the child
  replace a poor solution with new child
end-loop
return best solution found

There were several rather tricky details. The most difficult sub-problem was finding a way to combine two permutation solutions to create a child solution.

I created a demo for the TSP with n = 20 cities. I set up a city distances matrix where distance(A, B) = (B - A) * 1.0 when A is less than B, and distance(A, B) = (A - B) * 1.5 when A is greater than B. So adjacent cities in increasing order are distance 1.0 apart, and the optimal permutation is [0, 1, 2, . . 19] with total distance of 1 + 1 + . . + 1 = 19.

I used a population size of 6 and looped for 1,000 iterations. The algorithm found a very good (but not optimal) solution of [0 1 2 5 14 4 3 6 8 18 19 16 17 15 13 12 11 10 9 7] with total distance of 1 + 1 + 3 + 9 + 15.0 + 1.5 + 3 + 2 + 10 + 1 + 4.5 + 1 + 3 + 3 + 1.5 + 1.5 + 1.5 + 1.5 + 3.0 = 67.0.
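These totals are easy to verify with a short script. The sketch below mirrors the demo's distance scheme in Python; the path_distance function name is my own:

```python
def path_distance(perm):
  # demo scheme: d(i,j) = (j-i)*1.0 if i < j, else (i-j)*1.5
  d = 0.0
  for a, b in zip(perm, perm[1:]):
    d += (b - a) * 1.0 if a < b else (a - b) * 1.5
  return d

optimal = list(range(20))
found = [0,1,2,5,14,4,3,6,8,18,19,16,17,15,13,12,11,10,9,7]
print(path_distance(optimal))  # 19.0
print(path_distance(found))    # 67.0
```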



Playing cards are all about permutations and combinatorial optimization. When I was a college student at UC Irvine, my roommate Ed Koolish and I often drove to Las Vegas to play Blackjack — we knew how to card-count. Here’s an image from a project that uses machine learning to recognize cards.


Demo code.

using System;
namespace TSP_Evolutionary
{
  internal class TSP_EvoProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin evolutionary optimization ");
      Console.WriteLine("Setting TSP n = 20 cities ");
      Console.WriteLine("20! = 2,432,902,008,176,640,000 ");
      Console.WriteLine("Optimal soln is 0 1 2 . . 19 dist = 19");
      
      int numCities = 20;
      int numPop = 6;

      double[][] distances = GetDistances(numCities);

      Solver solver = new Solver(numCities, numPop, 
        distances, seed: 99);
      Console.WriteLine("\nInitial population: ");
      solver.Show();

      Console.WriteLine("\nBegin search ");
      solver.Solve(1000);
      Console.WriteLine("Done ");

      Console.WriteLine("\nFinal population: ");
      solver.Show();

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main()

    static double[][] GetDistances(int n)
    {
      // in a non-demo scenario, these come from a file
      double[][] result = new double[n][];
      for (int i = 0; i < n; ++i)
        result[i] = new double[n];
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
          if (i < j) result[i][j] = (j - i) * 1.0;
          else if (i > j) result[i][j] = (i - j) * 1.5;
      return result;
    }
  } // Program

  class Solver
  {
    public Random rnd;
    public int numCities; 
    public int numPop; // should be even

    public double[][] distances;

    public int[][] pop;  // array of solns[]
    public double[] errs;

    public int[] bestSoln;
    public double bestErr;

    public Solver(int numCities, int numPop,
      double[][] distances, int seed)
    {
      this.rnd = new Random(seed);
      this.numCities = numCities;
      this.numPop = numPop;
      this.bestErr = 0.0;
      this.distances = distances;

      this.pop = new int[numPop][];  // allocate
      for (int i = 0; i < numPop; ++i)
        this.pop[i] = new int[numCities];
      this.errs = new double[numPop];

      for (int i = 0; i < numPop; ++i)  // init
      {
        for (int j = 0; j < numCities; ++j)
        {
          this.pop[i][j] = j;  // [0, 1, 2, . . ]
        }
        this.Shuffle(this.pop[i]);  // random permutation
        this.errs[i] = this.ComputeError(this.pop[i]);
      }

      Array.Sort(this.errs, this.pop);  // sort pop by err, ascending

      this.bestSoln = new int[numCities];
      for (int j = 0; j < this.numCities; ++j)
        this.bestSoln[j] = this.pop[0][j];
      this.bestErr = this.errs[0];
    } // ctor

    public void Show()
    {
      for (int i = 0; i < this.numPop; ++i)
      {
        for (int j = 0; j < this.numCities; ++j)
        {
          Console.Write(this.pop[i][j] + " ");
        }
        Console.WriteLine(" | " + 
          this.errs[i].ToString("F4"));
      }

      Console.WriteLine("-----");
      for (int j = 0; j < this.numCities; ++j)
        Console.Write(this.bestSoln[j] + " ");
      Console.WriteLine(" | " + 
        this.bestErr.ToString("F4"));
    }

    public void Shuffle(int[] soln)
    {
      // Fisher-Yates algorithm
      int n = soln.Length;
      for (int i = 0; i < n; ++i)
      {
        int ri = this.rnd.Next(i, n);  // random index
        int tmp = soln[ri];
        soln[ri] = soln[i];
        soln[i] = tmp;
      }
    }

    public double ComputeError(int[] soln)
    {
      double d = 0.0;  // total distance between cities
      int n = soln.Length;  // aka numCities
      for (int i = 0; i < n - 1; ++i)
      {
        int fromCity = soln[i];
        int toCity = soln[i + 1];
        d += this.distances[fromCity][toCity];
      }
      return d;  
    }

    public int[] MakeChild(int idx1, int idx2)  // crossover
    {
      int[] parent1 = this.pop[idx1];
      int[] parent2 = this.pop[idx2];
      int[] result = new int[this.numCities];
      int[] used = new int[this.numCities];
      int[] candidates = new int[2 * this.numCities];

      int k = 0;
      for (int i = 0; 
        i < this.numCities / 2; ++i)  // left half of parent1
      {
        candidates[k++] = parent1[i];
      }

      for (int i = this.numCities / 2;
        i < this.numCities; ++i)  // right half of parent2
      {
        candidates[k++] = parent2[i];
      }

      for (int i = 0; 
        i < this.numCities / 2; ++i)  // left half of parent2
      {
        candidates[k++] = parent2[i];
      }

      for (int i = this.numCities / 2;
        i < this.numCities; ++i)  // right half of parent1
      {
        candidates[k++] = parent1[i];
      }

      k = 0;
      for (int i = 0; i < this.numCities; ++i)
      {
        while (used[candidates[k]] == 1)  // advance to unused
          k += 1;
        result[i] = candidates[k];
        used[candidates[k]] = 1;
      }

      return result;
    } // MakeChild

    public void Mutate(int[] soln)
    {
      // swap a randomly chosen pair of adjacent cities
      int idx = this.rnd.Next(0, this.numCities - 1);
      int tmp = soln[idx];
      soln[idx] = soln[idx+1];
      soln[idx+1] = tmp;
    }

    public void Solve(int maxGen)
    {
      for (int gen = 0; gen < maxGen; ++gen)
      {
        // 1. pick parent indexes, one from the fit half
        //    and one from the weak half of the population
        int idx1, idx2;
        int flip = this.rnd.Next(2);
        if (flip == 0)
        {
          idx1 = this.rnd.Next(0, this.numPop / 2);
          idx2 = this.rnd.Next(this.numPop / 2, this.numPop);
        }
        else
        {
          idx2 = this.rnd.Next(0, this.numPop / 2);
          idx1 = this.rnd.Next(this.numPop / 2, this.numPop);
        }

        // 2. create a child
        int[] child = MakeChild(idx1, idx2);

        // 3. mutate
        Mutate(child);
        double childErr = this.ComputeError(child);
        
        // 4. child is new best?
        if (childErr < this.bestErr)
        {
          Console.WriteLine("New best soln found at gen " +
            gen);
          for (int i = 0; i < child.Length; ++i)
            this.bestSoln[i] = child[i];
          this.bestErr = childErr;
        }

        // 5. replace a weak soln with the child
        int idx = this.rnd.Next(this.numPop / 2, this.numPop);
        this.pop[idx] = child;
        this.errs[idx] = childErr;

        // 6. create an immigrant (random solution)
        int[] imm = new int[this.numCities];
        for (int i = 0; i < this.numCities; ++i)
          imm[i] = i;
        this.Shuffle(imm);
        double immErr = this.ComputeError(imm);

        // 7. immigrant is new best?
        if (immErr < this.bestErr)
        {
          Console.WriteLine("New best (immigrant) at gen " +
            gen);
          for (int i = 0; i < imm.Length; ++i)
            this.bestSoln[i] = imm[i];
          this.bestErr = immErr;
        }

        // 8. replace a weak soln with the immigrant
        idx = this.rnd.Next(this.numPop / 2, this.numPop);
        this.pop[idx] = imm;
        this.errs[idx] = immErr;

        // 9. sort the new population
        Array.Sort(this.errs, this.pop);

      } // gen
    } // Solve

  } // Solver

} // ns
Posted in Machine Learning | 1 Comment

NFL 2022 Week 11 Predictions – Zoltar Thinks the Colts Will Beat the Eagles

Zoltar is my NFL football prediction computer program. It uses reinforcement learning and a neural network. Here are Zoltar’s predictions for week #11 of the 2022 season.

Zoltar:     packers  by    4  dog =      titans    Vegas:     packers  by    3
Zoltar:     falcons  by    5  dog =       bears    Vegas:     falcons  by    3
Zoltar:       bills  by    6  dog =      browns    Vegas:       bills  by  9.5
Zoltar:       colts  by    4  dog =      eagles    Vegas:      eagles  by    8
Zoltar:  commanders  by    0  dog =      texans    Vegas:  commanders  by  2.5
Zoltar:        rams  by    0  dog =      saints    Vegas:      saints  by  3.5
Zoltar:    patriots  by    6  dog =        jets    Vegas:    patriots  by    3
Zoltar:      giants  by    4  dog =       lions    Vegas:      giants  by    3
Zoltar:      ravens  by    6  dog =    panthers    Vegas:      ravens  by 11.5
Zoltar:     raiders  by    0  dog =     broncos    Vegas:     broncos  by  2.5
Zoltar:     cowboys  by    0  dog =     vikings    Vegas:     cowboys  by    1
Zoltar:      chiefs  by    0  dog =    chargers    Vegas:      chiefs  by    7
Zoltar:    steelers  by    2  dog =     bengals    Vegas:     bengals  by    5
Zoltar:   cardinals  by    1  dog = fortyniners    Vegas: fortyniners  by    8

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. For this season I’ve been using a threshold of 4 points difference but in some previous seasons I used 3 points.

At the beginning of the season, because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is very strongly biased towards Vegas underdogs. I probably need to fix this. For week #11 Zoltar likes five Vegas underdogs:

1. Zoltar likes Vegas underdog Colts against the Eagles
2. Zoltar likes Vegas underdog Panthers against the Ravens
3. Zoltar likes Vegas underdog Chargers against the Chiefs
4. Zoltar likes Vegas underdog Steelers against the Bengals.
5. Zoltar likes Vegas underdog Cardinals against the 49ers.

For example, a bet on the underdog Colts against the Eagles will pay off if the Colts win by any score, or if the favored Eagles win but by less than 8 points (i.e., 7 points or less). If a favored team wins by exactly the point spread, the wager is a push. This is why point spreads often have a 0.5 added — called “the hook” — to eliminate pushes.
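The payoff rules can be captured in a tiny function. This is my own illustration of the logic described above, not part of Zoltar; fav_margin is the favorite's actual margin of victory (negative if the underdog wins outright):

```python
def underdog_bet_result(fav_margin, spread):
  # underdog covers if the favorite fails to win by more than the spread
  if fav_margin < spread:
    return "win"
  if fav_margin == spread:
    return "push"   # stake returned; a 0.5 "hook" eliminates this case
  return "lose"

print(underdog_bet_result(-3, 8))  # win : Colts win outright
print(underdog_bet_result(7, 8))   # win : Eagles win by less than 8
print(underdog_bet_result(8, 8))   # push
print(underdog_bet_result(10, 8))  # lose
```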

Zoltar’s prediction that the Colts will beat the Eagles is mystifying to me. The Eagles loss to the Commanders in week #10 must have hurt the Eagles a lot, at least in terms of Zoltar’s algorithms.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
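The 53% figure follows directly from the odds. At break-even, winnings equal losses, so p * 100 = (1 - p) * 110, which gives p = 110 / 210 (a quick arithmetic check, not Zoltar code):

```python
# betting $110 to win $100: break-even win probability
p = 110.0 / 210.0
print(round(p, 4))  # 0.5238, i.e. about 53 percent
```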

In week #10, against the Vegas point spread, Zoltar went 4-1 (using 4.0 points as the advice threshold). I was impressed that Zoltar picked the underdog Steelers and underdog Packers to win outright, which both teams did.

For the season, against the spread, Zoltar is 36-17 (~68% accuracy).

Just for fun, I track how well Zoltar does when trying to predict only which team will win a game. This isn't useful except for parlay betting. In week #10, just predicting the winning team, Zoltar went only 8-6, which isn't very good: just slightly better than a coin flip. Vegas was even worse at 6-8, slightly worse than a coin flip.

Zoltar sometimes predicts a 0-point margin of victory; there are five such games in week #11. In those situations Zoltar still picks a winner, but only so I can track the raw number of correct predictions: in the first few weeks of the season he picks the home team, and after that he uses his algorithms.


My system is named after the Zoltar fortune teller machine you can find in arcades. Coin-operated fortune teller machines have been around for well over 100 years.

Left: The Blind Man fortune teller machine was made about 1920 by Walter Hart of Ramsgate, England. It accepted a penny. The blind man spins and his cane points to a message such as, “Go, get thee a wife. She’ll mind thy folly.”

Right: The Columbian Fortune Teller was manufactured about 1892 by the World’s Fair Slot Machine Company. It was a combination fortune teller and early slot machine. It accepted a nickel, which was quite a bit of money back then — roughly equivalent to about $5.00 today.


Posted in Zoltar | Leave a comment

“Regression Using PyTorch, Part 1: New Best Practices” in Visual Studio Magazine

I wrote an article titled “Regression Using PyTorch, Part 1: New Best Practices” in the November 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/11/01/pytorch-regression.aspx.

A regression problem is one where the goal is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their sex, age, state where they live and political leaning (conservative, moderate, liberal). In my article I explain how to prepare training data and how to design a PyTorch neural network for regression.

My demo begins by loading a 200-item file of training data and a 40-item set of test data. Each tab-delimited line represents a person. The fields are sex (male = -1, female = +1), age, state of residence, annual income and politics type. The goal is to predict income from sex, age, state, and political leaning.

After the training data is loaded into memory, the demo creates an 8-(10-10)-1 neural network. This means there are eight input nodes, two hidden neural layers with 10 nodes each, and one output node.

The demo uses a batch size of 10, Adam (adaptive momentum) optimization with a learning rate of 0.01, and maximum training epochs of 1,000 passes through the training data.

After 1,000 training epochs, the demo program computes the accuracy of the trained model on the training data as 91 percent (182 out of 200 correct). The model accuracy on the test data is 85 percent (34 out of 40 correct). For regression problem accuracy, you must specify how close a prediction must be to the true value in order to be counted as a correct prediction. For the demo, a predicted income that’s within 10 percent of the true value is counted as a correct prediction.
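The accuracy-within-a-percentage idea can be sketched like this. This is my own simplified version using plain Python lists rather than the article's PyTorch tensors:

```python
def accuracy(predicted, actual, pct_close=0.10):
  # a prediction counts as correct if it's within pct_close of the true value
  n_correct = 0
  for p, a in zip(predicted, actual):
    if abs(p - a) < abs(pct_close * a):
      n_correct += 1
  return n_correct / len(actual)

# incomes in thousands of dollars: first two are within 10%, third is not
print(round(accuracy([45.0, 52.0, 30.0], [48.0, 50.0, 40.0]), 2))  # 0.67
```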

After evaluating the trained network, the demo predicts the income for a person who is male, 34 years old, from Oklahoma, who is a political moderate. The prediction is $45,392.60.

The neural network definition is:

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(8, 10)  # 8-(10-10)-1
    self.hid2 = T.nn.Linear(10, 10)
    self.oupt = T.nn.Linear(10, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # regression: no activation
    return z

Even though the code isn’t very long, there are a ton of details to understand.



In my mind, there’s a connection between producing something of value and income generation. Therefore, I’m always a bit uneasy when it comes to gambling games. But I enjoy the intellectual challenge of probability and prediction.


Posted in PyTorch | Leave a comment

Combining Two Mathematical Permutations

I was walking my dogs one evening when I thought of a problem: how to combine two mathematical permutations. Suppose:

perm0 = [2 8 4 9 1 6 7 3 0 5]

perm1 = [3 5 1 2 9 8 0 6 7 4]

Here I use 0-based permutations of order = 10. How can you combine the two permutations in a way that retains characteristics of the source permutations, as opposed to just a random result?

Each element in a permutation is unique which makes it a bit tricky to combine them. For example, if you naively take the left half of perm0 and the right half of perm1 the result is [2 8 4 9 1 8 0 6 7 4] which isn’t valid because there are two 8s and two 4s in the result (and no 3 and no 5).
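A quick Python check confirms the problem (my own snippet, using the two permutations above):

```python
# naive splice: left half of perm0 followed by right half of perm1
perm0 = [2, 8, 4, 9, 1, 6, 7, 3, 0, 5]
perm1 = [3, 5, 1, 2, 9, 8, 0, 6, 7, 4]
naive = perm0[:5] + perm1[5:]
print(naive)                             # [2, 8, 4, 9, 1, 8, 0, 6, 7, 4]
print(sorted(naive) == list(range(10)))  # False: not a valid permutation
```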

I came up with several algorithms to combine two permutations — most of them incorrect. Eventually (embarrassingly, after several hours) I thought of a simple approach. I create a candidate list that is the first half of perm0, followed by the first half of perm1, followed by the second half of perm0, followed by the second half of perm1. Then iterate through the candidates, fetching the first non-used value.

For the two permutations above, the candidate list is:

2 8 4 9 1 * 3 5 1 2 9 * 6 7 3 0 5 * 8 0 6 7 4

The iteration adds 2, 8, 4, 9, 1, 3, 5 (all unused) but then hits 1 which has already been used, and so advances to 2 (used), 9 (used), then 6 (not used and so adds to result). And so on. The resulting combined permutation is:

[2 8 4 9 1 3 5 6 7 0]

The order in which candidates are created (first half of perm0, first half of perm1, second half of perm0, second half of perm1) is something I need to think about a bit more.

The problem of combining two permutations has been in the back of my mind for a long time. It might be useful for an evolutionary algorithm for combinatorial optimization. But that’s another blog post.



Slot machines are applied permutations. Left: The first true slot machine was called the Liberty Bell. It was designed by Charles Fey in San Francisco, in approximately 1890 (exact date not clear). Center: The first modern slot machine that had automatic payout was the Money Honey by Bally in 1963. Right: The first rudimentary but functional video slot machine was designed by Fortune Coin Company in 1976 in Las Vegas. It was so unique at the time, it didn’t need a name.

Modern slot machines are very sophisticated and practically works of art. I don’t enjoy playing slot machines because there is no decision making by the player. But I do admire the wildly creative designs I see when I’m in Las Vegas.


Demo code.

# combine_perms.py

import numpy as np

# -----------------------------------------------------------

def combine(perm0, perm1):
  n = len(perm0)
  result = np.zeros(n, dtype=np.int64)
  used = np.zeros(n, dtype=np.int64)

  candidates = []
  for i in range(0, n//2):
    candidates.append(perm0[i])
  for i in range(0, n//2):
    candidates.append(perm1[i])
  for i in range(n//2, n):
    candidates.append(perm0[i])
  for i in range(n//2, n):
    candidates.append(perm1[i])

  idx = 0
  for i in range(n):
    while (used[candidates[idx]] == 1):  # advance to unused
      idx += 1
    result[i] = candidates[idx]
    used[candidates[idx]] = 1
  
  return result

# -----------------------------------------------------------

print("\nBegin combine permutations demo ")
rnd = np.random.RandomState(0)
n = 10
perm0 = rnd.permutation(n)
perm1 = rnd.permutation(n)

print("\nperm0: ")
print(perm0)
print("\nperm1: ")
print(perm1)

p = combine(perm0, perm1)
print("\nCombined: ")
print(p)

print("\nEnd demo ")
Posted in Miscellaneous | Leave a comment

My Top Ten Favorite Sappy Songs of the 1960s

I like all kinds of music — rock, classical, popular, and so on. One Sunday morning I decided to capture short clips of ten of my favorite sappy (slow and sentimental) songs from the 1960s. Listed in no particular order.


1. As Tears Go By by Marianne Faithfull (1965)


2. Sally Go Round the Roses by the Jaynetts (1963)


3. Woman of 1000 Years by Fleetwood Mac (1969)


4. You Baby by The Mamas and the Papas (1966)


5. Different Drum by The Stone Poneys (1967)


6. Our National Anthem by Gentle Soul (1968)


7. Anyone Who Had a Heart by Cilla Black (1966)


8. Walking in the Rain by The Ronettes (1964)


9. I Will Follow Him by Peggy March (1963)


10. Anyone Who Knows What Love Is by Irma Thomas (1968)


Posted in Top Ten | Leave a comment

Analyzing PyTorch Using the Bottleneck Utility

I hadn’t used the torch.utils.bottleneck tool for quite some time, so just for hoots I figured I’d see if anything had changed since my last use of the tool. The bottleneck tool analyzes a program run using the Python profiler and the PyTorch autograd profiler.

First I ran a demo multi-class classification program. The goal is to predict a person's political leaning (conservative, moderate, liberal) from sex, age, state, and income. The program is named people_politics.py and it creates a network, trains it, evaluates accuracy, and uses the trained model to make a prediction. See https://jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.

Using the bottleneck tool was easy. I navigated to the directory that held the PyTorch program and then issued the command:

python -m torch.utils.bottleneck people_politics.py

The -m argument means “run a module”, in this case the bottleneck program that’s part of the torch.utils package.

As is the case with most profilers, the output is large and not so easy to interpret. In general, the idea is to look for extreme values that might mean a problem.

I rarely find it necessary to use the bottleneck tool but it’s nice to have when programs are troublesome.



Liquor bottle designs can be very creative and interesting. Here are three rum bottles with graceful necks that I think are quite attractive.


Posted in PyTorch | Leave a comment

Why I Don’t Use Neural Network Early Stopping

I was chit-chatting with a work colleague about early stopping. We were next to our workplace coffee-machine / engineer-fueling-station. I almost never use early stopping for neural network training. Briefly, 1.) it’s not really possible to determine when to stop, and 2.) the gain from early stopping doesn’t outweigh the downside cost of having less training data due to the need for validation data.

The graph on the left shows early stopping in theory. Instead of dividing your labeled data into a train (maybe 80% of the data) and a test (20%) dataset as usual, you divide into train (perhaps 70%), validate (15%), and test (15%) datasets. During training, every so often you evaluate the network error/loss on the training and validation data. At some point the validation error will start to increase, indicating that overfitting has started to occur.

Unfortunately, the nice smooth, easy-to-interpret graph on the left almost never happens. The graph on the right shows validation error on an actual training run (I found the graph on the Internet). The graph jumps around and if you stopped training around epoch 50, when validation error starts to increase, you’d be early stopping too early.

There are some scenarios where a conservative form of early stopping is useful. For example, if you are doing programmatic hyperparameter optimization/tuning, a bad early-stop doesn’t hurt too much because you’ll likely get another chance so to speak.
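For reference, the common patience-based form of early stopping looks something like this. This is my own framework-agnostic Python sketch, not code from any of my demos; the validation error values are made up to mimic a noisy curve:

```python
def early_stop_epoch(val_errors, patience=10):
  # stop when validation error hasn't improved in `patience` checks;
  # return the epoch of the best (lowest) error seen
  best_err = float('inf')
  best_epoch = 0
  for epoch, err in enumerate(val_errors):
    if err < best_err:
      best_err = err
      best_epoch = epoch
    elif epoch - best_epoch >= patience:
      return best_epoch  # roll back to the best weights seen
  return best_epoch

# a noisy validation curve: dips, jitters, then climbs
errs = [9.0, 7.0, 6.5, 6.8, 6.2, 6.4, 6.3, 6.9, 7.2, 7.5,
        7.8, 8.0, 8.3, 8.6, 9.0, 9.4]
print(early_stop_epoch(errs, patience=5))  # 4
```

A small patience value would stop during the jittery region; a large one mostly just delays the inevitable, which is part of why I don't bother with the technique.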



I don’t know much about art, but I suspect that an artist has a difficult time knowing when to stop work on a piece of art. Here are three left-facing portraits by artists whose work I admire a lot. Left: By Hans Jochem Bakker. Center: By Randy Monteith. Right: By Daniel Arrhakis.


Posted in Machine Learning | Leave a comment

NFL 2022 Week 10 Predictions – Zoltar Thinks Underdogs Packers and Steelers Will Win Outright

Zoltar is my NFL football prediction computer program. It uses reinforcement learning and a neural network. Here are Zoltar’s predictions for week #10 of the 2022 season.

Zoltar:     falcons  by    0  dog =    panthers    Vegas:     falcons  by    3
Zoltar:  buccaneers  by    3  dog =    seahawks    Vegas:  buccaneers  by  2.5
Zoltar:       bills  by    6  dog =     vikings    Vegas:       bills  by    6
Zoltar:       bears  by    6  dog =       lions    Vegas:       bears  by    3
Zoltar:      chiefs  by   10  dog =     jaguars    Vegas:      chiefs  by  9.5
Zoltar:    dolphins  by    4  dog =      browns    Vegas:    dolphins  by    4
Zoltar:      giants  by    2  dog =      texans    Vegas:      giants  by  6.5
Zoltar:      titans  by    6  dog =     broncos    Vegas:      titans  by    3
Zoltar:    steelers  by    4  dog =      saints    Vegas:      saints  by  2.5
Zoltar:     raiders  by    5  dog =       colts    Vegas:     raiders  by    6
Zoltar:     packers  by    4  dog =     cowboys    Vegas:     cowboys  by  5.5
Zoltar:        rams  by    5  dog =   cardinals    Vegas:        rams  by    3
Zoltar: fortyniners  by    5  dog =    chargers    Vegas: fortyniners  by    7
Zoltar:      eagles  by    5  dog =  commanders    Vegas:      eagles  by 10.5

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. For this season I’ve been using a threshold of 4 points difference but in some previous seasons I used 3 points.

At the beginning of the season, because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is very strongly biased towards Vegas underdogs. I probably need to fix this. For week #10 Zoltar likes four Vegas underdogs:

1. Zoltar likes Vegas underdog Texans against the Giants.
2. Zoltar likes Vegas underdog Steelers against the Saints.
3. Zoltar likes Vegas underdog Packers against the Cowboys.
4. Zoltar likes Vegas underdog Commanders against the Eagles.

For example, a bet on the underdog Texans against the Giants will pay off if the Texans win by any score, or if the favored Giants win but by less than 6.5 points (i.e., 6 points or less). If a favored team wins by exactly the point spread, the wager is a push. This is why point spreads often have a 0.5 added — called “the hook” — to eliminate pushes.

This is the part of the season where injuries start having a big effect on the point spread. I’m surprised that Zoltar thinks the Green Bay Packers are better than the Dallas Cowboys. To the human eye, the Packers have looked terrible and the Cowboys look like champion contenders. We’ll see.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

In week #9, against the Vegas point spread, Zoltar went 5-1 (using 4.0 points as the advice threshold). Quite lucky because several big Vegas favorites won easily but didn’t quite cover the spread.

For the season, against the spread, Zoltar is 33-16 (~67% accuracy).

Just for fun, I track how well Zoltar does when trying to predict only which team will win a game. This isn't useful except for parlay betting. In week #9, just predicting the winning team, Zoltar went only 6-7, which is terrible. Vegas was much better, going 9-4 at just predicting the winning team.

Zoltar sometimes predicts a 0-point margin of victory; there is one such game in week #10 (Falcons vs. Panthers). In those situations Zoltar still picks a winner, but only so I can track the raw number of correct predictions: in the first few weeks of the season he picks the home team, and after that he uses his algorithms.



My prediction system is math-based in the sense that it computes a numeric rating for each team and then uses ratings to compute the predicted margin of victory of the better team. An entirely different approach for predicting NFL football scores is to use a sophisticated simulation program and then simulate a game many thousands of times.

Left: There are many versions of simple football simulation games where a player rolls two ordinary dice and the result is based on the 21 (order doesn’t matter) or 36 (order matters) possible outcomes. This one is called simply “Football Board Game” by The Blue Crab company of Sunrise, Florida.

Center: “1st and Goal” by R and R Games is a fairly sophisticated game that uses several kinds of specialized dice and cards.

Right: “Half-Time Football” by Lakeside was produced in the late 1970s and early 80s. It’s strictly a dice game and is medium complexity.

Posted in Zoltar | Leave a comment

PyTorch Model Training Using a Program-Defined Function vs. Inline Code

In most cases, I train a PyTorch model using inline code inside a main() function. For example:

import torch as T
. . .
def main():
  # 0. get started
  print("Begin People predict politics type ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create Dataset and DataLoader objects
  print("Creating People Datasets ")
  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)
  train_ldr = T.utils.data.DataLoader(train_ds,
    batch_size=10, shuffle=True)

  # 2. create network
  print("Creating 6-(10-10)-3 tanh network ")
  net = Net().to(device)
  net.train()  # set mode

  # 3. train
  max_epochs = 1000
  ep_log_interval = 100
  lrn_rate = 0.01
  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  optimizer = T.optim.SGD(net.parameters(), lr=lrn_rate)

  print("Starting training")
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Training done ")
  . . .

An alternative approach is to use a program-defined training function like so:

. . .
def train(net, ds, opt, lr, bs, me, le):
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)  
  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  # else error

  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % le == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Training done ") 
  return net 

def main():
  # 0. get started
  print("Begin People predict politics type ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create Dataset objects
  print("Creating People Datasets ")
  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows

# -----------------------------------------------------------

  # 2. create network
  print("Creating 6-(10-10)-3 tanh network ")
  net = Net().to(device)
  net.train()

# -----------------------------------------------------------

  # 3. train
  opt = 'sgd'  # optimizer algorithm
  lr = 0.01    # learning rate
  bs = 10      # batch size
  me = 1000    # max training epochs
  le = 200     # monitor loss every n epochs

  print("Starting training ")
  train(net, train_ds, opt, lr, bs, me, le)
  print("Done ")
. . .

Using a program-defined train() function is useful when doing hyperparameter optimization because the training code is called many times. But for ordinary scenarios, a separate train() function isn't as clear as simple inline training code.

The moral of the story is that when writing code, there are always multiple design options where experience and intuition guide the design.



Minicomputers dominated the 1970s. They were a response to the very expensive mainframes of the 1960s. Minicomputer designs varied wildly, but most were 16-bit machines with processors that ran at about 1-2 MHz and could support roughly 16 concurrent users. With peripherals, a minicomputer system cost about $100,000 (roughly $800,000 today), which made minicomputers the hardware of choice for colleges and universities of the 1970s. The PCs of the 1980s killed minicomputers.

Left: The Hewlett-Packard HP-2100. Center: The Digital Equipment Corporation PDP-11. My first programming was done on a PDP-11 using the RSTS operating system, when I was a student at U.C. Irvine. Right: The Control Data Corporation CDC-1700.


Posted in PyTorch | 1 Comment

An Example of PyTorch Hyperparameter Random Search

Bottom line: Hyperparameter random search can be effective but the difficult part is determining what to parameterize and the range of possible parameter values.

When creating a neural network prediction model there are many architecture hyperparameters (number of hidden layers, number of nodes in each hidden layer, hidden activation functions, initialization algorithms and their parameters, and so on). And then there are dozens of training hyperparameters (optimization algorithm, learning rate, momentum, batch size, number of training epochs, and so on).


In this demo, the best parameters were found in trial 2: 16 hidden nodes, tanh hidden activation, Adam optimization, learning rate = 0.01809, batch size = 14, and max_epochs = 799.

Most of my colleagues and I use a manual approach for finding good hyperparameters, relying on experience and intuition. It's also possible to programmatically search for good hyperparameters. Somewhat surprisingly, a simple random search of hyperparameter values is highly effective, often outperforming more structured techniques such as grid search. See "Random Search for Hyper-Parameter Optimization" (2012) by J. Bergstra and Y. Bengio.
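The intuition behind the Bergstra and Bengio result can be seen with a tiny thought experiment. With a fixed budget of trials, grid search reuses each distinct value of every hyperparameter many times, while random search tries a fresh value on every trial. That matters when only one or two hyperparameters really affect the result. A small self-contained illustration (not part of the demo; the values are made up):

```python
import numpy as np

rnd = np.random.RandomState(0)
budget = 9  # total number of trials allowed

# grid search: a 3 x 3 grid over (learn rate, batch size)
# probes only 3 distinct learning rates in 9 trials
grid_lrs = [0.001, 0.01, 0.10]
grid_bss = [8, 16, 32]
grid_trials = [(lr, bs) for lr in grid_lrs for bs in grid_bss]
distinct_grid_lrs = len(set(lr for (lr, bs) in grid_trials))

# random search: each trial draws a fresh learning rate, so
# the same 9 trials probe 9 distinct learning rates
rand_trials = [(rnd.uniform(0.001, 0.10), rnd.choice([8, 16, 32]))
  for _ in range(budget)]
distinct_rand_lrs = len(set(lr for (lr, bs) in rand_trials))

print(distinct_grid_lrs)  # 3
print(distinct_rand_lrs)  # 9
```

If learning rate turns out to be the parameter that matters, random search has effectively explored it nine times as densely for the same compute cost.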

I put together a demo of hyperparameter random search. My demo problem is to predict a person’s political leaning (conservative, moderate, liberal) from sex, age, state, and income. In pseudo-code the idea is:

  # loop n times
  #   create random arch and train hyperparams
  #   use arch params to create net
  #   use train params to train net
  #   evaluate trained net
  #   log params and eval metric to file
  # end-loop
  # analyze log offline

I used just two architecture parameters: number of hidden nodes and hidden activation function. The architecture was fixed at two hidden layers.

I used just four training parameters: optimization algorithm, learning rate, batch size, and max epochs. Here’s my demo function that generates random hyperparameters:

def create_params(seed=1):
  # n_hid, activation; opt, lr, bs, max_ep
  rnd = np.random.RandomState(seed)

  n_hid = rnd.randint(6, 21)  # [6, 20]
  activation = ['tanh', 'relu'][rnd.randint(0,2)]

  opt = ['sgd', 'adam'][rnd.randint(0,2)]
  lr = rnd.uniform(low=0.001, high=0.10)
  bs = rnd.randint(6, 16)
  max_ep = rnd.randint(200, 1000)

  return (n_hid, activation, opt, lr, bs, max_ep)

The number of hidden nodes varies from 6 to 20, the learning rate varies from 0.001 to 0.10, and so on. Where do these ranges come from? Just guesses based on experience.
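One refinement the demo doesn't use: when a range spans two orders of magnitude, such as learning rate 0.001 to 0.10, plain uniform sampling puts about 90 percent of the draws above 0.01. Sampling the exponent uniformly instead gives equal attention to each decade. A minimal sketch, where lr_log_uniform() is a hypothetical helper, not part of the demo:

```python
import numpy as np

def lr_log_uniform(rnd, low=0.001, high=0.10):
  # hypothetical helper: sample the exponent uniformly so
  # each decade of the range gets equal probability
  return 10.0 ** rnd.uniform(np.log10(low), np.log10(high))

rnd = np.random.RandomState(1)
samples = [lr_log_uniform(rnd) for _ in range(1000)]

# roughly half the draws land below 0.01, versus roughly
# 9 percent with plain rnd.uniform(0.001, 0.10)
frac_small = sum(s < 0.01 for s in samples) / 1000
print("%0.2f" % frac_small)
```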

There are dozens of details, such as how to evaluate a trained network.

So, hyperparameter search isn’t a magic wand — you have to use experience to determine which of the hundreds of possible parameters to search, and which of the literally infinite ranges for parameter values to use.

One of the disadvantages of random search is that you can get ugly results, such as a learning rate of 0.10243568790223344556677123. One way to deal with this issue is to round floating point values to three decimals and integers to the nearest multiple of 10 before trying them.
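A minimal sketch of that cleanup, rounding a float hyperparameter to three decimals and an integer one to the nearest multiple of 10 (the specific values are just examples):

```python
# raw values from a random search trial (example values)
lr = 0.10243568790223344556677123
max_ep = 799

lr_clean = round(lr, 3)                        # 3 decimals
max_ep_clean = int(round(max_ep / 10.0) * 10)  # nearest 10

print(lr_clean)      # 0.102
print(max_ep_clean)  # 800
```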



Like many of the older guys I work with, I gained a love of reading from the juvenile "Hardy Boys" mystery series. Several of the books featured a search for something, such as a treasure of some kind. Left: "The Tower Treasure" (#1, 1959 edition). Center: "Hunting for Hidden Gold" (#5, 1963 edition). Right: "The Secret of Pirates' Hill" (#36, 1956 edition). All three covers by artist Rudy Nappi (1923-2015).


Demo code.

# people_hyperparam_search.py
# predict politics type from sex, age, state, income
# PyTorch 1.12.1-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10/11 

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  # -1   0.27   0  1  0   0.7610   2
  # +1   0.19   0  0  1   0.6550   0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]   # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]     # 1-D

    self.x_data = T.tensor(tmp_x, 
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_hid, activation):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, n_hid)  # 6-(nh-nh)-3
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 3)

    if activation == 'tanh':
      self.activ = T.nn.Tanh()
    elif activation == 'relu':
      self.activ = T.nn.ReLU()
    
    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss() 
    return z

# -----------------------------------------------------------
 
def overall_loss_3(model, ds, n_class):
  # MSE using built in MSELoss() version
  X = ds[0:len(ds)][0]  # all X values
  Y = ds[0:len(ds)][1]  # all targets, ordinal form
  YY = T.nn.functional.one_hot(Y,
    num_classes=n_class).float()  # MSELoss() needs float
  
  with T.no_grad():
    oupt = T.exp(model(X))  #  [all,3]  probs form

  loss_func = T.nn.MSELoss(reduction='sum')
  loss_val = loss_func(oupt, YY)  # a tensor
  mse = loss_val / len(ds)
  return mse  # as tensor

# -----------------------------------------------------------

def accuracy(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  # Y = T.flatten(dataset[0:len(dataset)][1])
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  #  [n,3]  log-probs

  # (_, arg_maxs) = T.max(oupt, dim=1)
  arg_maxs = T.argmax(oupt, dim=1)  # argmax() is new
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def train(net, ds, opt, lr, bs, me, le):
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)  

  loss_func = T.nn.NLLLoss()  # assumes log_softmax()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)
  # else error

  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics
      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % le == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Training done ") 
  return net 

# -----------------------------------------------------------

def create_params(seed=1):
  # n_hid, activation; opt, lr, bs, max_ep
  rnd = np.random.RandomState(seed)

  n_hid = rnd.randint(6, 21)  # [6, 20]
  activation = ['tanh', 'relu'][rnd.randint(0,2)]

  opt = ['sgd', 'adam'][rnd.randint(0,2)]
  lr = rnd.uniform(low=0.001, high=0.10)
  bs = rnd.randint(6, 16)
  max_ep = rnd.randint(200, 1000)

  return (n_hid, activation, opt, lr, bs, max_ep)
  
# -----------------------------------------------------------

def search_params(ds):
  # using Dataset ds
  # loop n times
  #  create random arch and train hyperparams
  #  use arch params to create net
  #  use train params to train net
  #  evaluate trained net
  #  log params and eval metric to file
  # end-loop
  # analyze log offline

  max_trials = 6
  for i in range(max_trials):
    print("\nSearch trial " + str(i))
    (n_hid, activation, opt, lr, bs, max_ep) = \
      create_params(seed=i*2)
    print((n_hid, activation, opt, lr, bs, max_ep))

    net = Net(n_hid, activation).to(device)
    net.train()

    net = train(net, ds, opt, lr, bs, max_ep, le=200)

    net.eval()
    error = overall_loss_3(net, ds, n_class=3).item()
    acc = accuracy(net, ds) 
    print("acc = %0.4f  error = %0.4f " % (acc, error))
    # log params, error, accuracy here

  return 0

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin People hyperparameter random search ")
  T.manual_seed(1)
  np.random.seed(1)
  
  # 1. create Dataset objects
  print("\nCreating People Datasets ")

  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows

  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)    # 40 rows

  # 2. search for good hyperparameters
  search_params(train_ds)

# -----------------------------------------------------------

  print("\nEnd People hyperparameter random search ")

if __name__ == "__main__":
  main()
Posted in PyTorch | Leave a comment