Predicting NFL Football Scores using Simulated Roaches

Every year during pro football season, I use a prediction program I wrote in C# to predict the outcomes of games. The prediction system is called Zoltar (a reference to the fortune-telling machine). Zoltar needs a numerical optimization module to determine the values of constants, such as how much the home field advantage is worth.


There are many numerical optimization algorithms. Most are based on calculus derivatives (gradients), but some are modeled on the behavior of biological systems. This year I used an obscure algorithm called roach infestation optimization for Zoltar’s optimization module.

Without further ado, here are Zoltar’s predictions for week 5 of the 2015 NFL season:

                      Vegas   Zoltar
         Favorite     Spread  Spread  Underdog 
Oct.  8  Indianapolis  -1      -1     Houston 
Oct. 11  Tampa Bay     -3      -6     Jacksonville 
Oct. 11  Buffalo       -3      -4     Tennessee 
Oct. 11  Baltimore     -7      -9     Cleveland 
Oct. 11  Atlanta       -8      -7     Washington 
Oct. 11  Kansas City   -10     -6     Chicago 
Oct. 11  Philadelphia  -5      -6     New Orleans 
Oct. 11  Green Bay     -10     -11    St. Louis 
Oct. 11  Cincinnati    -1      -6     Seattle 
Oct. 11  Arizona       -3      -1     Detroit 
Oct. 11  New England   -8      -1     Dallas 
Oct. 11  Denver        -6      -8     Oakland 
Oct. 11  NY Giants     -7      -4     San Francisco 
Oct. 12  San Diego     -3      -1     Pittsburgh 

In all 14 games, Zoltar agrees with the Las Vegas favorite. But in three games, the difference between the Zoltar spread and the Vegas spread is large enough for Zoltar to identify a betting opportunity (see the sketch after the list).

1. Because Vegas says the Kansas City Chiefs will win by 10 points over the Chicago Bears, but Zoltar says the Chiefs will win by only 6 points, Zoltar suggests betting on the underdog Bears. You’ll win if the Bears win the game outright, or if the Chiefs win but by less than 10 points.

2. Zoltar says bet on the Cincinnati Bengals to beat the Seattle Seahawks by more than 1 point.

3. Zoltar says bet on the underdog Dallas Cowboys to beat the New England Patriots (or lose by fewer than 8 points).
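
Just to illustrate the general idea (this is not Zoltar’s actual code, and the threshold parameter is hypothetical), the rule for flagging a betting opportunity boils down to comparing the two spreads:

// hypothetical sketch of the betting-opportunity rule; spreads are
// negative numbers, e.g., -10 means the favorite is favored by 10 points
static string Advice(string favorite, string underdog,
  double vegasSpread, double zoltarSpread, double threshold)
{
  double diff = zoltarSpread - vegasSpread;
  if (diff >= threshold)
    return "bet on underdog " + underdog;  // Zoltar thinks the game is closer
  else if (diff <= -threshold)
    return "bet on favorite " + favorite;  // Zoltar thinks the favorite wins bigger
  else
    return "no bet";
}

For the Kansas City game, for example, the Vegas spread is -10 and the Zoltar spread is -6, so the difference is 4 points in the underdog’s favor, which is why Zoltar likes the Bears.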

Posted in Machine Learning

Graphing Rastrigin’s Function in 3D Color Gradient using R

Rastrigin’s function is a math function that is often used as a benchmark problem to evaluate the effectiveness of numerical optimization algorithms. The function has a known minimum value of 0.0 at (0, 0, . . , 0), where the number of zero values is equal to the dimension of the function. For example, if the dimension is set to 2, then the minimum value of Rastrigin’s function is at (0, 0).


Rastrigin’s function is difficult to solve because it has many peaks and valleys that represent local minimum values that can trap an algorithm.

I often work with Rastrigin’s function and so I sometimes find it useful to create a graph of it. My current method of choice for creating 3D graphs is to use the R language.

The commands I used start with:

# CTRL-L clears the shell
rm(list=ls()) # delete all objects
x0 <- seq(-5.12, 5.12, length=100)
x1 <- seq(-5.12, 5.12, length=100)
f <- function(x0, x1) { 20 + (x0^2 - 10 *
 cos(2 * pi * x0)) + (x1^2 - 10 *
 cos(2 * pi * x1)) }
z <- outer(x0, x1, f)

After deleting all existing objects in the R workspace, I set up arrays x0 and x1 with 100 values evenly spaced between -5.12 and +5.12 (standard ranges for Rastrigin’s function). The next command defines Rastrigin’s function. The hard-coded 20 is actually 2 * 10 where the 2 is because I have Rastrigin’s function with dim = 2 (x0 and x1).
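
For reference, the general form of Rastrigin’s function in n dimensions is

f(x) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2 \pi x_i) \right)

so with dim = 2 the leading 10n term becomes the 20 in the code.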


jet.colors <- colorRampPalette(c("midnightblue",
 "blue", "cyan", "green", "yellow", "orange",
 "red", "darkred"))
nbcol <- 64
color <- jet.colors(nbcol)
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1,-1] + z[-1,-ncz] +
 z[-nrz,-1] + z[-nrz,-ncz] # sum of z at the 4 corners of each facet
facetcol <- cut(zfacet, nbcol) # map each facet to one of the 64 colors

I set up a custom color gradient that runs from dark blue to dark red. You can think of zfacet and facetcol as magic R incantations for doing a 3D color gradient graph: zfacet adds up the z values at the four corners of each facet of the surface, and facetcol assigns each facet to one of the 64 color bins.

The graph is created with:

persp(x0, x1, z, col=color[facetcol],
 phi=15, theta=-35, ticktype="detailed",
 d=10, r=1, shade=0.1, expand=0.7)

The phi argument is the tilt-the-base angle. The theta argument is the rotate-the-base angle. The d argument controls the perspective effect; larger values of d lessen the effect. The r argument is the distance from the eyepoint to the center of the plotting box. The expand argument is applied to the z coordinates; values less than 1 shrink the plotting box in the z direction. There are several other parameters you can use too.

Posted in Machine Learning

Linear Discriminate Analysis Using C#

I wrote an article titled “Linear Discriminate Analysis Using C#” in the October 2015 issue of Microsoft MSDN Magazine.


Linear Discriminate Analysis (LDA, but not to be confused with another LDA, latent Dirichlet allocation) is an old (from the 1930s) math technique that can be used to perform binary classification. A binary classification problem is one where you want to predict something that can take on only one of two possible values. For example, you might want to predict a person’s sex (male or female) based on their age, annual income, and other predictor variables. Or you might want to predict whether the price of a stock will be up or down one week from now, based on shares sold, price-to-earnings ratio, and so on.


In the article, I show how to code LDA using raw (no external libraries) C#. Although there are existing tools that can do LDA, if you need to integrate LDA directly into a software system, using existing tools may not be feasible.
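
To give a rough feel for the core idea (this is only a bare-bones sketch, not the code from the article; it assumes exactly two predictor variables, and that the two class means and the inverse of the pooled covariance matrix have already been computed):

// classify item x as class 0 or class 1 using the LDA discriminant
// direction w = inverse(pooled covariance) * (mean1 - mean0), equal priors
static int Classify(double[] x, double[] mean0, double[] mean1,
  double[][] pooledCovInv)
{
  double[] d = new double[] { mean1[0] - mean0[0], mean1[1] - mean0[1] };
  double[] w = new double[2];
  w[0] = pooledCovInv[0][0] * d[0] + pooledCovInv[0][1] * d[1];
  w[1] = pooledCovInv[1][0] * d[0] + pooledCovInv[1][1] * d[1];

  double projX = w[0] * x[0] + w[1] * x[1];            // project the item
  double projMid = w[0] * (mean0[0] + mean1[0]) / 2 +  // project the midpoint
                   w[1] * (mean0[1] + mean1[1]) / 2;   // of the two means

  return (projX > projMid) ? 1 : 0;  // which side of the midpoint?
}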

In the end, LDA is very interesting, but I conclude that other binary classification techniques are generally preferable. These alternatives include logistic regression, neural network classification, and decision tree classification.

Posted in Machine Learning

DevIntersection and IT Edge Intersection Conferences – Last Major Events of 2015

In my opinion, there are about four really good technical conferences each year for software developers and IT engineers who use Microsoft technologies. If you work with computer technology, you know how important it is to keep up with new techniques and products.

The last major event (well, actually events, plural, as I’ll explain shortly) of 2015 is the DevIntersection conference. It will run October 26-29 at the MGM Grand in Las Vegas. The main DevIntersection event will have somewhere around 200 sessions aimed at developers who use Microsoft technologies. Topics will include ASP.NET, Visual Studio with C#, Azure, SQL, and so on.


DevIntersection also has an affiliated event, IT Edge Intersection, that aims more at IT engineers. There’s also the Anglebrackets event that aims at Web developers. Most of my colleagues do a little bit of everything – development, IT, Web dev – and so DevIntersection has a lot to offer.

This will be my third year attending and speaking at DevIntersection. My talk will be “Introduction to R”. I’ll explain what R is (a programming language used for data analysis) and explain how to get up and running with R. I’ll also point out why R has exploded in popularity over the past 18 months or so, and discuss different levels of learning R and how they might influence your career.


So, you might be saying, “Sure, it’s easy to recommend going to a conference in Las Vegas, but how can I get my company to pay for it?” I believe that any money spent by your employer to send you to DevIntersection has a good return on investment. Yes, you could learn a lot of the things that will be presented at the conference by doing self-study via the Internet. But the reality is you probably won’t. Also, you’ll gain new information very efficiently at the conference, as opposed to thrashing around the Web.

So, if you work with Microsoft technologies, consider checking out the conference Web sites. If you can convince your boss to send you, I’ll bet you’ll be glad you attended.

Posted in Conferences

C# vs. Java: An Instance of a Class with Static Members

I was working on a project where the goal was to translate/port some Java code to C# code. If the Java code doesn’t use exotic features, porting the Java code to C# code is usually pretty easy. But I’ve run into many interesting scenarios where translating Java to C# is surprisingly tricky.

Consider this Java class:

public class Params {
  public static double foo = 2.0;  // some arbitrary value
  public static double bar = foo * 3.0;  // depends on foo

  public static void SetFoo(double f) {
    foo = f;
    bar = foo * 3.0;  // must update
  }

  // the intent is to create a Params object that has all
  // the values so it can be passed to a method
  public final static Params inst = new Params();
}

The Params class holds values for some software system. The data members are static so you’d access them like

double f = Params.foo;

The tricky part is the “inst” (instance) member. The intent is to expose a single instance of the class so that it can be passed to a method, like:

int x = SomeMethod(Params.inst);

Notice that the inst member is created by calling the default (not explicitly defined) Params constructor. Anyway, this technique works in Java. If you try the same idea in C#, the class itself will compile, but C# won’t let you access the static data members through the inst instance reference.
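
For example, if you translate the Java class to C# more or less literally (with inst declared as a public static readonly field), a statement like the following won’t compile:

double f = Params.inst.foo;  // compile error in C#: a static member
                             // cannot be accessed with an instance reference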

I tried many different approaches for porting this idea to C#. I finally came up with a solution that isn’t very elegant, but it works.

So, in C#, class Params would be:

public class Params
{
  public static double foo = 2.0;
  public static double bar = foo * 3;

  public static void SetFoo(double f)
  {
    foo = f;
    bar = foo * 3;
  }
}


The C# class is the same as the Java class except I don’t have the “inst” member. Now I add a companion class named ParamsInfo:

public class ParamsInfo
{
  public double foo;
  public double bar;

  public ParamsInfo()
  {
    foo = Params.foo;  // grab the current static values
    bar =;
  }
}

The constructor of the companion class grabs the values from the Params class. The idea is that now you can pass the essential content of the Params class by creating an instance of the ParamsInfo class. For example:

int x = DoSomething(new ParamsInfo());

This problem was surprisingly challenging. It looks very simple, especially after you see the solution, but I spent several hours trying approaches that didn’t quite work.


Posted in C# vs. Java, Machine Learning

Determining a Location from a Computer’s IP Address

On a couple of projects I’ve worked on in the past, I needed to determine the location of a computer based on its IP address. There are a few companies that sell data sets that contain such information. I used a company called Quova, which was acquired by Neustar in 2010 and renamed, but I still call it Quova.

Every month Quova publishes a new data set that maps IP addresses to locations (because the information changes frequently). The data set is actually a text file with several million lines. Each line has about 29 fields, separated by the ‘|’ character. The first two fields are a start_ip and an end_ip. The remaining 27 fields on the line describe all the IP addresses between the start_ip and the end_ip. Data fields include country, city, state, latitude and longitude, postal code, and so on.
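
As a tiny illustration of reading one line of the data file (hypothetical code; it assumes the start and end IPs are stored as plain integer values):

// split one line of the data file on the '|' delimiter
string[] fields = line.Split('|');
ulong startIp = ulong.Parse(fields[0]);  // first field: start_ip
ulong endIp = ulong.Parse(fields[1]);    // second field: end_ip
// the remaining fields hold country, city, state, and so on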


To find the information associated with a particular target IP address, it’s not really feasible to simply loop through the data set text file one line at a time until you hit the target interval — the data set file is just too big. In principle, a good way to access the Quova information would be to transfer the data into a SQL database, index the start_ip and end_ip columns, and then do a select statement.

For one project I worked on, I needed to do lookups with data in memory instead of SQL. The problem is that the Quova data is too large to fit into memory on a normal machine. The solution was to load just part of the data file (about 10% would fit on my machine) at any one time, and load a different chunk of data if necessary. This meant I had to create an index to know which lines of the data file held which IP address ranges.
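
As a rough sketch of the kind of in-memory lookup involved (hypothetical names; it assumes the IP addresses have been converted to unsigned integers and the loaded ranges are sorted and non-overlapping), a binary search over the current chunk might look like:

// return the index of the range that contains targetIp, or -1 if the
// target isn't in the loaded chunk (so a different chunk must be loaded)
static int FindRange(ulong targetIp, ulong[] startIps, ulong[] endIps)
{
  int lo = 0;
  int hi = startIps.Length - 1;
  while (lo <= hi)
  {
    int mid = lo + (hi - lo) / 2;
    if (targetIp < startIps[mid])
      hi = mid - 1;
    else if (targetIp > endIps[mid])
      lo = mid + 1;
    else
      return mid;  // startIps[mid] <= targetIp <= endIps[mid]
  }
  return -1;
}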

Posted in Machine Learning

Fitness Proportionate Selection

I ran into an interesting mini-problem recently. I had a collection of items, where each item had a value, and I wanted to select one item so that the probability of an item being selected was proportional to its value.

Suppose there are 4 items (with IDs 0, 1, 2, 3) and the values of the four items are 20, 40, 10, 30. The sum of the four values is 20 + 40 + 10 + 30 = 100. I want to select item 0 with probability = 20/100 = 0.2, and item 1 with probability 40/100 = 0.4, etc. In other words, items with higher values have a higher likelihood of being selected.

This mini-problem occurs in genetic algorithms where the items are genetic individuals and the values are their fitness. Individuals with higher fitness values are selected for breeding. The mini-problem, sometimes called fitness proportionate selection, occurs inside other algorithms too.

There are several ways to perform fitness proportionate selection. When I implement genetic algorithms, I almost always use a technique called tournament selection because it is very easy to code.
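
For comparison, a bare-bones version of tournament selection (just a sketch with made-up names, assuming larger fitness values are better) looks something like:

// pick tournSize candidate indices at random, return the fittest one
static int TournamentSelect(double[] fitness, int tournSize, Random rnd)
{
  int best = rnd.Next(fitness.Length);  // a random first candidate
  for (int i = 1; i < tournSize; ++i)
  {
    int cand = rnd.Next(fitness.Length);
    if (fitness[cand] > fitness[best])
      best = cand;  // keep the better candidate
  }
  return best;
}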

But I was looking at a different problem — k-means++ initial means selection. Here the items are data tuples and their values are squared-distance to the closest existing mean. I wanted to select a data item that has a high distance-squared value as the next initial mean, because you want the means to be different from each other.

Anyway, for the k-means++ initial means mini-problem, a technique called roulette wheel selection was nicer than tournament selection. The reason why roulette wheel is better is complicated, but basically, roulette wheel allowed me to easily avoid selecting duplicate items.


The core roulette wheel code looks like this:

// pick a data item, using the squared-distances
// this is a form of roulette wheel selection
// (rnd, data, distSquared, used, newMean, means and k are
//  assumed to be declared elsewhere)
double p = rnd.NextDouble();

double sum = 0.0; // sum of distances-squared
for (int i = 0; i < distSquared.Length; ++i)
  sum += distSquared[i];

double cumulative = 0.0; // cumulative prob
int ii = 0; // points into distSquared[]
int sanity = 0; // sanity count
while (sanity < data.Length * 10)
{
  cumulative += distSquared[ii] / sum;
  if (cumulative > p && used.Contains(ii) == false)
  {
    newMean = ii; // the chosen index
    used.Add(newMean); // don't pick again
    break;
  }
  ++ii; // next candidate
  if (ii >= distSquared.Length)
    ii = 0; // back to first item
  ++sanity;
}
// save the data of the chosen index
Array.Copy(data[newMean], means[k], data[newMean].Length);

The code above has lots of missing pieces because the complete code is quite long.

Posted in Machine Learning