The Normal Cumulative Density Function using C#

I was looking at a machine learning technique called probit (“probability unit”) classification. Probit classification is exactly like logistic regression classification except that where LR uses the logistic sigmoid function to compute output, probit uses the cumulative distribution function (CDF) of the Gaussian (Normal) distribution. The CDF is the area under the Gaussian bell-shaped curve, from -infinity to some value z.


So, to write probit classification code, I needed to implement a CDF method using C#. Unlike the logistic sigmoid function, which is very simple — 1.0 / (1.0 + Math.Exp(-z)) — there is no simple exact formula for the CDF. The most common approach is to use one of many close approximations. I used equation 7.1.26 from the classic reference “Handbook of Mathematical Functions” (1965) by Abramowitz and Stegun. The book is often just called “A&S”.

Strictly speaking, equation 7.1.26 approximates something called the “error function” (often called “erf”), but the CDF is just a slight modification of erf. Anyway, here’s one way to code the CDF using C#:

static double CumDensity(double z)
{
  double p = 0.3275911;
  double a1 = 0.254829592;
  double a2 = -0.284496736;
  double a3 = 1.421413741;
  double a4 = -1.453152027;
  double a5 = 1.061405429;

  int sign;
  if (z < 0.0)
    sign = -1;
  else
    sign = 1;

  double x = Math.Abs(z) / Math.Sqrt(2.0);
  double t = 1.0 / (1.0 + p * x);
  double erf = 1.0 - (((((a5 * t + a4) * t) + a3)
    * t + a2) * t + a1) * t * Math.Exp(-x * x);
  return 0.5 * (1.0 + sign * erf);
}

I wrote a little harness to spit out values of CDF for z values from -4.0 to +4.0 (every 0.25), and then copy-pasted the results into Excel and made a graph. I also put values for the logistic sigmoid on the graph to see how they compare.
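For readers who want to reproduce the graph, a quick Python sketch of such a harness (my reconstruction — the original harness was C#, and here Python’s built-in math.erf stands in for the A&S approximation):

```python
import math

def cum_density(z):
    # Gaussian CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    # logistic sigmoid, for comparison
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    z = -4.0
    while z <= 4.0:
        print("%5.2f  %0.4f  %0.4f" % (z, cum_density(z), logistic(z)))
        z += 0.25
```

The printed columns can be pasted into Excel to make a graph like the one described.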

As the graph shows, both functions range from 0.0 to 1.0, so it isn’t a surprise that logistic regression classification and probit classification give pretty much the same results. As it turns out, logistic regression is easier to work with, in part because the logistic function is easily differentiable, which makes model training using calculus-based techniques possible. Different fields tend to prefer one technique or the other. For example, in economics, probit seems to be more common, but in most other fields, logistic regression seems more common.

Posted in Machine Learning

A Digi-Comp ii Emulator and Simulation using Python

Digi-Comp ii was an amazing mechanical toy computer made in the 1960s. It worked by rolling marbles down a plastic surface filled with toggle switches. A company called Evil Mad Scientist recently released a beautiful wooden re-creation.


I decided to write a computer program to emulate (or simulate depending on your point of view) the Digi-Comp ii. My Python programming was a bit rusty because I’ve mostly been using C#, so I thought I’d use Python just for fun.


My demo program multiplies 3 and 13. The calling code, with some minor edits to save space, is:

print "\nBegin Digi-Comp ii simulation \n"
dc = DigiComp()
dc.Initialize(35) # 35 balls in top reservoir
print "\nInitial state: "

# set up 3 * 13 problem
dc.q1 = 1; dc.q2 = 1; dc.q3 = 0
dc.m1 = 1; dc.m2 = 0; dc.m3 = 1; dc.m4 = 1
dc.multiply = 1
print "\nSettings to compute 3 * 13:"
# result in A should be 1110010

print "\nResult state (answer in A register):"

print "\nEnd Digi-Comp ii simulation \n"

The simulator is mostly a (not-shown) program-defined class named DigiComp. The program has about 250 lines of Python code. For example, here are two of the class methods:

def DoStart(self):
  if == 0:  # attribute name assumed; count of balls in top reservoir
    print "No balls in top reservoir"
  else:
    = - 1 # launch ball

def DoMultiply(self):
  if self.multiply == 0: self.DoClear() # go right
  elif self.multiply == 1: self.DoD2() # go left

The calling code sets Q (called MQ in the Digi-Comp manual) to 110 = 3d, and then sets M to 1011 = 13d (bits are listed least-significant first). The DoStart method launches the first virtual ball. The result is in the A register and is 1110010 = 1 + 2 + 4 + 32 = 39d. All in all, it was a fun little exercise.
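The least-significant-bit-first convention can be double-checked with a few lines of Python (an illustration I added; bits_to_int is my helper, not part of the simulator):

```python
def bits_to_int(bits):
    # bits listed least-significant first, as on the Digi-Comp panel
    return sum(b << i for i, b in enumerate(bits))

q = [1, 1, 0]               # Q register = 110  -> 3 decimal
m = [1, 0, 1, 1]            # M register = 1011 -> 13 decimal
a = [1, 1, 1, 0, 0, 1, 0]   # A register = 1110010 -> 39 decimal

print(bits_to_int(q), bits_to_int(m), bits_to_int(a))  # 3 13 39
```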

Posted in Machine Learning, Miscellaneous

My Top Ten Favorite Twilight Zone Episodes

I’m writing this blog post on the 4th of July. There’s a Twilight Zone marathon on TV (as usual) so I thought I’d list my favorite episodes so you can see if my list has any of your favorites.

1. It’s a Good Life – This is the episode where a young boy, Anthony, has the power to “send people to the cornfield”. He holds his neighbors as virtual hostages. Jack-in-the-boxes are creepy.


2. Nightmare at 20,000 Feet – William Shatner is on an airplane during a storm and thinks he sees some sort of creature on the wing of the airplane. Is he hallucinating or not?


3. Living Doll – Telly Savalas as an evil stepfather does not like his daughter’s Talky Tina doll. The feeling is mutual. “My name is Talky Tina and I’m going to kill you.”


4. Will the Real Martian Please Stand Up? – A group of bus passengers are stranded in a café during a snowstorm. One of them is a Martian scout in advance of an invasion. Why is the café owner not worried?


5. The Howling Man – An American traveler seeks shelter in a remote European castle during a storm. He discovers that Brother Jerome is holding a man captive in a dungeon. Jerome claims the prisoner is Satan. The prisoner claims Jerome is insane. One of them is right.


6. Death Ship – Three astronauts land on an alien planet and find a crashed spaceship identical to theirs – complete with exact replicas of dead versions of themselves! Are they really dead? Is it alien mind control? An alternate dimension?


7. To Serve Man – A group of seven-foot tall aliens come to earth bearing gifts. Two skeptical code breakers try to discover if the aliens are benign or just hungry.


8. The Jeopardy Room – A Soviet agent (played by Martin Landau) is attempting to defect to the West but is trapped inside a dingy hotel room by an assassin. The assassin tells him the room is rigged with a bomb. Is the trigger hooked to the door knob? The window latch?


9. The Silence – At an old-school men’s club, an aristocratic member is annoyed by a new-money member who talks constantly. They bet $500,000 on whether the talkative member can stay silent for an entire year. A double-twist ending.


10. Mirror Image – A woman waiting at a bus stop experiences weird events. She thinks she sees her exact twin. Is the twin real? Is the twin an evil replica from another dimension? Or is she just having a nervous breakdown?


Posted in Top Ten

Distorting the MNIST Image Data Set

I wrote an article titled “Distorting the MNIST Image Data Set” in the July 2014 issue of MSDN Magazine.


The MNIST (Mixed National Institute of Standards and Technology) image data set is a collection of small (28 by 28 pixel) black-and-white images of handwritten digits, 0 through 9. There are 60,000 images in a training set and another 10,000 images in a test set.

The data sets were created so that image recognition researchers and practitioners can test new algorithms against the set, and compare their results with other results.

Even 60,000 training images aren’t very many, so a clever idea is to create additional training images from the original images, by slightly distorting each image. For example, the screenshot above shows the third training image, which happens to be a ‘4’, and a distorted version.

Distorting images is surprisingly tricky. There are many possibilities, but the technique I demonstrate uses a Gaussian kernel, which is an interesting topic in its own right.
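A 2-D Gaussian kernel of the kind used for such distortions can be generated with a short sketch like this (my own illustration; the article’s actual code may differ):

```python
import math

def gaussian_kernel(size, sigma):
    # size x size grid of exp(-(dx^2 + dy^2) / (2 * sigma^2)),
    # normalized so the weights sum to 1.0
    c = size // 2
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma * sigma))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

kernel = gaussian_kernel(5, 1.0)
# center cell has the largest weight; weights fall off with distance
```

To distort an image, a random displacement field is typically smoothed with such a kernel and then used to shift each pixel slightly.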

Posted in Machine Learning

Selected Software Development Conferences in 2014

There are a lot of conferences for software developers. As I’m writing this blog post, year 2014 is about half-way over, so here are a few significant (meaning only that I know about them) software development conferences scheduled for the second half of 2014. If you know any I missed, tell me at jamesDmccaffrey -at-

OSCON (Open Source Convention) – As the title suggests, this is for developers who use Python and other open source technologies. Attendance varies but is typically about 2,500. Run by O’Reilly Media who does the strange animal cover books. Recommended if you use open source software. Price: about $2500. (Yikes.)


July 20-24, 2014, Portland, Oregon

Visual Studio Live – This conference has been around for many years. Typically has 500 – 1500 attendees and a significant Microsoft presence. Run by 1105 Media who publishes Visual Studio Magazine. Highly recommended. Price: about $2100.


August 18-22, 2014, Redmond, Washington
October 6-9, 2014, Washington, DC
November 17-21, 2014, Orlando, Florida

IT/DevConnections – Another long-running conference. Typically 500 – 2000 attendees and a moderate Microsoft presence. Run by Penton Media who publishes Windows IT Pro Magazine. Strongly recommended. Price: about $1900.


September 15-19, 2014, Las Vegas, Nevada

JavaOne Conference – Run by Oracle, a company I’m not very fond of, but this conference gets good reviews. Many thousands of attendees. Recommended, if you use Java, obviously. Price: about $2000.


September 28 – October 2, 2014, San Francisco, CA

Better Software Conference – Run by a company called Software Quality Engineering (SQE). Attendance typically about 500 – 1000 attendees. In my opinion, the speaker quality at SQE events has been very poor in recent years. Not recommended at this time. Price: about $1900.


November 9-14, 2014, Orlando, Florida

DevIntersection Conference – A relatively new (2014 will be their third year) independent conference run by some of the former managers of the DevConnections Conference. I expect roughly 2500 attendees. Has a strong Microsoft presence. Very highly recommended. Price: about $1800 (Excellent value).


November 10-13, 2014, Las Vegas, Nevada

App Dev Trends Conference – A brand new conference. Typically, a first year conference draws about 300-600 people, but you can often get great pricing. Run by 1105 Media so the event will be professionally handled. I intend to go, so in other words, recommended. Price: about $1900.


December 8-11, 2014, Las Vegas, Nevada

Posted in Machine Learning, Miscellaneous, Software Test Automation

Equal Width Binning using C#

A common machine learning task is converting numeric data into categorical data. For example, suppose you have a set of employee annual incomes with values such as $48,000.00 and $39,500.00 and you want to transform them into values such as “low”, “medium”, and “high”. This is called discretization or, informally, binning the data. Binning is done for two main reasons. First, some ML algorithms expect categorical data. Second, binning tends to smooth out differences between data points which is sometimes, but not always, useful.


There are three main ways to bin data. The simplest is to create intervals, or “buckets”, of equal width. The second approach, equal frequency, is to create buckets so that each bucket has an (approximately) equal number of data values in it. The third approach is to create buckets using a clustering algorithm such as k-means so that data values are grouped by similarity to each other. Each of the three techniques has significant pros and cons so there is not one clear best way to bin data.
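To contrast with the equal width approach implemented below, here is a minimal Python sketch of the second approach, equal frequency binning (my illustration; the function name is mine):

```python
def equal_freq_bins(data, num_bins):
    # sort, then slice into num_bins chunks of (approximately) equal size
    s = sorted(data)
    n = len(s)
    bins = []
    for b in range(num_bins):
        lo = b * n // num_bins
        hi = (b + 1) * n // num_bins
        bins.append(s[lo:hi])
    return bins

data = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 12.0, 14.0]
print(equal_freq_bins(data, 4))
# four buckets holding 2 or 3 values each
```

Notice that equal frequency buckets have roughly the same count but very different widths, whereas equal width buckets (below) have identical widths but different counts.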

There are many, many ways to implement equal width binning. Here’s one example. Suppose the data to bin is below and you want 4 bins:

2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 12.0, 14.0

I’ve ordered the data, but equal width binning doesn’t require it. First, the min and max values are found:

min = 2.0
max = 14.0

Next the width of each bucket is computed:

width = (max - min) / number-buckets
      = (14.0 - 2.0) / 4
      = 3.0

Next, the intervals are constructed:

[2.0, 5.0)  [5.0, 8.0)  [8.0, 11.0)  [11.0, 14.0)  -> data
  [0]  [1]    [2]  [3]    [4]   [5]     [6]   [7]  -> index
    {0}         {1}         {2}           {3}      -> bin

Next, the two end intervals are modified to catch outliers:

[-inf, 5.0)  [5.0, 8.0)  [8.0, 11.0)  [11.0, +inf)
  [0]  [1]    [2]  [3]    [4]   [5]     [6]   [7]
    {0}         {1}         {2}           {3}

Now you can bin. Suppose you have some value x = 9.5. It belongs in bin {2} so it maps to categorical value “2”.

In the approach above, most interval values are stored twice. This isn’t necessary but gives better clarity at the expense of a little bit of extra storage.
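The lookup in the worked example can be sanity-checked with a tiny Python sketch (interval endpoint pairs instead of the doubled flat array; which_bin is my name):

```python
# each interval stored as [lo, hi); end intervals widened to catch outliers
intervals = [(float("-inf"), 5.0), (5.0, 8.0), (8.0, 11.0), (11.0, float("inf"))]

def which_bin(x):
    for i, (lo, hi) in enumerate(intervals):
        if lo <= x < hi:
            return i
    return -1  # unreachable with infinite end intervals

print(which_bin(9.5))  # 2
```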

In C# code, without error-checking:

static double[] MakeIntervals(double[] data, int numBins)
{
  double max = data[0]; // find min & max
  double min = data[0];
  for (int i = 0; i < data.Length; ++i)
  {
    if (data[i] < min) min = data[i];
    if (data[i] > max) max = data[i];
  }
  double width = (max - min) / numBins; // compute width

  double[] intervals = new double[numBins * 2]; // intervals
  intervals[0] = min;
  intervals[1] = min + width;
  for (int i = 2; i < intervals.Length - 1; i += 2)
  {
    intervals[i] = intervals[i - 1];
    intervals[i + 1] = intervals[i] + width;
  }
  intervals[0] = double.MinValue; // outliers
  intervals[intervals.Length - 1] = double.MaxValue;

  return intervals;
}

And the partner method:

static int Bin(double x, double[] intervals)
{
  for (int i = 0; i < intervals.Length - 1; i += 2)
    if (x >= intervals[i] && x < intervals[i + 1])
      return i / 2;
  return -1; // error
}

Calling the methods could resemble:

double[] data = new double[] { 2.0, 3.0, . . , 14.0 };
int numBins = 4;
double[] intervals = MakeIntervals(data, numBins);

double x = 9.5;
int bin = Bin(x, intervals);
Console.WriteLine("\nBin of " + x + " is " + bin);

Instead of the static method approach, you could use an OOP approach, putting MakeIntervals code into a constructor and Bin code into a member method. Calling would look like:

double[] data = new double[] { 2.0, 3.0, . . , 14.0 };
int numBins = 4;
Binner b = new Binner(data, numBins);

double x = 9.5;
int bucket = b.Bin(x);
Console.WriteLine("\nBin of " + x + " is " + bucket);
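For comparison, the same OOP idea might look like this in Python (a sketch only — the author’s C# Binner class is not shown, so this is not its actual code):

```python
class Binner:
    def __init__(self, data, num_bins):
        # equal width interval edges; end intervals widened to catch outliers
        lo, hi = min(data), max(data)
        width = (hi - lo) / num_bins
        self.edges = [lo + i * width for i in range(num_bins + 1)]
        self.edges[0] = float("-inf")
        self.edges[-1] = float("inf")

    def bin(self, x):
        for i in range(len(self.edges) - 1):
            if self.edges[i] <= x < self.edges[i + 1]:
                return i
        return -1  # unreachable with infinite end edges

b = Binner([2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 12.0, 14.0], 4)
print(b.bin(9.5))  # 2
```

Storing num_bins + 1 edges instead of 2 * num_bins endpoints avoids the duplicated interval values mentioned earlier, at a small cost in clarity.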

Binning code isn’t too difficult, but as much as any code I deal with, it has a huge number of implementation options.

Posted in Machine Learning

Deep Neural Networks using C#

Deep neural networks and deep learning are hot topics now — machine learning goes through fads and fashions just like everything else. There’s no clear consensus on exactly what deep neural networks are or what deep learning means. But from a practical point of view, a deep neural network is one with two or more layers of hidden nodes.


I wrote an article titled “Deep Neural Networks: A Getting Started Tutorial” in the June 2014 issue of Visual Studio Magazine. In the article I present a demo program, written in C#, of a neural network with two layers of hidden nodes.

Deep neural networks can, in theory, solve some prediction problems that ordinary neural networks cannot. Also, deep neural networks can, again in theory, solve some prediction problems in situations where an ordinary NN would require a huge number of hidden nodes in its single hidden layer.

The idea of deep neural networks is not new, but what has changed is that the computing power to implement deep neural networks is now starting to become available. My article shows the basics of a two-hidden-layer neural network but does not discuss how to train the beast — that’s for a future article.
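The core computation is just repeated weighted sums plus an activation, layer by layer. Here is a bare-bones Python sketch of a feed-forward pass through two hidden layers (all sizes, weights, and the tanh activation are illustrative assumptions, not taken from the article):

```python
import math
import random

def forward(x, layers):
    # layers: list of (weights, biases) pairs; tanh on hidden layers,
    # identity on the output layer
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if idx < len(layers) - 1:          # hidden layers get an activation
            x = [math.tanh(v) for v in x]
    return x

random.seed(0)

def rand_layer(n_in, n_out):
    # small random weights and biases, purely for demonstration
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [random.uniform(-0.5, 0.5) for _ in range(n_out)]
    return (w, b)

# 3 inputs -> 4 hidden -> 5 hidden -> 2 outputs:
# two hidden layers, so "deep" by the working definition above
net = [rand_layer(3, 4), rand_layer(4, 5), rand_layer(5, 2)]
out = forward([1.0, 2.0, 3.0], net)
print(out)  # two output values
```

Training such a network (back-propagating through both hidden layers) is the hard part, which the article defers.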

Posted in Machine Learning