Dev Connections Conference 2014 Recap

I spoke for Microsoft at the 2014 Dev Connections software and IT conference this week (September 15-19). I gave two talks, “Introduction to Speech Recognition with C#” and “Developing Neural Networks with Visual Studio”. I’d estimate there were about 1,500 attendees at the conference. One guy I talked to at a lunch was a typical attendee: he worked for a mid-sized regional bank as sort of a Jack of all trades, doing IT tasks, and also developing line of business applications (both desktop and Web). My ultimate strategic goal was to demonstrate Microsoft’s strong machine learning tools and enabling technologies.

[Image: Dev Connections 2014 registration area]

The conference had approximately 200 one-hour talks. In each one-hour time slot, attendees could choose from between 12 and 20 talks. Most rooms held about 120 people, and there were a handful of double-sized rooms.

The conference was at the Aria hotel in Las Vegas. Las Vegas is my favorite place for conferences. It’s relatively inexpensive, and easy to get to from almost anywhere. The hotels are huge (I read that 15 of the largest 25 hotels in the world are in Vegas) with enormous convention areas so there’s no need for attendees to be scattered across a dozen different hotels and have to bus or taxi to a dedicated convention center (such as in San Francisco). Also, the town is walking-friendly, close to the airport (about a $25 fare), and has lots of things to see.

If you haven’t been to a conference in Las Vegas before, you might not have an accurate idea of what goes on. These aren’t party-time morale events. You typically get up very early in the morning, and then attend in-depth technical sessions all day long. It’s actually quite exhausting, but in a pleasant way if you’re a geek like me.

[Image: Before the speech recognition talk at Dev Connections 2014]

Developer and IT conferences like Dev Connections are fairly expensive, typically from $1700 to $3600 for a 3 to 5 day event. Are they worth the price? In my opinion, yes. The true value doesn’t really come from the content in the conference sessions — much of conferences’ content is available online. The value comes from being exposed to new ideas and techniques that you just don’t have time to discover during day to day work tasks. Without crunching any numbers, I’d estimate that a developer who attends one of these conferences will pay back his company, in terms of increased and improved productivity, far more than the cost of attending.

Posted in Machine Learning

Graphing Ackley’s Function using SciLab

Ackley’s function is a standard benchmark mathematical function used to test numerical optimization algorithms. The function is defined in two dimensions as:

f(x,y) = -20 * exp(-0.2 * sqrt(0.5 * (x^2 + y^2)))
         - exp(0.5 * (cos(2*pi*x) + cos(2*pi*y)))
         + 20 + e

The function has a minimum value of 0.0 located at x = 0, y = 0. I wanted to graph the function so I used SciLab, a free program that is similar to the very expensive MatLab program.

Here are the five SciLab commands I issued:

-->[x,y]=meshgrid(-4:.10:4,-4:.10:4);
-->z=-20 * exp(-0.2*sqrt(0.5 * (x.^2 + y.^2)))
     - exp(0.5 * (cos(2*%pi*x) + cos(2*%pi*y)))
     + 20 + %e;
-->g=scf();
-->g.color_map=jetcolormap(64);
-->surf(x,y,z);

The first command sets up a matrix of (x,y) values from -4 to +4, 0.10 units apart. The second command is the definition of Ackley’s function for two variables. Note the use of the .^ operator rather than the ^ operator. The third through fifth commands create the graph.

[Image: SciLab surface plot of Ackley's function]

Notice that Ackley's function has many local minima, which makes finding the global minimum at (0,0) a bit tricky for some numerical optimization algorithms.
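
If you prefer C# to SciLab, a minimal sketch of Ackley's function (my own throwaway code, not part of the SciLab example) makes it easy to verify the minimum value of 0.0 at (0,0):

using System;
namespace AckleyDemo
{
  class Program
  {
    // Ackley's function in two dimensions
    static double Ackley(double x, double y)
    {
      return -20.0 * Math.Exp(-0.2 * Math.Sqrt(0.5 * (x * x + y * y)))
        - Math.Exp(0.5 * (Math.Cos(2.0 * Math.PI * x) +
            Math.Cos(2.0 * Math.PI * y)))
        + 20.0 + Math.E;
    }

    static void Main(string[] args)
    {
      Console.WriteLine(Ackley(0.0, 0.0)); // 0.0 (to within rounding)
      Console.WriteLine(Ackley(1.0, 1.0)); // about 3.63
    }
  }
}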

Posted in Machine Learning

Introduction to Microsoft Machine Learning Studio

I wrote an article titled “Introduction to Machine Learning Studio” in the September 2014 issue of MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn781358.aspx. Actually, the title is a bit inaccurate (on purpose): Machine Learning Studio (ML Studio) is a Web application that allows you to create a machine learning system using drag-and-drop modules, with no coding at all, but ML Studio is essentially just the UI for Microsoft Azure Machine Learning, the Web service that does all the work.

[Figure 1: A complete ML Studio experiment]

In the article, I explain, step-by-step, how to create a system that predicts whether a U.S. Congressman is a Republican or Democrat, based on their voting record. The example uses Logistic Regression, but ML Studio supports several different ML classification techniques, including neural networks.

Overall, I am quite a fan of ML Studio. Using the tool saves a ton of time compared to my normal code-from-scratch approach. On the downside, ML Studio does not yet have an SDK so it’s not currently possible to create custom-coded modules using C# (although ML Studio can execute R language scripts, which is sort of a customization approach).

And, I’m really not a fan of Cloud-based applications. I know they’re the wave of the future and inevitable, but I prefer having my entire system local, sitting in a PC in my office. I get annoyed by Cloud-based systems that are constantly changing — I like a stable platform to work on even if it means I need to install upgrades every now and then.

Posted in Machine Learning

Winnow Classification using C#

I wrote an article titled “Winnow Classification using C#” in the September 2014 issue of MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn781362.aspx. In machine learning, an example of a classification problem is to predict which class (male or female) a person belongs to, based on features such as occupation, political party, height and so on. There are many kinds of classification techniques, including neural network classifiers, logistic regression classification, and so on.

[Image: Winnow classification demo run]

Winnow classification is a relatively little-used technique. It applies when the dependent y-variable to predict is binary (two possible values, such as male or female) and each of the independent predictor variables is also binary. The example I present in the MSDN article predicts whether a member of the U.S. Congress is a Democrat or a Republican, based on 16 votes, each of which can be “yes” or “no”.
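
The MSDN article has the complete demo code, but the heart of Winnow training is a simple promotion/demotion rule. Here is a rough sketch (my own simplified version, not the article's exact code) that assumes 0/1 encoded predictor values, weights initialized to 1.0, and a promotion/demotion constant alpha such as 2.0:

// one training pass using the Winnow update rule; alpha and
// threshold are typical values, not the article's exact ones
static void TrainWinnow(double[] weights, int[][] xData, int[] yData,
  double alpha, double threshold)
{
  for (int i = 0; i < xData.Length; ++i)
  {
    double sum = 0.0;
    for (int j = 0; j < weights.Length; ++j)
      sum += weights[j] * xData[i][j];
    int predicted = (sum > threshold) ? 1 : 0;

    if (predicted == yData[i]) continue; // correct; no weight change

    for (int j = 0; j < weights.Length; ++j)
    {
      if (xData[i][j] == 0) continue;    // only active predictors change
      if (yData[i] == 1)
        weights[j] *= alpha;             // promotion (missed a 1)
      else
        weights[j] /= alpha;             // demotion (false positive)
    }
  }
}

A typical setup initializes all the weights to 1.0; the exact values used for alpha and the threshold vary from implementation to implementation.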

In my opinion, there is no clear evidence that tells whether Winnow Classification is better, worse, or about equal to other classification techniques.

Posted in Machine Learning

Neural Network Batch and Online Training

I wrote an article titled “Understanding Neural Network Batch Training: A Tutorial” in the August 2014 issue of Visual Studio Magazine. See http://visualstudiomagazine.com/articles/2014/08/01/batch-training.aspx.

[Image: Neural network batch training]

By far the most common technique used to train a neural network is called back-propagation. But there are alternatives, including particle swarm optimization and simplex optimization. Each of these training techniques can be applied using one of two approaches, usually called “batch” and “online” training.

In batch training, in each iteration of the main training loop, all training data items are examined and their associated errors (differences between the computed output values and the actual output values) are then used to modify the network’s weights and bias values.

In online training, in each iteration of the main training loop, the weights and bias values are updated immediately after each individual training item is fed to the network.
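
To make the difference concrete, here is a highly simplified sketch that trains a single linear output node with gradient descent rather than a real neural network (the method names and the tiny data set are mine, purely for illustration); the point is where the weight update happens in each loop:

using System;
namespace BatchVsOnlineSketch
{
  class Program
  {
    static void Main(string[] args)
    {
      // tiny made-up data set: y is exactly 2*x0 + 3*x1
      double[][] xData = { new double[] { 1, 2 }, new double[] { 2, 1 },
        new double[] { 3, 3 } };
      double[] yData = { 8, 7, 15 };

      double[] wOnline = new double[2];
      TrainOnline(wOnline, xData, yData, 0.01, 1000);
      Console.WriteLine("online w = " + wOnline[0] + " " + wOnline[1]);

      double[] wBatch = new double[2];
      TrainBatch(wBatch, xData, yData, 0.01, 1000);
      Console.WriteLine("batch  w = " + wBatch[0] + " " + wBatch[1]);
    }

    // online training: weights are updated immediately after each item
    static void TrainOnline(double[] w, double[][] xData, double[] yData,
      double learnRate, int maxEpochs)
    {
      for (int epoch = 0; epoch < maxEpochs; ++epoch)
        for (int i = 0; i < xData.Length; ++i)
        {
          double err = Dot(w, xData[i]) - yData[i];
          for (int j = 0; j < w.Length; ++j)
            w[j] -= learnRate * err * xData[i][j]; // immediate update
        }
    }

    // batch training: gradients are accumulated over all items,
    // then the weights are updated once per pass
    static void TrainBatch(double[] w, double[][] xData, double[] yData,
      double learnRate, int maxEpochs)
    {
      for (int epoch = 0; epoch < maxEpochs; ++epoch)
      {
        double[] grad = new double[w.Length];
        for (int i = 0; i < xData.Length; ++i)
        {
          double err = Dot(w, xData[i]) - yData[i];
          for (int j = 0; j < w.Length; ++j)
            grad[j] += err * xData[i][j]; // accumulate only
        }
        for (int j = 0; j < w.Length; ++j)
          w[j] -= learnRate * grad[j]; // one update per pass
      }
    }

    static double Dot(double[] w, double[] x)
    {
      double sum = 0.0;
      for (int j = 0; j < w.Length; ++j) sum += w[j] * x[j];
      return sum;
    }
  }
}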

Research suggests that online training is quite a bit superior to batch training when using back-propagation, but it is not clear from the research whether online training is better when using particle swarm or simplex optimization.

Posted in Machine Learning

Scraping Active Directory E-mail Addresses using PowerShell

In my day-to-day software development activities, I rarely use PowerShell. My programming tasks don’t hit PowerShell’s sweet spot of between simple (when I use a .bat file) and more complex (when I use C#). But for one specific task — finding e-mail addresses on an Active Directory network — a neat PowerShell script is my utility of choice.

[Image: Scraping Active Directory using PowerShell]

My PowerShell script is named makeDistList.ps1 and starts with some documentation comments:

# makeDistList.ps1 
# create an e-mail list of certain employees.
# run as C:\> .\makeDistList.ps1 to view on screen
# run as C:\> .\makeDistList.ps1 > mylist.txt to save

# "cn" -- common name, first name & last name
# "sn" -- surname
# "givenName" -- first name
# "name" -- usually same as 'cn' but not always
# "mail" -- e-mail address
# "company" -- like 'Acme'
# "title" -- for some employees only
# "manager" -- employee's manager
# "description" -- misc. info such as '(Sponsor: jLee)'
# "department" -- Cost Center name
# "telephoneNumber" -- office phone number

# set up attribute of AD users to search for
# Examples:
#$target = "(&(objectCategory=User)(Title=*SDET*))"
# users with 'SDET' in their job titles

#$target = "(&(objectCategory=User)(company=*Acme*))"
# users who are contractors through Acme

#$target = "(&(objectCategory=User)(telephoneNumber=*706*))"
# users whose phone number contains '706'
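
# NOTE: un-comment exactly one of the $target examples above (or
# assign your own LDAP filter string) before running the script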

Next, I set up a wonky .NET DirectorySearcher object:

$root = new-object directoryServices.directoryEntry
$seeker = new-object directoryServices.directorySearcher
$seeker.searchRoot = $root
$seeker.SearchScope = "Subtree"
$seeker.serverTimeLimit = -1
$seeker.clientTimeout = -1
$seeker.filter = $target
$seeker.pageSize = 10
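# a non-zero pageSize enables paged results so the search isn't
# capped by the server's result-size limit (typically 1,000 entries)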

Then I set up which AD fields to retrieve (in this case, user name and e-mail address):

$colProplist = "cn", "mail"
foreach ($i in $colPropList) {
 $seeker.PropertiesToLoad.Add($i) | out-null
}

Next I set up any users I want to exclude in an array:

$omitList = @("jSmith@foo.com", "cLee@foo.com")

Now, invoke and show how many users were found:

$resultsCollection = $seeker.findAll()
write-host $resultsCollection.Count  

The hardest part was figuring out how to extract the results from the return object, and display them:

# create the list of e-mail addresses
$allMail = ""
foreach ($result in $resultsCollection) {           
  $propertiesCollection = $result.Properties  

  $mailValues = $propertiesCollection.mail          
  if ($mailValues -eq $null) {
    write-host "No mail found for: ", $propertiesCollection.cn
  }
  else {
    $mailString = $mailValues.Item(0)
    $addIt = $true

    foreach ($m in $omitList) {
      if ($m -eq $mailString) {
        write-host "NOT adding", $mailString
        $addIt = $false
        break
      }
    }

    # omit special cases
    if ($mailString.Contains("toronto")) {
      $addIt = $false
    }

    if ($addIt -eq $true) {
      
      write-host "Adding", $mailString
      $allMail += $mailString + ";"
    }                 
  } #else                                               
} #foreach $result

The script finishes by adding in special people and displaying the results:

$allMail += ";aBrown@bar.com;bJones@place.com"
write-output $allMail # show results!
write-host "Done"
# end script

I use this PowerShell script, or some variation of it, quite often. For generating e-mail addresses, PowerShell is an excellent tool for me.

Posted in Miscellaneous

Implementing the BFGS Minimization Function using C#

Last weekend, just for fun, I decided I’d take a look at the BFGS (Broyden–Fletcher–Goldfarb–Shanno) numeric minimization algorithm. The algorithm finds the minimum value of a function. To get started, I examined the C language implementation presented in the book “Numerical Recipes in C” (NR). The function is listed in section 10.7 and is called dfpmin because BFGS is really just a slight variation of the earlier DFP (Davidon-Fletcher-Powell) algorithm. The NR implementation calls the helper function lnsrch (“line search”) in section 9.7.

So, I refactored the NR C language code to C#. It was quite interesting in the sense that I had to sharpen up my C skills to understand what was going on. The NR code is, basically, crazy.

[Image: BFGS minimization demo run]

The BFGS algorithm (not to be confused with the even more complicated L-BFGS, the “limited memory” version) is based on calculus techniques such as function gradients (first derivatives) and the Hessian matrix of second partial derivatives. The most striking thing about BFGS is the number of ways that the function can fail. As it turns out, robust implementations of L-BFGS and BFGS exist, but they have a huge number of special-case checks in the code. My point is that the NR code for BFGS is an OK way to start understanding the algorithm, but, in addition to being copyrighted, the NR code is not suitable for anything but experimentation because it can fail in dozens of situations. Here is my refactored C# demo program:

using System;
namespace BFGSExperiments
{
  class Program
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("\nBegin\n");

        int iter;
        double fret;
        double[] p = new double[] {1, 1};
        Minimize(p, 2, 0.0001, out iter, out fret,
          FunctionToMinimize, GradientOfFunction);

        Console.WriteLine("Iterations = " + iter);
        Console.WriteLine("Function min value = " +
          fret);
        Console.WriteLine("Minimized at " + p[0] +
          " " + p[1]);

        Console.WriteLine("\nEnd\n");
        Console.ReadLine();
      }
      catch (Exception ex)
      {
        Console.WriteLine(ex.Message);
        Console.ReadLine();
      }


    } // Main

    public delegate double SomeFunction(double[] fvals);
    public delegate void GradFunction(double[] x,
      double[] grads);

    // -----------------------------------------------------------

    public static double FunctionToMinimize(double[] x)
    {
      return -Math.Exp(-Math.Pow(x[0] - 1, 2)) -
        Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2));
      // has min at x = 1, y = 2
    }

    public static void GradientOfFunction(double[] x,
      double[] grads)
    {
      grads[0] = 2 * Math.Exp(-Math.Pow(x[0] - 1, 2)) *
        (x[0] - 1);
      grads[1] = Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2)) *
        (x[1] - 2);
    }

    // -----------------------------------------------------------

    public static void Minimize(double[] p, int n,
      double gtol, out int iter, out double fret,
      SomeFunction func, GradFunction dfunc)
    {
      // aka dfpmin, aka BFGS
      // starting point p[] of length n, minimize func, 
      // using its gradient gfunc
      // returns are p[] (location of min value),
      // iter (number iterations performed),
      // fret (min value of function at p[])
      const int ITMAX = 200;
      const double EPS = 3.0e-8;
      const double TOLX = 4.0 * EPS;
      const double STPMX = 100.0;

      int check, i, its, j;
      double den, fac, fad, fae, fp, stpmax, sum,
        sumdg, sumxi, temp, test;
      double[] dg, g, hdg, pnew, xi;
      double[][] hessin;

      iter = 0;
      sum = 0.0; // keep compiler happy

      dg = new double[n];
      g = new double[n];
      hdg = new double[n];
      hessin = MakeMatrix(n, n);
      pnew = new double[n];
      xi = new double[n];
      fp = func(p);
      Console.WriteLine("starting fp = " + fp);
      dfunc(p, g);

      Console.WriteLine("starting Grads: g[0] = " +
        g[0] + " g[1] = " + g[1]);
      Console.ReadLine();

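      // initialize the inverse Hessian approximation to the identity matrix
      // and the initial search direction xi to the negative gradient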
      for (i = 0; i < n; ++i)  {
        for (j = 0; j < n; ++j) hessin[i][j] = 0.0;
        hessin[i][i] = 1.0;
        xi[i] = -g[i];
        sum += p[i] * p[i];
      }
      
      stpmax = STPMX * Math.Max(Math.Sqrt(sum),
        (double)n);
      for (its = 1; its <= ITMAX; ++its) // main loop
      {
        iter = its;
        LineSearch(n, p, fp, g, xi, pnew, out fret, 
          stpmax, out check, func);

        fp = fret;
        Console.WriteLine("fp in loop = " + fp);
        for (i = 0; i < n; ++i)  {
          xi[i] = pnew[i] - p[i];
          p[i] = pnew[i];
        }
        Console.WriteLine("New p0 p1 = " +
          p[0] + " " + p[1]);
        Console.ReadLine();
        test = 0.0;
        for (i = 0; i < n; ++i)  {
          temp = Math.Abs(xi[i]) / Math.Max(Math.Abs(p[i]), 1.0);
          if (temp > test) test = temp;
        }

        if (test < TOLX) {
          Console.WriteLine("Exiting when test = " +
            test + " < tolx = " + TOLX);
          return;
        }

        for (i = 0; i < n; ++i) dg[i] = g[i];
        dfunc(p, g);
        test = 0.0;
        den = Math.Max(fret, 1.0);
        for (i = 0; i < n; ++i)  {
          temp = Math.Abs(g[i]) * Math.Max(Math.Abs(p[i]), 1.0) / den;
          if (temp > test) test = temp;
        }
        if (test < gtol)  {
          Console.WriteLine("Exiting when test = " +
            test + " < gtol = " + gtol);
          return;
        }

        for (i = 0; i < n; ++i) dg[i] = g[i] - dg[i];
        for (i = 0; i < n; ++i) {
          hdg[i] = 0.0;
          for (j = 0; j < n; ++j)
            hdg[i] += hessin[i][j] * dg[j];
        }
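        // BFGS update of the approximate inverse Hessian: accumulate the
        // needed scalar products, then apply the update only when fac is
        // large enough to avoid dividing by a very small number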
        fac = fae = sumdg = sumxi = 0.0;
        for (i = 0; i < n; ++i) {
          fac += dg[i] * xi[i];
          fae += dg[i] * hdg[i];
          sumdg += dg[i] * dg[i];
          sumxi += xi[i] * xi[i];
        }
        if (fac > Math.Sqrt(EPS * sumdg * sumxi))
        {
          fac = 1.0 / fac;
          fad = 1.0 / fae;
          for (i = 0; i < n; ++i)
            dg[i] = fac * xi[i] - fad * hdg[i];
          for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
              hessin[i][j] += fac * xi[i] * xi[j]
                - fad * hdg[i] * hdg[j] +
                  fae * dg[i] * dg[j];
              hessin[j][i] = hessin[i][j];
            }
          }
        }

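        // compute the next direction to move along: xi = -hessin * g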
        for (i = 0; i < n; ++i) {
          xi[i] = 0.0;
          for (j = 0; j < n; ++j)
            xi[i] -= hessin[i][j] * g[j];
        }


      } // main loop
      throw new Exception("Too many iterations " + iter +
        " in Minimize");

    } // Minimize

    public static double[][] MakeMatrix(int rows, int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i < rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    public static void LineSearch(int n, double[] xold,
      double fold, double[] g, double[] p, double[] x,
      out double f, double stpmax, out int check,
       SomeFunction func)
    {
      // aka lnsrch

      const double ALF = 1.0e-4;
      const double TOLX = 1.0e-7;

      int i;
      double a, alam, alam2, alamin, b, disc, f2, rhs1,
      rhs2, slope, sum, temp, test, tmplam;

      check = 0;
      for (sum = 0.0, i = 0; i < n; ++i)
        sum += p[i] * p[i];
      sum = Math.Sqrt(sum);
      if (sum > stpmax)
        for (i = 0; i < n; ++i)
          p[i] *= stpmax / sum;
 
      for (slope = 0.0, i = 0; i < n; ++i)
        slope += g[i] * p[i];
      if (slope >= 0.0)
        throw new Exception("Roundoff problem LineSearch");

      test = 0.0;
      for (i = 0; i < n; ++i)  {
        temp = Math.Abs(p[i]) / Math.Max(Math.Abs(xold[i]), 1.0);
        if (temp > test) test = temp;
      }
      alamin = TOLX / test;
      alam = 1.0;
      
      int sanityCt = 0;
      int maxSanity = 1000000;
      f = 0.0; // keeps compiler happy
      f2 = 0.0;
      alam2 = 0.0;
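      // backtracking line search: start with the full step (alam = 1.0)
      // and shrink it until a sufficient decrease in f is found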
      while (true && sanityCt < maxSanity)
      {
        ++sanityCt;

        for (i = 0; i < n; ++i)
          x[i] = xold[i] + alam * p[i];
        f = func(x);
        if (alam < alamin) {
          for (i = 0; i < n; ++i)
            x[i] = xold[i];
          check = 1;
          return;
        } else if (f <= fold + ALF * alam * slope)
            return;
        else  {
          if (alam == 1.0)
            tmplam = -slope / (2.0 * (f - fold - slope));
          else  {
            rhs1 = f - fold - alam * slope;
            rhs2 = f2 - fold - alam2 * slope;
            a = (rhs1 / (alam * alam) - rhs2 /
              (alam2 * alam2)) / (alam - alam2);
            b = (-alam2 * rhs1 / (alam * alam) + alam *
              rhs2 / (alam2 * alam2)) /
              (alam - alam2);
            if (a == 0.0) tmplam = -slope / (2.0 * b);
            else  {
              disc = b * b - 3.0 * a * slope;
              if (disc < 0.0) tmplam = 0.5 * alam;
              else if (b <= 0.0)
                tmplam = (-b + Math.Sqrt(disc)) / (3.0 * a);
              else
                tmplam = -slope / (b + Math.Sqrt(disc));
            }
            if (tmplam > 0.5 * alam)
              tmplam = 0.5 * alam;
          } // else

        } // else
        alam2 = alam;
        f2 = f;
        alam = Math.Max(tmplam, 0.1 * alam);
      } // while

      if (sanityCt == maxSanity)
        throw new Exception("Insane in LineSearch");
    } // LineSearch
  } // Program
} // ns
Posted in Machine Learning