Scraping Active Directory E-mail Addresses using PowerShell

In my day-to-day software development activities, I rarely use PowerShell. Most of my programming tasks fall outside PowerShell's sweet spot: they are either simple enough for a .bat file or complex enough to call for C#. But for one specific task, finding e-mail addresses on an Active Directory network, a neat PowerShell script is my utility of choice.


My PowerShell script is named makeDistList.ps1 and starts with some documentation comments:

# makeDistList.ps1 
# create an e-mail list of certain employees.
# run as C:\> .\makeDistList.ps1 to view on screen
# run as C:\> .\makeDistList.ps1 > mylist.txt to save

# "cn" -- common name, first name & last name
# "sn" -- surname
# "givenName" -- first name
# "name" -- usually same as 'cn' but not always
# "mail" -- e-mail address
# "company" -- like 'Acme'
# "title" -- for some employees only
# "manager" -- employee's manager
# "description" -- misc. info such as '(Sponsor: jLee)'
# "department" -- Cost Center name
# "telephoneNumber" -- office phone number

# set up attribute of AD users to search for
# Examples:
#$target = "(&(objectCategory=User)(Title=*SDET*))"
# users with 'SDET' in their job titles

#$target = "(&(objectCategory=User)(company=*Acme*))"
# users who are contractors through Acme

#$target = "(&(objectCategory=User)(telephoneNumber=*706*))" 
# users whose phone number contains '702'

Next, I set up a wonky .NET DirectorySearcher object:

$root = new-object directoryServices.directoryEntry
$seeker = new-object directoryServices.directorySearcher
$seeker.searchRoot = $root
$seeker.SearchScope = "Subtree"
$seeker.serverTimeLimit = -1
$seeker.clientTimeout = -1
$seeker.filter = $target
$seeker.pageSize = 10

Then I set up which AD fields to retrieve (in this case, user name and e-mail address):

$colProplist = "cn", "mail"
foreach ($i in $colPropList) {
 $seeker.PropertiesToLoad.Add($i) | out-null
}

Next I set up any users I want to exclude in an array:

$omitList = @("jSmith@foo.com", "cLee@foo.com")

Now, invoke and show how many users were found:

$resultsCollection = $seeker.findAll()
write-host $resultsCollection.Count  

The hardest part was figuring out how to extract the results from the return object, and display them:

# create the list of e-mail addresses
$allMail = ""
foreach ($result in $resultsCollection) {           
  $propertiesCollection = $result.Properties  

  $mailValues = $propertiesCollection.mail          
  if ($mailValues -eq $null) {
    write-host "No mail found for: ", $propertiesCollection.cn
  }
  else {
    $mailString = $mailValues.Item(0)
    $addIt = $true

    foreach ($m in $omitList) {
      if ($m -eq $mailString) {
        write-host "NOT adding", $mailString
        $addIt = $false
        break
      }
    }

    # omit special cases
    if ($mailString.Contains("toronto")) {
      $addIt = $false
    }

    if ($addIt -eq $true) {
      
      write-host "Adding", $mailString
      $allMail += $mailString + ";"
    }                 
  } #else                                               
} #foreach $result

The script finishes by adding in special people and displaying the results:

$allMail += ";aBrown@bar.com;bJones@place.com"
write-output $allMail # show results!
write-host "Done"
# end script

I use this PowerShell script, or some variation of it, quite often. For generating e-mail addresses, PowerShell is an excellent tool for me.

Posted in Miscellaneous

Implementing the BFGS Minimization Function using C#

Last weekend, just for fun, I decided I’d take a look at the BFGS (Broyden–Fletcher–Goldfarb–Shanno) numeric minimization algorithm. The algorithm finds the minimum value of a function. To get started, I examined the C language implementation presented in the book “Numerical Recipes in C” (NR). The function is listed in Section 10.7 and is called dfpmin (“DFP minimization”) because BFGS is really just a slight variation of the earlier DFP (Davidon-Fletcher-Powell) algorithm. The NR implementation calls the helper function lnsrch (“line search”) from Section 9.7.

So, I refactored the NR C language code to C#. It was quite interesting, in the sense that I had to sharpen up my C skills to understand what was going on. The NR code is, basically, crazy.


The BFGS algorithm (not to be confused with the even more complicated L-BFGS algorithm, the “limited memory” version) is based on calculus techniques such as function gradients (first derivatives) and the Hessian matrix of second partial derivatives. The most striking thing about BFGS is the number of ways that the function can fail. As it turns out, robust implementations of BFGS and L-BFGS exist, but they contain a huge number of special-case checks. My point is that the NR code for BFGS is an OK way to start understanding the algorithm, but, in addition to being copyrighted, the NR code is not suitable for anything but experimentation because it can fail in dozens of situations.

using System;
namespace BFGSExperiments
{
  class Program
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("\nBegin\n");

        int iter;
        double fret;
        double[] p = new double[] {1, 1};
        Minimize(p, 2, 0.0001, out iter, out fret,
          FunctionToMinimize, GradientOfFunction);

        Console.WriteLine("Iterations = " + iter);
        Console.WriteLine("Function min value = " +
          fret);
        Console.WriteLine("Minimized at " + p[0] +
          " " + p[1]);

        Console.WriteLine("\nEnd\n");
        Console.ReadLine();
      }
      catch (Exception ex)
      {
        Console.WriteLine(ex.Message);
        Console.ReadLine();
      }


    } // Main

    public delegate double SomeFunction(double[] fvals);
    public delegate void GradFunction(double[] x,
      double[] grads);

    // -----------------------------------------------------------

    public static double FunctionToMinimize(double[] x)
    {
      return -Math.Exp(-Math.Pow(x[0] - 1, 2)) -
        Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2));
      // has min at x = 1, y = 2
    }

    public static void GradientOfFunction(double[] x,
      double[] grads)
    {
      grads[0] = 2 * Math.Exp(-Math.Pow(x[0] - 1, 2)) *
        (x[0] - 1);
      grads[1] = Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2)) *
        (x[1] - 2);
    }

    // -----------------------------------------------------------

    public static void Minimize(double[] p, int n,
      double gtol, out int iter, out double fret,
      SomeFunction func, GradFunction dfunc)
    {
      // aka dfpmin, aka BFGS
      // starting point p[] of length n, minimize func, 
      // using its gradient gfunc
      // returns are p[] (location of min value),
      // iter (number iterations performed),
      // fret (min value of function at p[])
      const int ITMAX = 200;
      const double EPS = 3.0e-8;
      const double TOLX = 4.0 * EPS;
      const double STPMX = 100.0;

      int check, i, its, j;
      double den, fac, fad, fae, fp, stpmax, sum,
        sumdg, sumxi, temp, test;
      double[] dg, g, hdg, pnew, xi;
      double[][] hessin;

      iter = 0;
      sum = 0.0; // keep compiler happy

      dg = new double[n];
      g = new double[n];
      hdg = new double[n];
      hessin = MakeMatrix(n, n);
      pnew = new double[n];
      xi = new double[n];
      fp = func(p);
      Console.WriteLine("starting fp = " + fp);
      dfunc(p, g);

      Console.WriteLine("starting Grads: g[0] = " +
        g[0] + " g[1] = " + g[1]);
      Console.ReadLine();

      for (i = 0; i < n; ++i)  {
        for (j = 0; j < n; ++j) hessin[i][j] = 0.0;
        hessin[i][i] = 1.0;
        xi[i] = -g[i];
        sum += p[i] * p[i];
      }
      
      stpmax = STPMX * Math.Max(Math.Sqrt(sum),
        (double)n);
      for (its = 1; its <= ITMAX; ++its) // main loop
      {
        iter = its;
        LineSearch(n, p, fp, g, xi, pnew, out fret, 
          stpmax, out check, func);

        fp = fret;
        Console.WriteLine("fp in loop = " + fp);
        for (i = 0; i < n; ++i)  {
          xi[i] = pnew[i] - p[i];
          p[i] = pnew[i];
        }
        Console.WriteLine("New p0 p1 = " +
          p[0] + " " + p[1]);
        Console.ReadLine();
        test = 0.0;
        for (i = 0; i < n; ++i) {
          temp = Math.Abs(xi[i]) /
            Math.Max(Math.Abs(p[i]), 1.0);
          if (temp > test) test = temp;
        }

        if (test < TOLX) {
          Console.WriteLine("Exiting when test = " +
            test + " < tolx = " + TOLX);
          return;
        }

        for (i = 0; i < n; ++i) dg[i] = g[i];
        dfunc(p, g);
        test = 0.0;
        den = Math.Max(fret, 1.0);
        for (i = 0; i < n; ++i) {
          temp = Math.Abs(g[i]) *
            Math.Max(Math.Abs(p[i]), 1.0) / den;
          if (temp > test) test = temp;
        }
        if (test < gtol)  {
          Console.WriteLine("Exiting when test = " +
            test + " < gtol = " + gtol);
          return;
        }

        for (i = 0; i < n; ++i) dg[i] = g[i] - dg[i];
        for (i = 0; i < n; ++i) {
          hdg[i] = 0.0;
          for (j = 0; j < n; ++j)
            hdg[i] += hessin[i][j] * dg[j];
        }
        fac = fae = sumdg = sumxi = 0.0;
        for (i = 0; i < n; ++i) {
          fac += dg[i] * xi[i];
          fae += dg[i] * hdg[i];
          sumdg += dg[i] * dg[i];
          sumxi += xi[i] * xi[i];
        }
        if (fac > Math.Sqrt(EPS * sumdg * sumxi))
        {
          fac = 1.0 / fac;
          fad = 1.0 / fae;
          for (i = 0; i < n; ++i)
            dg[i] = fac * xi[i] - fad * hdg[i];
          for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
              hessin[i][j] += fac * xi[i] * xi[j]
                - fad * hdg[i] * hdg[j] +
                  fae * dg[i] * dg[j];
              hessin[j][i] = hessin[i][j];
            }
          }
        }

        for (i = 0; i < n; ++i) {
          xi[i] = 0.0;
          for (j = 0; j < n; ++j)
            xi[i] -= hessin[i][j] * g[j];
        }


      } // main loop
      throw new Exception("Too many iterations " + iter +
        " in Minimize");

    } // Minimize

    public static double[][] MakeMatrix(int rows, int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i < rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    public static void LineSearch(int n, double[] xold,
      double fold, double[] g, double[] p, double[] x,
      out double f, double stpmax, out int check,
       SomeFunction func)
    {
      // aka lnsrch

      const double ALF = 1.0e-4;
      const double TOLX = 1.0e-7;

      int i;
      double a, alam, alam2, alamin, b, disc, f2, rhs1,
      rhs2, slope, sum, temp, test, tmplam;

      check = 0;
      for (sum = 0.0, i = 0; i < n; ++i)
        sum += p[i] * p[i];
      sum = Math.Sqrt(sum);
      if (sum > stpmax)
        for (i = 0; i < n; ++i)
          p[i] *= stpmax / sum;
 
      for (slope = 0.0, i = 0; i < n; ++i)
        slope += g[i] * p[i];
      if (slope >= 0.0)
        throw new Exception("Roundoff problem LineSearch");

      test = 0.0;
      for (i = 0; i < n; ++i) {
        temp = Math.Abs(p[i]) /
          Math.Max(Math.Abs(xold[i]), 1.0);
        if (temp > test) test = temp;
      }
      alamin = TOLX / test;
      alam = 1.0;
      
      int sanityCt = 0;
      int maxSanity = 1000000;
      f = 0.0; // keeps compiler happy
      f2 = 0.0;
      alam2 = 0.0;
      while (true && sanityCt < maxSanity)
      {
        ++sanityCt;

        for (i = 0; i < n; ++i)
          x[i] = xold[i] + alam * p[i];
        f = func(x);
        if (alam < alamin) {
          for (i = 0; i < n; ++i)
            x[i] = xold[i];
          check = 1;
          return;
        } else if (f <= fold + ALF * alam * slope)
            return;
        else  {
          if (alam == 1.0)
            tmplam = -slope / (2.0 * (f - fold - slope));
          else  {
            rhs1 = f - fold - alam * slope;
            rhs2 = f2 - fold - alam2 * slope;
            a = (rhs1 / (alam * alam) - rhs2 /
              (alam2 * alam2)) / (alam - alam2);
            b = (-alam2 * rhs1 / (alam * alam) + alam *
              rhs2 / (alam2 * alam2)) /
              (alam - alam2);
            if (a == 0.0) tmplam = -slope / (2.0 * b);
            else  {
              disc = b * b - 3.0 * a * slope;
              if (disc < 0.0) tmplam = 0.5 * alam;
              else if (b <= 0.0)
                tmplam = (-b + Math.Sqrt(disc)) / (3.0 * a);
              else
                tmplam = -slope / (b + Math.Sqrt(disc));
            }
            if (tmplam > 0.5 * alam)
              tmplam = 0.5 * alam;
          } // else

        } // else
        alam2 = alam;
        f2 = f;
        alam = Math.Max(tmplam, 0.1 * alam);
      } // while

      if (sanityCt == maxSanity)
        throw new Exception("Insane in LineSearch");
    } // LineSearch
  } // Program
} // ns
Posted in Machine Learning

World Chess Champion Performances at the Chess Olympiads

The 2014 Chess Olympiad was held from August 1-14 in Tromsø, Norway. Many people, including me, were surprised that the reigning world champion, Magnus Carlsen, lost two games (to Arkadij Naiditsch of Germany in round 7, and Ivan Saric of Croatia in round 10). This got me wondering about other world champions who have played in Olympiads. So, I parsed through online databases and Wikipedia articles to collect the information below.

Olympiad  Champion   Score
=====================================
1930      Alekhine   9/9       = 100%
1931      Alekhine   13.5/18   = 75%
1933      Alekhine   9.5/12    = 79%
1935      Alekhine   12/17     = 71%
1937      Euwe       9.5/13    = 73%
1939      Alekhine   7.5/10    = 75%
1954      Botvinnik  8.5/11    = 77%
1956      Botvinnik  9.5/13    = 73%
1958      Smyslov    9.5/12    = 79%
1960      Tal        11/15     = 73%
1962      Botvinnik  8/12      = 67%
1964      Petrosian  9.5/13    = 73%
1966      Petrosian  11.5/13   = 88%
1968      Petrosian  10.5/12   = 88%
1970      Spassky    9.5/12    = 79%
1980      Karpov     9/12      = 75%
1982      Karpov     6.5/8     = 81%
1986      Kasparov   8.5/11    = 77%
1988      Kasparov   8.5/10    = 85%
1992      Kasparov   8.5/10    = 85%
1994      Kasparov   6.5/10    = 65%
1996      Kasparov   7/9       = 78%
2014      Carlsen    6/9       = 67%

According to my research, the reigning world champion did not play in the Olympiads for 1927 (Capablanca), 1928, 1950, 1952, 1972 (Fischer), 1974, 1976, 1978, 1984, 1990, 1998, 2000, 2002 (Kramnik), 2004, 2006 (Topalov), 2008 (Anand), 2010, and 2012.

I likely made a few mistakes because the data had to be gathered from several sources. For the years when there were dual world champions, notably when Kasparov split from FIDE, I picked the champion I decided was the more legitimate.

So, the data more or less speak for themselves, as they say. It’s hard to compare these results because the number of teams and the strength of the competition have varied over time. But that said, I’d say all the world champions’ Olympiad performances were extremely good. I am a big Magnus Carlsen fan, but his 2014 Olympiad score of 67% is tied for the second worst on record, although he’s in good company: Botvinnik also scored 67% in 1962, and Kasparov’s 65% in 1994 is the lowest. To Carlsen’s great credit, he played in the 2014 Olympiad, with really nothing to gain, unlike previous champion Anand, who did not play in any of the three Olympiads held while he was champion.


Posted in Miscellaneous

Setting the Active Excel Worksheet using Interop and Speech Recognition with C#

I’ve been looking at using speech recognition to manipulate Excel 2013. One interesting task I solved is setting the active worksheet. For example, the command “sheet one” will have the same effect as clicking on the Sheet1 tab near the bottom of an open spreadsheet.


In order to do this, the client machine must have several somewhat confusing prerequisite libraries installed, related to Excel interop (to manipulate Excel) and speech (to recognize commands).

Although it’s not essential, I implemented everything as an Excel add-in, mostly so I could have some UI to display logging messages and turn speech on/off. To develop the add-in using Visual Studio 2012 I needed to download and install the “Microsoft Office Developer Tools for Visual Studio 2012 ENU”. To develop the speech components I needed to install three 32-bit (not 64! 64-bit doesn’t work with Excel interop — that was a pain to discover) Microsoft Speech Platform 11 libraries: the SDK (to create in VS), the Runtime (to use speech on my client machine), and English Language pack MSSpeech_SR_en-US_TELE.msi for recognition. Also, although I didn’t use it, I downloaded and installed MSSpeech_TTS_en-US_Helen.msi for speech synthesis (to make the computer speak with Helen’s voice).

Just getting a basic add-in working was no small chore. See my blog post at http://jamesmccaffrey.wordpress.com/2013/07/08/analyzing-an-excel-2013-spreadsheet-programmatically-using-an-add-in/ for the ugly details. In this case the add-in is essentially a WinForm with a CheckBox control (“SR” in the image) to toggle speech on and off.

The key code is listed below. The code recognizes a command like “sheet two”, parses the “two” out, converts it to numeric 2 and then uses the Excel.Worksheet Activate method. What is not shown in the code is that I programmatically insert new worksheets. The first sheet is always at the last index in the Worksheets collection. The indexing for the other sheets is a bit tricky and the code is the best explanation.

namespace Excelligence
{
  public partial class ExcelligenceUserControl : UserControl
  {
    static int sheetNum = 0; // used to create a new sheet for each clustering result
    static SpeechRecognitionEngine sre; // to recognize spoken commands like "cluster 3"
    //static SpeechSynthesizer ss; // not right now . . 
 
    public ExcelligenceUserControl()
    {
      InitializeComponent();
      System.Globalization.CultureInfo ci = new System.Globalization.CultureInfo("en-us");
      sre = new SpeechRecognitionEngine(ci);
      sre.SetInputToDefaultAudioDevice();
      sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

      Choices numbers = new Choices();
      numbers.Add("one");
      numbers.Add("two");
      numbers.Add("three");
      numbers.Add("four");
 
      Choices sheets = new Choices();
      sheets.Add("Sheet");
      GrammarBuilder gb_Sheets = new GrammarBuilder();
      gb_Sheets.Append(sheets);
      gb_Sheets.Append(numbers);
      Grammar g_Sheets = new Grammar(gb_Sheets);
      sre.LoadGrammarAsync(g_Sheets);
    }

    void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
      string txt = e.Result.Text;
      float conf = e.Result.Confidence;

      if (conf < 0.65) return;
      listBox1.Items.Add("Recognized " + txt); 
      
      string[] tokens = txt.Split(' ');
      int n = 0;
      if (txt.IndexOf("Sheet one") >= 0 ||
        txt.IndexOf("Sheet two") >= 0 ||
        txt.IndexOf("Sheet three") >= 0 ||
        txt.IndexOf("Sheet four") >= 0)
      {
        n = Globals.ThisAddIn.Application.Worksheets.Count;
        listBox1.Items.Add("there are " + n + " sheets");
        string[] twoWords = txt.Split(' ');
        string sheetNumAsString = twoWords[1];
        int sn = NumberFromString(sheetNumAsString);
        listBox1.Items.Add("sn = " + sn);
        if (sn > n) return;

        int idx;
        if (sn == 1) idx = n; // weirdness! "Sheet1" is always last
        else idx = sn - 1; // ??? (but works)

        Excel.Worksheet ws = Globals.ThisAddIn.Application.Worksheets[idx];
        ws.Activate();
        ws.Visible = Excel.XlSheetVisibility.xlSheetVisible;
      }
    } // sre_SpeechRecognized

    static int NumberFromString(string numAsString)
    {
      if (numAsString == "one") return 1;
      else if (numAsString == "two") return 2;
      else if (numAsString == "three") return 3;
      else if (numAsString == "four") return 4;
      return -1; // error
      //return 3;
    }

    private void checkBox1_CheckedChanged(object sender, EventArgs e)
    {
      if (checkBox1.Checked == true) // turn SR on
        sre.RecognizeAsync(RecognizeMode.Multiple);
      else if (checkBox1.Checked == false) // turn SR off
        sre.RecognizeAsyncCancel();
    }
    
    // etc.
Posted in Machine Learning

Solving Sudoku using C#

I wrote an article titled “Solving Sudoku Puzzles Using the MSF Library” in the August 2014 issue of MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn759446.aspx. The MSF (Microsoft Solver Foundation) library is a collection of code that can solve several different types of problems, including constraint satisfaction problems (CSPs).

A Sudoku puzzle is a CSP because the constraints are 1.) the numbers in each row must all be different (1 through 9), 2.) the numbers in each column must all be different, and 3.) the numbers in each of the nine 3×3 sub-grids must all be different. And of course the starting numbers that define the puzzle are constraints too.


Another way of thinking about Sudoku puzzles is that they are combinatorial optimization problems where the goal is to minimize the number of constraint violations.
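
To make that view concrete, here’s a minimal sketch, in plain C# rather than the MSF library, that counts how many row, column, and sub-grid constraint violations a candidate 9×9 grid contains (the grid is a jagged int[][] array with 0 meaning an empty cell; the method names are mine, not from the article):

static int CountViolations(int[][] grid)
{
  // total duplicate values across all rows, columns, and sub-grids
  int violations = 0;
  for (int unit = 0; unit < 9; ++unit)
  {
    violations += CountDuplicates(GetRow(grid, unit));
    violations += CountDuplicates(GetColumn(grid, unit));
    violations += CountDuplicates(GetSubGrid(grid, unit));
  }
  return violations; // 0 means no constraints are violated
}

static int CountDuplicates(int[] values)
{
  // count repeated non-zero values within one row/column/sub-grid
  int[] counts = new int[10];
  foreach (int v in values)
    if (v != 0) ++counts[v];
  int dups = 0;
  for (int v = 1; v <= 9; ++v)
    if (counts[v] > 1) dups += counts[v] - 1;
  return dups;
}

static int[] GetRow(int[][] grid, int r)
{
  return grid[r];
}

static int[] GetColumn(int[][] grid, int c)
{
  int[] result = new int[9];
  for (int r = 0; r < 9; ++r)
    result[r] = grid[r][c];
  return result;
}

static int[] GetSubGrid(int[][] grid, int s)
{
  // s = 0..8, numbered left-to-right, top-to-bottom
  int[] result = new int[9];
  int rowStart = (s / 3) * 3;
  int colStart = (s % 3) * 3;
  int k = 0;
  for (int r = rowStart; r < rowStart + 3; ++r)
    for (int c = colStart; c < colStart + 3; ++c)
      result[k++] = grid[r][c];
  return result;
}

A solved puzzle is simply a completely filled grid for which CountViolations returns 0.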

Anyway, it was a fun little exercise and I learned how to use the MSF library. Sadly, MSF has been discontinued — it just never gained any traction in the developer community. I’m not sure why, but the failure of MSF could be a case where good technology just didn’t get publicized properly, or maybe the documentation was not helpful enough. Not sure.

Posted in Machine Learning

Neural Network Weight Decay and Restriction

I wrote an article titled “Neural Network Weight Decay and Restriction”, published in July 2014 in Visual Studio Magazine. See http://visualstudiomagazine.com/articles/2014/07/01/weight-decay-and-restriction.aspx.


You can think of a neural network as a complicated math function that accepts one or more numeric input values and generates one or more numeric output values. The output values are determined in part by a set of values called the network’s weights and biases. Training a neural network is the process of finding the values for the weights and biases.

Training is accomplished by using a set of training data that has known output values. Training finds the set of weights and bias values so that, when presented with the training data input values, the computed output values are very close to the known output values in the training data.

A major problem when training a neural network is a phenomenon called over-fitting. Over-fitting means you’ve found weights and bias values that work perfectly, or almost perfectly, on the training data, but when you present the trained network with new data, the prediction accuracy is very poor.

There are several strategies you can use to try to deal with over-fitting. One is called weight decay: during training, each weight and bias value is decremented by a small amount. The idea is that if there is no counter effect from the training data, useless weights will fade away. Another strategy is weight restriction: during training, weights and bias values are not allowed to get too big or too small.
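
As a rough sketch of the two ideas (this is not the code from the article; the decay and limit values below are arbitrary examples), decay and restriction applied after each weight update might look like this:

// simplified illustration of weight decay and weight restriction;
// 'weights' holds all weight and bias values
static void ApplyDecayAndRestriction(double[] weights,
  double decay, double lo, double hi)
{
  for (int i = 0; i < weights.Length; ++i)
  {
    // weight decay: shrink each value slightly toward zero so that,
    // absent a counter effect from the training data, it fades away
    weights[i] *= (1.0 - decay); // e.g. decay = 0.0001

    // weight restriction: clamp each value into [lo, hi],
    // e.g. lo = -10.0, hi = +10.0
    if (weights[i] < lo) weights[i] = lo;
    else if (weights[i] > hi) weights[i] = hi;
  }
}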

Posted in Machine Learning

Neural Network Back-Propagation and De-Modularizing

I ran across an unusual scenario recently where it was beneficial to “de-modularize” some code. I’m not sure if “de-modularize” is a word or not, but I mean refactoring code that was in two functions into one larger function. The code I was working with was neural network training using the back-propagation algorithm. My original code resembled this:

Method Train:

loop until done
  for each data item
    compute output values
    update weight values
  end for
end loop

In other words, the training method was a wrapper around two methods, ComputeOutputs and UpdateWeights. Nice and modular. The update-weights method resembled:

Method UpdateWeights:

compute output gradients
compute hidden gradients
use gradients to compute deltas
update weights using deltas
modify weights using deltas from previous call

I’m leaving out a lot of details, but the problem with the modular approach was storing the previous deltas. The previous deltas are stored in two matrices and two arrays. One approach is to make these four data structures class members, but that’s ugly because it doesn’t make replacing the training method easy. A second approach is to place the data structures inside method Train, but this means I’d have to pass them as four additional parameters to method UpdateWeights, or create a “state-context” data structure that holds all four data structures and pass it as a parameter.
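
For comparison, a hypothetical sketch of that rejected state-context approach might look like the class below (the class and field names are mine):

// bundle the four previous-delta structures into one object that
// method Train creates and passes to UpdateWeights on every call
class BackPropContext
{
  public double[][] ihPrevWeightsDelta; // input-to-hidden weight deltas
  public double[] hPrevBiasesDelta;     // hidden bias deltas
  public double[][] hoPrevWeightsDelta; // hidden-to-output weight deltas
  public double[] oPrevBiasesDelta;     // output bias deltas
}

// inside Train: UpdateWeights(tValues, learnRate, momentum, ctx);

It works, but it adds a type and a parameter whose only purpose is to carry state between calls.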


In the end, the best solution was to de-modularize the code by ditching the UpdateWeights method and placing its code directly into method Train. Here’s the result:

public void Train2(double[][] trainData, int maxEpochs,
  double learnRate, double momentum)
{
  // integrated 'UpdateWeights' version 
  // back-prop specific arrays
  double[] oGrads = new double[numOutput]; // gradients
  double[] hGrads = new double[numHidden];

  // back-prop momentum specific arrays 
  double[][] ihPrevWeightsDelta = MakeMatrix(numInput,
    numHidden);
  double[] hPrevBiasesDelta = new double[numHidden];
  double[][] hoPrevWeightsDelta = MakeMatrix(numHidden,
    numOutput);
  double[] oPrevBiasesDelta = new double[numOutput];

  // train 
  int epoch = 0;
  double[] xValues = new double[numInput]; // inputs
  double[] tValues = new double[numOutput]; // targets

  int[] sequence = new int[trainData.Length];
  for (int i = 0; i < sequence.Length; ++i)
    sequence[i] = i;

  while (epoch < maxEpochs)
  {
    double mse = MeanSquaredError(trainData);
    if (mse < 0.040) break;

    Shuffle(sequence); // random order
    for (int ii = 0; ii < trainData.Length; ++ii)
    {
      int idx = sequence[ii];
      Array.Copy(trainData[idx], xValues, numInput);
      Array.Copy(trainData[idx], numInput, tValues, 0,
        numOutput);
      ComputeOutputs(xValues);
      //UpdateWeights(tValues, learnRate, momentum);

      // ---- Update-Weights section
      // 1. compute output gradients
      for (int i = 0; i < numOutput; ++i)
      {
        // derivative for softmax = (1 - y) * y 
        double derivative = (1 - outputs[i]) * outputs[i];
        oGrads[i] = derivative * (tValues[i] - outputs[i]);
      }

      // 2. compute hidden gradients
      for (int i = 0; i < numHidden; ++i)
      {
        // derivative of tanh = (1 - y) * (1 + y)
        double derivative = (1 - hOutputs[i]) *
          (1 + hOutputs[i]);
        double sum = 0.0;
        for (int j = 0; j < numOutput; ++j)
        {
          double x = oGrads[j] * hoWeights[i][j];
          sum += x;
        }
        hGrads[i] = derivative * sum;
      }

      // 3a. update hidden weights
      // (weights can be updated in any order)
      for (int i = 0; i < numInput; ++i) // 0..2 (3)
      {
        for (int j = 0; j < numHidden; ++j) // 0..3 (4)
        {
          double delta = learnRate * hGrads[j] * inputs[i];
          ihWeights[i][j] += delta; 
          // now add momentum using previous delta.
          ihWeights[i][j] += momentum *
            ihPrevWeightsDelta[i][j];
          ihPrevWeightsDelta[i][j] = delta; 
        }
      }

      // 3b. update hidden biases
      for (int i = 0; i < numHidden; ++i)
      {
        double delta = learnRate * hGrads[i]; 
        hBiases[i] += delta;
        hBiases[i] += momentum *
          hPrevBiasesDelta[i]; // momentum
        hPrevBiasesDelta[i] = delta;
      }

      // 4. update hidden-output weights
      for (int i = 0; i < numHidden; ++i)
      {
        for (int j = 0; j < numOutput; ++j)
        {
          double delta = learnRate * oGrads[j] *
            hOutputs[i];
          hoWeights[i][j] += delta;
          hoWeights[i][j] += momentum *
            hoPrevWeightsDelta[i][j]; // momentum
          hoPrevWeightsDelta[i][j] = delta; // save
        }
      }

      // 4b. update output biases
      for (int i = 0; i < numOutput; ++i)
      {
        double delta = learnRate * oGrads[i] * 1.0;
        oBiases[i] += delta;
        oBiases[i] += momentum *
          oPrevBiasesDelta[i]; // momentum
        oPrevBiasesDelta[i] = delta; // save
      }
      // ---- end Update-Weights
    } // each training item
    ++epoch;
  } // while
} // Train2

De-modularizing makes the training method long — well over one page of code, which is usually bad. But in this rare case, the de-modularized version is superior.

Posted in Machine Learning