Quick Start for Microsoft Speech Recognition with C#

Recently I was working with C# programs that do speech recognition. I found the official documentation somewhat confusing: there are several speech technologies, and ironically I felt there was too much extraneous information rather than too little. Here’s a no-nonsense speech recognition Quick Start.

There are several different Microsoft speech platforms. The differences between them are quite confusing. I used the “Microsoft Speech Platform 11”. To use the Speech Platform to create a simple C# program that recognizes speech, you need to download and install three packages. First, you need the Speech Platform 11 SDK to create programs. Second, you need the Speech Platform 11 Runtime to run programs. Third, you need at least one Microsoft Speech Platform 11 recognition Language.

In my case, the SDK installer file name was x64_MicrosoftSpeechPlatformSDK\MicrosoftSpeechPlatformSDK.msi because I was using a 64-bit laptop. The Runtime installer file was x64_SpeechPlatformRuntime\SpeechPlatformRuntime.msi. The recognition Language installer file was MSSpeech_SR_en-US_TELE.msi because I wanted to use U.S. English. (I think the ‘TELE’ means telephone, i.e., crude quality.)

The danger here is that there are older (circa 2009) “Server” versions of the SDK and Runtime, and older versions like 10.9 (circa 2010), and there are x86 and x64 versions of all of them. You have to be very careful to install a consistent set of three packages.
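One way to sanity-check the install is to ask the Runtime which recognizers it can see. The following is a minimal sketch, not part of the HelloSpeech program; it assumes a console project that already references Microsoft.Speech.dll (adding the reference is described in the next paragraph), and the class name ListRecognizers is just a placeholder.

```csharp
using System;
using Microsoft.Speech.Recognition;

class ListRecognizers  // hypothetical name, for illustration only
{
  static void Main()
  {
    // InstalledRecognizers() returns one RecognizerInfo per
    // recognition Language the Runtime can find.
    foreach (RecognizerInfo info in
      SpeechRecognitionEngine.InstalledRecognizers())
    {
      Console.WriteLine(info.Culture.Name + "  " + info.Name);
    }
  }
}
```

If nothing is printed, or en-US is missing from the list, the Language package is probably not visible to the Runtime that was installed, which is a hint that the three packages are not a consistent set.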

After finding (at the Microsoft downloads site) and installing the three packages, I launched VS 2012 (I’m pretty sure VS 2010 will work too) and created a new C# console application program and named it HelloSpeech. I added a reference to the Microsoft.Speech.dll assembly which I found stored at C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly (the install location was displayed during the install process). At the top of the C# code I added “using Microsoft.Speech.Recognition”.

This program listens for “red”, “blue”, and “green”, until it hears the secret quit-word “klaatu”.

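using System;
using Microsoft.Speech.Recognition;

namespace HelloSpeech
{
  class Program
  {
    // Set by the RecognizeCompleted handler on another thread,
    // so it is marked volatile for the polling loop in Main.
    static volatile bool done = false;

    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("I'm listening");

        System.Globalization.CultureInfo ci =
          new System.Globalization.CultureInfo("en-us");
        SpeechRecognitionEngine sre =
          new SpeechRecognitionEngine(ci);
        sre.SetInputToDefaultAudioDevice();
        sre.SpeechRecognized +=
          new EventHandler<SpeechRecognizedEventArgs>(
            sre_SpeechRecognized);
        sre.RecognizeCompleted +=
          new EventHandler<RecognizeCompletedEventArgs>(
            sre_RecognizeCompleted);

        Choices colorChoices = new Choices();
        colorChoices.Add("red");
        colorChoices.Add("green");
        colorChoices.Add("blue");
        colorChoices.Add("klaatu"); // quit

        GrammarBuilder colorsGrammarBuilder =
          new GrammarBuilder();
        colorsGrammarBuilder.Culture = ci; // must match the engine
        colorsGrammarBuilder.Append(colorChoices);
        Grammar keyWordsGrammar =
          new Grammar(colorsGrammarBuilder);
        sre.LoadGrammar(keyWordsGrammar);

        // Keep recognizing until RecognizeAsyncCancel is called
        sre.RecognizeAsync(RecognizeMode.Multiple);

        while (done == false) { ; }
      }
      catch (Exception ex)
      {
        Console.WriteLine(ex.Message);
      }
    } // Main

    static void sre_SpeechRecognized(object sender,
      SpeechRecognizedEventArgs e)
    {
      if (e.Result.Text == "klaatu")
      {
        // Stop listening; this raises RecognizeCompleted
        ((SpeechRecognitionEngine)sender).RecognizeAsyncCancel();
        return;
      }
      if (e.Result.Confidence >= 0.75)
        Console.WriteLine("I heard " + e.Result.Text);
      else
        Console.WriteLine("Unknown word");
    }

    static void sre_RecognizeCompleted(object sender,
      RecognizeCompletedEventArgs e)
    {
      done = true;
    }
  } // class Program
} // ns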

Before building the project, I had to go to the Project Properties page and uncheck the “Prefer 32-bit” option, since I had installed the x64 packages.

Interestingly, the code to quit is in some ways trickier than the regular recognition code. When the special quit word is recognized, the RecognizeAsyncCancel method is called on the speech recognition engine object. This fires the RecognizeCompleted event, and control is transferred to the RecognizeCompleted event handler, which sets the static variable done to true. That in turn breaks out of the holding loop in the Main method.
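The holding loop works, but it keeps a CPU core spinning while the program waits. As a sketch of an alternative, and not the approach used in the program above, the done flag can be swapped for a ManualResetEvent that the RecognizeCompleted handler signals. The class name QuitWithWaitHandle and the field name finished are made up for illustration, and the sketch assumes the same Speech Platform 11 setup and en-US Language described earlier.

```csharp
using System;
using System.Globalization;
using System.Threading;
using Microsoft.Speech.Recognition;

class QuitWithWaitHandle  // hypothetical name, for illustration only
{
  // Signaled by the RecognizeCompleted handler; stands in for 'done'.
  static readonly ManualResetEvent finished =
    new ManualResetEvent(false);

  static void Main()
  {
    CultureInfo ci = new CultureInfo("en-us");
    using (SpeechRecognitionEngine sre =
      new SpeechRecognitionEngine(ci))
    {
      sre.SetInputToDefaultAudioDevice();

      GrammarBuilder gb = new GrammarBuilder();
      gb.Culture = ci;
      gb.Append(new Choices("red", "green", "blue", "klaatu"));
      sre.LoadGrammar(new Grammar(gb));

      sre.SpeechRecognized += (sender, e) =>
      {
        if (e.Result.Text == "klaatu")
          sre.RecognizeAsyncCancel(); // raises RecognizeCompleted
        else
          Console.WriteLine("I heard " + e.Result.Text);
      };
      sre.RecognizeCompleted += (sender, e) => finished.Set();

      sre.RecognizeAsync(RecognizeMode.Multiple);
      finished.WaitOne(); // blocks here without spinning
    }
  }
}
```

The flow is the same as before (quit word, RecognizeAsyncCancel, RecognizeCompleted, exit); only the waiting mechanism changes.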

The major alternative to using Speech Platform 11 and its associated Microsoft.Speech.dll assembly for speech recognition is to use the System.Speech.dll assembly that is typically already on your computer in the GAC. Both of these libraries can also be used to do speech synthesis, that is, to write programs that speak with a human-like voice. There is also a set of libraries for use with the Kinect hardware.
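To give a flavor of the synthesis side, here is a minimal sketch using the System.Speech assembly. It assumes the project references System.Speech.dll and that Windows has at least one voice installed; the class name HelloSynthesis and the spoken phrase are made up for illustration.

```csharp
using System.Speech.Synthesis;

class HelloSynthesis  // hypothetical name, for illustration only
{
  static void Main()
  {
    using (SpeechSynthesizer synth = new SpeechSynthesizer())
    {
      // Send audio to the default speakers or headphones
      synth.SetOutputToDefaultAudioDevice();

      // Speak is synchronous; it returns when the phrase is finished
      synth.Speak("I heard red");
    }
  }
}
```

The Microsoft.Speech assembly has a similarly named Synthesis namespace, but as far as I can tell it needs a separate Speech Platform voice package to be installed, so the System.Speech version is likely the easier one to try first.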

The Speech Platform 11 is really interesting and if you are willing to put in a couple of hours to get started, I think you’ll like it. But be prepared for rather confusing documentation.

