Microsoft Speech Recognition with a C# WinForm

There aren’t many examples of working with Microsoft Speech and WinForms. I am exploring how to have a WinForm recognize speech, display what was said in one of the Form’s controls (like a TextBox or ListBox), and echo what was said in audio.

The first step is to get your system set up properly. There are several different Microsoft speech platforms. I like to work with the managed code Microsoft Speech Platform 11 — essentially Microsoft.Speech.dll — rather than the System.Speech.dll platform.

1.) Install the Speech 11 SDK to be able to develop your program. I strongly recommend the 32-bit version.
2.) Install the Speech 11 Runtime to be able to run the program once it’s been created.
3.) Install a language to understand. I used English: MSSpeech_SR_en-US_TELE.msi.
4.) Install a voice to speak. I used “Helen”: MSSpeech_TTS_en-US_Helen.msi.

Then I created a demo WinForm that has a CheckBox to toggle speech recognition on and off. If speech recognition is on, the WinForm recognizes and parrots back the commands “start” and “stop”, as text in the ListBox and also audibly. The TextBox and Button controls are not used in the demo. The entire code is at the bottom of this post.

SpeechWithWinForm

I launched Visual Studio and created a new C# WinForm named SpeechWithWinForm. I used VS 2012 but I’m pretty sure VS 2010 will work too. I added a reference to the speech DLL, which in my case was at C:\Program Files (x86)\Microsoft SDKs\Speech\v11.0\Assembly. I removed unneeded using statements and added two using statements to bring Microsoft.Speech.Recognition and Microsoft.Speech.Synthesis into scope.

In the designer, I added a CheckBox, TextBox, Button, and ListBox. I gave the CheckBox a label of “SR on/off” then I double-clicked the CheckBox to register its CheckedChanged event handler.

There’s a ton going on in the short demo. An interesting obstacle was trying to get the demo to echo the recognized command into the ListBox control. My initial attempt was:

listBox1.Items.Add("I heard " + txt);

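But this doesn’t work because the SpeechRecognized handler is called by the recognizer object, which is on a different thread than the WinForm, and a WinForm’s controls should only be touched from the thread that created them. Two equivalent ways to hand the call back to the WinForm’s thread and display text in the ListBox are: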

this.Invoke(new MethodInvoker( () =>
  { listBox1.Items.Add("I heard " + txt); }));

or

this.Invoke( (Action)( () =>
  listBox1.Items.Add("I heard " + txt))); 
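A common variation on the two calls above (it is not part of the demo) is to check InvokeRequired first and queue the update with BeginInvoke so the calling thread is not blocked. Here is a minimal sketch under some assumptions: MarshalDemoForm and AddLine are hypothetical names made up for illustration, and a plain background thread stands in for the recognizer so the example runs without the Speech DLL.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical stand-alone demo (not the post's Form1). A background
// thread plays the role of the recognizer's event thread.
public class MarshalDemoForm : Form
{
  private readonly ListBox listBox1 = new ListBox();

  public MarshalDemoForm()
  {
    listBox1.Dock = DockStyle.Fill;
    Controls.Add(listBox1);

    // Start the worker only after the form is shown, so the
    // control's window handle already exists.
    Shown += (s, e) =>
    {
      Thread worker = new Thread(() => AddLine("I heard start"));
      worker.IsBackground = true;
      worker.Start();
    };
  }

  // Can be called from any thread once the form has been shown.
  private void AddLine(string text)
  {
    if (listBox1.InvokeRequired)
    {
      // Not on the UI thread: queue this same call on the UI thread
      // and return without waiting for it to finish.
      listBox1.BeginInvoke((Action)(() => AddLine(text)));
      return;
    }
    listBox1.Items.Add(text); // now running on the UI thread
  }

  [STAThread]
  public static void Main()
  {
    Application.Run(new MarshalDemoForm());
  }
}
```

The difference from this.Invoke is that BeginInvoke returns immediately, while Invoke waits until the UI thread has run the delegate. For a short ListBox update either approach should be fine.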

Anyway, this is my basic speech recognition and synthesis with a WinForm example. Here’s the code:

using System;
using System.Data;
using System.Drawing;
using System.Windows.Forms;

using Microsoft.Speech.Recognition;
// C:\Program Files (x86)\Microsoft SDKs\
// Speech\v11.0\Assembly\
using Microsoft.Speech.Synthesis;
using System.Globalization;

namespace SpeechWithWinForm
{
  public partial class Form1 : Form
  {
    static CultureInfo ci = new CultureInfo("en-us");
    static SpeechRecognitionEngine sre =
      new SpeechRecognitionEngine(ci);
    static SpeechSynthesizer ss =
      new SpeechSynthesizer();

    public Form1()
    {
      InitializeComponent();
      sre.SetInputToDefaultAudioDevice();
      ss.SetOutputToDefaultAudioDevice();
      sre.SpeechRecognized += sre_SpeechRecognized;
      Grammar g_StartStop = GetStartStopGrammar();
      sre.LoadGrammar(g_StartStop);
      // load other patterns here
    }

    static Grammar GetStartStopGrammar()
    {
      Choices choicesStartStop = new Choices();
      choicesStartStop.Add("start");
      choicesStartStop.Add("stop");
      GrammarBuilder gb_StartStop =
        new GrammarBuilder(choicesStartStop);
      Grammar g_StartStop =
        new Grammar(gb_StartStop);
      return g_StartStop;
    }

    void sre_SpeechRecognized(object sender,
      SpeechRecognizedEventArgs e)
    {
      string txt = e.Result.Text;
      float conf = e.Result.Confidence;

      if (conf >= 0.65)
      {
        //listBox1.Items.Add("I heard " + txt); // NO
        this.Invoke(new MethodInvoker( () =>
          { listBox1.Items.Add("I heard " + txt); }));
        //this.Invoke( (Action)( () =>
        //listBox1.Items.Add("I think I heard " + txt)));
        ss.SpeakAsync("You said " + txt);
      }
    }

    private void checkBox1_CheckedChanged(object sender,
      EventArgs e) // SR on/off
    {
      if (checkBox1.Checked == true)
        sre.RecognizeAsync(RecognizeMode.Multiple);
      else if (checkBox1.Checked == false) // turn off
        sre.RecognizeAsyncCancel();
    }
  } // Form
} // ns
Posted in Machine Learning

Microsoft Solver Foundation Quick Start

The Microsoft Solver Foundation (MSF) is a .NET library that can be used to solve several different kinds of problems, including one very interesting kind called constraint satisfaction problems. The getting-started documentation for MSF is atrociously bad (but after you’re up and running the documentation is quite good). Here’s a short quick-start guide.

I want to solve this problem, which I found on the Web at http://www.purplemath.com/modules/linprog3.htm.

Maximize R = -2x + 5y, where x and y are integers, subject to
100 <= x <= 200
80 <= y <= 170
y >= -x + 200

The problem was actually stated as a word problem involving manufacturing two different kinds of calculators, but we’ll assume you can get to the equations somehow.

In its simplest form, MSF is a single DLL, Microsoft.Solver.Foundation.dll, that can be used in a C# program. First you have to locate and install it. The download location seems to jump around. I found it at http://msdn.microsoft.com/en-us/devlabs/hh145003. I clicked on the Download Solver Foundation 32-bit link (I prefer using 32-bit rather than 64-bit for demos) which led me to an .msi (installer) file that I saved to my local machine, and then ran. The installation generated a lot of crud plus the DLL at directory C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0.

I launched Visual Studio and created a C# Console Application program named SolverFoundationDemo. I added a reference to the MSF DLL. Here’s the code to use MSF to solve the problem:

using System;
using Microsoft.SolverFoundation.Services;

namespace SolverFoundationDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin Solver demo\n");

      var solver = SolverContext.GetContext();
      var model = solver.CreateModel();

      var decisionX = new Decision(Domain.IntegerNonnegative, "X");
      var decisionY = new Decision(Domain.IntegerNonnegative, "Y");
      model.AddDecision(decisionX);
      model.AddDecision(decisionY);

      model.AddGoal("Goal", GoalKind.Maximize,
        (-2 * decisionX) + (5 * decisionY));

      model.AddConstraint("Constraint0", 100 <= decisionX);
      model.AddConstraint("Constraint1", decisionX <= 200);
      model.AddConstraint("Constraint2", 80 <= decisionY);
      model.AddConstraint("Constraint3", decisionY <= 170);
      model.AddConstraint("Constraint4",
        decisionY >= -decisionX + 200);

    
      var solution = solver.Solve();

      double x = decisionX.GetDouble();
      double y = decisionY.GetDouble();

      Console.WriteLine("X = " + x + " y = " + y);

      Console.WriteLine("\nEnd Solver demo\n");
      Console.ReadLine();
    } // Main
  } // Program
} // ns

The program ran and gave a result of

Begin Solver demo

X = 100 y = 170

End Solver demo

Pretty cool. My overall impression is that MSF is very neat, but I couldn’t believe how bad the getting-started documentation was. The documentation gave too much detail instead of giving a super simple example like I’ve tried to do here. It doesn’t matter how cool a technology is if developers can’t get started with it.
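Because the problem is so small, the solver's answer can be cross-checked without MSF by trying every integer point in the allowed ranges. The sketch below does exactly that. BruteForceCheck and FindBest are hypothetical names made up for this check, and the approach only makes sense because there are just 101 x 91 candidate points.

```csharp
using System;

// Hypothetical cross-check, independent of Solver Foundation.
public static class BruteForceCheck
{
  // Returns { bestX, bestY, bestR } over all integer points that
  // satisfy 100 <= x <= 200, 80 <= y <= 170, and y >= -x + 200.
  public static int[] FindBest()
  {
    int bestX = 0, bestY = 0, bestR = int.MinValue;
    for (int x = 100; x <= 200; ++x)
    {
      for (int y = 80; y <= 170; ++y)
      {
        if (y < -x + 200) continue; // violates y >= -x + 200
        int r = -2 * x + 5 * y;     // objective R = -2x + 5y
        if (r > bestR) { bestR = r; bestX = x; bestY = y; }
      }
    }
    return new int[] { bestX, bestY, bestR };
  }

  public static void Main()
  {
    int[] best = FindBest();
    Console.WriteLine("X = " + best[0] + " y = " + best[1] +
      " R = " + best[2]);
  }
}
```

This prints X = 100 y = 170 R = 650, which agrees with the MSF result: R is largest when x is as small as possible and y is as large as possible, and the point (100, 170) satisfies y >= -x + 200.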

InstallingMicrosoftSolverFoundation

Posted in Machine Learning

Microsoft 2014 Build Conference – Recap

I spoke at the 2014 Microsoft Build Conference. Build is Microsoft’s conference for software developers. Build used to be called “PDC”, which stood for the Professional Developers Conference. Build 2014 was in San Francisco and ran from Wednesday, April 2 through Friday, April 4. Build had a total of about 8,000 people, which includes attendees, speakers, vendors, and press.

I talked about recent and future trends in neural networks. Here’s an image of the video recording of my talk which you can watch at:

http://channel9.msdn.com/Events/Build/2014/3-643

VideoSnapshotTitleSlide

Overall, I’d give this Build conference a rating of “good but not great”. On the plus side, I liked the overall messaging of a vision of unifying software development across many platforms including the desktop, Web, phone, Xbox, and everything else. On the negative side, I felt there were too many talks about phone devices, but that’s just my preference and doesn’t reflect the importance of integrating phone with server systems (as opposed to lightweight phone apps).

Also, the logistics of the keynotes were strange to say the least. Normally at Build, there’d be three keynote talks. Each would be about one hour long. The first keynote would be the morning of the first day, where the CEO would give a vision statement and a few important business or technology announcements. The other two keynotes would be on the mornings of the second and third days, and be delivered by senior VPs. This approach is standard because it works well.

At this year’s Build, the first keynote was three hours long and was delivered by five or six different people, including the CEO who spoke last. After the first two hours of keynote, I was completely fatigued and had little interest in the CEO’s or anyone else’s comments. Common sense tells anyone (except apparently whoever set up the keynotes at Build this year) that three hours of key announcements is an oxymoron. The second keynote was like the first, a punishing three-hour marathon on the second day. The keynote structure was incomprehensible.

But of course, developers go to Build to hear about software development. The talks I heard at Build ranged from great to weak. Many of the attendees I talked to said they felt that too many of the talks were aimed at managers instead of at developers. I’d tend to agree. I also heard lots of complaints about the food (breakfast and lunch) at Build. I’d agree — the food at Build 2014 was by far the worst food at any conference I’ve ever spoken at — I’m not exaggerating, the food was not good at all. I’ve got news for the Build organizers: attendees really, really care about the food at a conference.

Anyway, I enjoyed Build a lot. I learned a lot, talked to a lot of interesting people, and more than anything else, got my developer batteries recharged. Build 2014 was good, but could have been great.

Posted in Machine Learning

Microsoft 2014 Build Conference – Day 3

Today (Friday, April 4, 2014) was the third and final day of the 2014 Microsoft Build Conference. Build is Microsoft’s conference for software developers. There are a total of about 8,000 people here (which includes attendees, speakers, vendors, press, and so on).

Unlike the previous two days which started with incomprehensibly-long three-hour keynotes, today started immediately with session talks at 9:00 AM. I watched most of “The Future of C#” which really should have been titled “Overview of Roslyn, a C# API to the C# Compiler”. My immediate reaction to that talk was along the lines of, “I’m very happy to use the existing C# compiler, thank you very much. I don’t need an API into the compiler for what I do.”

I gave my talk from 10:30 to 11:30. Even though the conference organizers put my talk in a large room that held 500+ people, the room filled to capacity quickly and the doors had to be shut a full 10 minutes before my talk was scheduled to start. To be honest, I kind of expected this because my talk was clearly designed for developers, unlike too many of the Build talks that targeted managers and non-developers. Here’s my room before anyone was allowed in:

EmptyRoomBeforeTalkFromBackLeftCornerV2

I think my talk went pretty well. I tried to keep things informal and not take myself too seriously — many of the talks and keynotes I’d seen were way too scripted and formal for my tastes. Before my actual talk, I gave a 5-minute intended-to-be-humorous PowerPoint where I showed some images of my impressions of the first few days of Build. Here’s one lasting image that got quite a few chuckles from the audience:

NoStinkingJavaScript

Interestingly, there were three guys in the audience who were college students of mine in the early 1990s. It was good to see them, and I actually remembered them after scanning my memory for a few minutes. After my talk I was pretty exhausted (psychologically) because public speaking is truly terrifying for me. I still watched two other Build talks in the afternoon, but they were both duds in my opinion, probably because of a combination of my mental fatigue and topics that weren’t in my primary areas of interest.

Posted in Miscellaneous

Microsoft 2014 Build Conference – Day 2

Today (Thursday, April 3, 2014) was the second day of the 2014 Microsoft Build Conference. Build is Microsoft’s conference for software developers. There are a total of about 8,000 people here (which includes attendees, speakers, vendors, press, and so on).

Today started with a three-hour long mega-keynote, consisting of several 30 to 45 minute talks, just like yesterday. Today I am certain I don’t like sitting through three hours of people talking. I don’t care how interesting the information is, three hours is just too long. The good information in a set of talks that last three hours is simply diluted too much. Interestingly, everyone I spoke to (mostly other attendees and speakers) felt pretty much the same way — three hours is just too long.

Aside from the technical session talks, there are two things I especially like about the Build conference. First, there is a set of about 12 Microsoft kiosks in one of the common areas. Each kiosk is manned by a Microsoft subject matter expert and each kiosk has a sign that reads, “Ask Me About xxxx” where xxxx is some technology or service. I chatted up the TypeScript expert and came to really understand the motivation for creating TypeScript in the first place (but I’m still skeptical). I also talked to a WinRT expert. Here’s a photo of the kiosk area:

AskMeKiosks

A second thing I really like about the Build conference is the live streaming of interviews with experts and thought leaders. These interviews are recorded by the Channel 9 service and are very cool indeed. Here’s a photo of the Channel 9 interview area:

Channel9

Today I listened to four technical session talks:

1:00 – 2:00 “The Present and Future of .NET in a World of Devices and Services”
2:30 – 3:30 “Thinking for Programmers”
4:00 – 5:00 “Go Mobile with C# and Xamarin”
5:30 – 6:30 “The Next Generation of .NET for Building Applications”

I’d describe all four of these talks as high-level and strategic rather than low-level, nuts-and-bolts tactical. I thought all four were pretty good but none of them wowed me. I’d give them an average score of about 7.5 on a scale of 1 to 10.

My talk on neural networks is tomorrow morning. Even though I’ve given hundreds of talks before, I still get nervous before a big talk where I’ll be speaking to hundreds of people — thousands if you include online viewers — so I know I won’t sleep at all tonight. All in all, I’m really enjoying Build 2014. I think the main value for me is recharging my psychic batteries. I can hardly wait to get back to work in Redmond and build software.

Posted in Miscellaneous

Microsoft 2014 Build Conference – Day 1

Today (Wednesday, April 2, 2014) was the first day of the 2014 Microsoft Build Conference. Build is Microsoft’s conference for software developers. There are a total of about 8,000 people here (which includes attendees, speakers, vendors, press, and so on).

The format of the beginning of the conference was a bit awkward in my opinion: a mega-keynote lasting three hours, from 8:30 AM to 11:30 AM, given by five or six speakers (I lost count). My first point of feedback to the conference organizers will be that I prefer a more traditional set of separate one-hour keynotes.

There was so much information presented in the keynotes it’s hard to pick out what I thought was most interesting or relevant to me. Because I’ve been working with speech recognition lately, I thought the Cortana demo was interesting. Cortana is the (vaguely disturbing according to Wikipedia) AI female voice that’s sort of Microsoft’s answer to Apple’s Siri.

If I had to pluck out a theme of the keynote talks, I’d say it was the notion that developers are feeling pressure to write software that needs to run on multiple platforms — desktop, mobile, Xbox, very small form factor devices — and so Microsoft is focusing on tools and technologies to make this happen. One of the keynote demos showed something where a guy wrote an app (some sports information thing) and then used something in Visual Studio to generate both a mobile app and a PC app. “Write-once, run anywhere” is an idea that’s been around for a while. Because of the multiple-devices theme perhaps, it feels like there’s an excess of Phone and Xbox related talks at Build this year.

Anyway, there’s a ton of energy here at Build. I’ll be speaking about neural networks on Friday. The picture below is one small section of the huge room where lunch was served today. I did a quick scan of the lunch area. I could see about 100 attendees and counted 3 women, so I’d estimate that Build attendees are roughly 95% male.

Lunch

Posted in Miscellaneous

Microsoft 2014 Build Conference – Day 0

The 2014 Microsoft Build Conference officially starts tomorrow, Wednesday, April 2, 2014, but registration started today at 3:00. The event is in San Francisco this year (as it was in 2013). I’m guessing the event has a total of maybe 8,000 people (attendees, speakers, staff, and vendors) so it’s a pretty big conference as far as software conferences go. This afternoon, there are hundreds of people setting the conference up:

SettingUpBuild2014

I will be giving a talk on neural networks on Friday morning. This means I’ll be able to sit in on several talks on Wednesday and Thursday. On each day of the conference, there are four time slots, for example, on Thursday the slots are at 1:00-2:00, 2:30-3:30, 4:00-5:00, and 5:30-6:30. In each time slot there are 10-12 different talks, so . . . doing the math (three days times four slots times about 11 talks) . . . there are a total of about 130 talks.

The registration area is huge, to accommodate the thousands of attendees:

Build2014Registration

There are, naturally, quite a few talks about Web-related and Cloud-related technologies. I want to hear about TypeScript, mostly because I’m skeptical about it, and Azure, mostly because I’m frustrated by it.

The weather here in San Francisco is rainy and dark. Not unexpected, but it makes it annoying to walk the 1 mile from my hotel (Hilton) to the Moscone Convention Center. San Francisco has pros and cons compared to my favorite place to attend conferences, Las Vegas. Las Vegas has a lot more distractions, which is both a pro and a con I think. San Francisco seems a bit grittier than I remember but that could well be my imagination.

Posted in Miscellaneous