Working with the MNIST Image Recognition Data Set

I wrote an article titled “Working with the MNIST Image Recognition Data Set” for the June 2014 issue of Microsoft’s MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn745868.aspx. The MNIST data set is a collection of 70,000 small (28 by 28 pixel) images of handwritten digits from 0 through 9. The first eight images are:

The MNIST (“Modified National Institute of Standards and Technology”) data set is divided into two groups: a 60,000-image training set and a 10,000-image test set. The idea is to use these images as a benchmark for evaluating the effectiveness of image recognition algorithms. Put another way, suppose you have an idea for a new approach to image recognition. You can train your model on the 60,000-item training set and then measure its accuracy on the 10,000-item test set. You can then compare your results with those of other algorithms (or at least those whose results have been published).

I discovered that working with the data set (reading the data into memory and displaying the images visually) wasn’t really explained anywhere. So I figured it out and wrote an article about it. Along the way, I learned several new programming techniques for working with binary data, the .NET PictureBox control, and big-endian and little-endian encoding.
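
To make the binary-data and endianness points concrete, here is a minimal sketch in Python rather than the C# used in the article. It assumes the standard MNIST image file layout: a 16-byte header of four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The function names parse_idx_images and to_ascii are hypothetical, and the text rendering is only a stand-in for a graphical display such as a PictureBox. The demo uses a tiny synthetic 2-by-3 image, not real MNIST data.

```python
import struct


def parse_idx_images(data: bytes):
    """Parse bytes in the MNIST image-file layout into a list of 2-D pixel grids."""
    # The header holds four 32-bit unsigned integers stored big-endian
    # (most significant byte first), hence the ">" in the format string.
    magic, count, rows, cols = struct.unpack_from(">IIII", data, 0)
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}; expected 2051")

    images = []
    offset = 16  # pixel data starts right after the 16-byte header
    for _ in range(count):
        pixels = data[offset:offset + rows * cols]
        if len(pixels) != rows * cols:
            raise ValueError("file ends before all pixel data was read")
        # Each pixel is one byte, 0..255; split the flat run into rows.
        images.append([list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)])
        offset += rows * cols
    return images


def to_ascii(image, threshold=128):
    """Render one image as text: '#' for pixel values at or above threshold, '.' otherwise."""
    return "\n".join(
        "".join("#" if p >= threshold else "." for p in row) for row in image
    )


# Synthetic demo: a header describing one 2x3 image, then six pixel bytes.
demo = struct.pack(">IIII", 2051, 1, 2, 3) + bytes([0, 255, 0, 200, 0, 130])
print(to_ascii(parse_idx_images(demo)[0]))

# Reading the same four header bytes as little-endian gives a wrong value,
# which is the kind of mistake the endianness discussion is about.
print(struct.unpack("<I", demo[:4])[0])  # 50855936 instead of 2051
```

With a real, uncompressed MNIST image file, the same function should work on the bytes read from that file in binary mode, provided the file follows the layout assumed above.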