MapReduce from a C# Developer’s Point of View

When I was first exploring Big Data, it took me a while to understand the MapReduce paradigm. One thing that really helped me grasp the idea was simulating MapReduce with a C# program.

I wrote a demo program that counts the number of times different words appear in a file – the standard “Hello World” MapReduce example. The input text file is a quote from Walt Disney:

If you can dream it,
you can do it.
Walt Disney

The output of the demo program is:

[Screenshot: MapReduceSimulation]

The Map part of MapReduce needs a container that holds key-value pairs in which the same key can appear more than once. I simulated this with a program-defined class that holds a string and a number:

public class WordInt
{
  public string word;
  public int n;

  public WordInt(string word, int n)
  {
    this.word = word;
    this.n = n;
  }
}
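To see why the container has to tolerate repeated keys, here is a small sketch that contrasts a Dictionary with a List of pairs. The class and method names (DuplicateKeyDemo, DictionaryRejectsDuplicate, ListAcceptsDuplicates) are made up for illustration and are not part of the demo program.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
  // Returns true if adding the same key twice to a Dictionary throws.
  public static bool DictionaryRejectsDuplicate()
  {
    var dict = new Dictionary<string, int>();
    dict.Add("can", 1);
    try
    {
      dict.Add("can", 1); // second Add with an existing key
      return false;
    }
    catch (ArgumentException)
    {
      return true;
    }
  }

  // A List of pairs accepts the same word any number of times.
  public static int ListAcceptsDuplicates()
  {
    var list = new List<KeyValuePair<string, int>>();
    list.Add(new KeyValuePair<string, int>("can", 1));
    list.Add(new KeyValuePair<string, int>("can", 1));
    return list.Count;
  }

  public static void Main()
  {
    Console.WriteLine("Dictionary rejects duplicate key: " + DictionaryRejectsDuplicate());
    Console.WriteLine("List entries for the same word: " + ListAcceptsDuplicates());
  }
}
```

A Dictionary refuses the second "can", so the Map output needs something list-like, which is the role the WordInt class plays in the demo.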

With that piece in place, here’s a C# program that roughly does a simulated MapReduce:

using System;
using System.Collections.Generic;
using System.IO;

namespace MapReduce
{
  class Program
  {
    static void Main(string[] args)
    {
      // Map output: one (word, 1) entry per word occurrence
      List<WordInt> list = new List<WordInt>();

      StreamReader sr = 
       new System.IO.StreamReader("..\\..\\TextFile.txt");
      string line; string[] words;
      while ((line = sr.ReadLine()) != null)
      {
        words = line.Split(' ');
        foreach (string w in words)
          list.Add(new WordInt(w, 1));
      }
      sr.Close();

      // Reduce output: each distinct word and its total count
      Dictionary<string, int> dict =
        new Dictionary<string, int>();
      foreach (WordInt item in list)
      {
        if (dict.ContainsKey(item.word) == false)
          dict.Add(item.word, item.n);
        else
          ++dict[item.word];
      }

      foreach (string s in dict.Keys)
        Console.WriteLine(s + " " + dict[s]);

      Console.ReadLine();

    } // Main

  } // class Program

  // class WordInt goes here
  
} // ns

The first line of the program creates a List of WordInt objects. The next several lines read the text file line by line, extract each word, and add each word with a count of 1 to the List. That is the Map part of MapReduce.
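The Map step can also be pulled out into its own function, which makes it easier to see exactly what it emits. This is only a sketch under a few assumptions: it uses KeyValuePair instead of WordInt so that it compiles on its own, it takes the lines as an in-memory array rather than reading TextFile.txt, and the names MapSketch and Map are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MapSketch
{
  // Hypothetical helper: emits one (word, 1) pair per word, duplicates included.
  public static List<KeyValuePair<string, int>> Map(IEnumerable<string> lines)
  {
    var pairs = new List<KeyValuePair<string, int>>();
    foreach (string line in lines)
    {
      // RemoveEmptyEntries skips the empty strings produced by repeated spaces
      string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
      foreach (string w in words)
        pairs.Add(new KeyValuePair<string, int>(w, 1));
    }
    return pairs;
  }

  public static void Main()
  {
    string[] lines = { "If you can dream it,", "you can do it.", "Walt Disney" };
    foreach (KeyValuePair<string, int> p in Map(lines))
      Console.WriteLine(p.Key + " " + p.Value);
  }
}
```

For the Walt Disney quote this emits 11 pairs, every one with a count of 1. Because the split is on spaces only, punctuation stays attached, so "it," and "it." are treated as different words.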

The Reduce part processes the mapping. The demo creates a Dictionary object where each unique key is a word from the map output and the value is that word’s count.

The demo walks through the List. If the current word hasn’t been seen yet, it’s added to the Dictionary with a count of 1. If the word has been seen, the associated count is incremented.
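The Dictionary loop groups the pairs by word and totals them in a single pass. A sketch that separates those two ideas, using LINQ's GroupBy and Sum, might look like the following. The names ReduceSketch and Reduce are hypothetical, and KeyValuePair again stands in for WordInt so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReduceSketch
{
  // Hypothetical helper: groups pairs by word, then sums the counts in each group.
  public static Dictionary<string, int> Reduce(IEnumerable<KeyValuePair<string, int>> pairs)
  {
    return pairs
      .GroupBy(p => p.Key)                                  // collect pairs that share a word
      .ToDictionary(g => g.Key, g => g.Sum(p => p.Value));  // one total per word
  }

  public static void Main()
  {
    var pairs = new List<KeyValuePair<string, int>>
    {
      new KeyValuePair<string, int>("you", 1),
      new KeyValuePair<string, int>("can", 1),
      new KeyValuePair<string, int>("you", 1),
      new KeyValuePair<string, int>("can", 1),
      new KeyValuePair<string, int>("do", 1)
    };
    foreach (KeyValuePair<string, int> kv in Reduce(pairs))
      Console.WriteLine(kv.Key + " " + kv.Value);
  }
}
```

The result has three entries: "you" and "can" with a count of 2, and "do" with a count of 1. It produces the same totals as the if/else loop in the demo; the grouping is just made explicit.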

The demo finishes by displaying the word-count items in the Dictionary.

To be sure, this isn’t true MapReduce. In particular, the demo has none of the distributed, parallel, and disk-based aspects. But it does give a developer a pretty good idea of what MapReduce is.
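To get a small taste of the parallel aspect, the input can be split into chunks that are counted independently and then merged. The sketch below does this with PLINQ on a single machine, so it is still not distributed or disk-based. The names ParallelSketch, CountChunk, and CountWords are hypothetical, and treating each line as its own chunk is just an assumption for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelSketch
{
  // Counts the words in one chunk of lines; stands in for a single worker.
  static Dictionary<string, int> CountChunk(string[] chunk)
  {
    var local = new Dictionary<string, int>();
    foreach (string line in chunk)
    {
      foreach (string w in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
      {
        int n;
        local[w] = local.TryGetValue(w, out n) ? n + 1 : 1;
      }
    }
    return local;
  }

  public static Dictionary<string, int> CountWords(string[][] chunks)
  {
    // AsParallel lets the runtime process the chunks on several threads
    List<Dictionary<string, int>> partials =
      chunks.AsParallel().Select(c => CountChunk(c)).ToList();

    // Merge the partial counts on one thread
    var total = new Dictionary<string, int>();
    foreach (Dictionary<string, int> partial in partials)
    {
      foreach (KeyValuePair<string, int> kv in partial)
      {
        int n;
        total[kv.Key] = total.TryGetValue(kv.Key, out n) ? n + kv.Value : kv.Value;
      }
    }
    return total;
  }

  public static void Main()
  {
    string[][] chunks =
    {
      new[] { "If you can dream it," },
      new[] { "you can do it." },
      new[] { "Walt Disney" }
    };
    foreach (KeyValuePair<string, int> kv in CountWords(chunks))
      Console.WriteLine(kv.Key + " " + kv.Value);
  }
}
```

Each chunk gets its own private Dictionary, so the workers never share state, and the merge happens only after all of them have finished.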
