The Knuth-Morris-Pratt String Search Algorithm in C#

I was working with string searching recently. I decided to code up a C# implementation of the Knuth-Morris-Pratt algorithm just to experiment. The basic idea of string search is quite simple. Suppose some string S = “computer” and some other, smaller, string W = “put”. Searching for W inside of S would return 3 because “put” is contained in “computer” starting at index position 3 (assuming 0-based indexes). A simple approach would look for “put” inside “computer” starting at index 0 then 1 then 2, and so on. There are many faster algorithms that essentially preprocess string W to construct a table T which can be used to skip ahead more than one position when a mismatch occurs. The Knuth-Morris-Pratt string search algorithm is one of these algorithms.

Wikipedia entries on algorithms tend to be not very good, but the entry on Knuth-Morris-Pratt string search was quite decent and has very detailed pseudo-code. I translated that pseudo-code more or less directly to the C# language in a class named StringSearcher. Calling the code looks like this:

string stringW = "put";
string stringS = "computer";

char[] W = stringW.ToCharArray();
char[] S = stringS.ToCharArray();

StringSearcher ss = new StringSearcher(W);
int ix = ss.Search(S);
Console.WriteLine("Location of W in S is " + ix);

Here is the code for the StringSearcher class that implements the Wikipedia pseudo-code for the Knuth-Morris-Pratt algorithm:

public class StringSearcher  // Knuth-Morris-Pratt string search
{
  private char[] W;  // the (small) pattern to search for
  private int[] T;   // lets the search function to skip ahead

  public StringSearcher(char[] W)
  {
    this.W = new char[W.Length];
    Array.Copy(W, this.W, W.Length);
    this.T = BuildTable(W);           // use a helper to build T
  }

  private int[] BuildTable(char[] W)
  {
    int[] result = new int[W.Length];
    int pos = 2;
    int cnd = 0;
    result[0] = -1;
    result[1] = 0;
    while (pos < W.Length)
    {
      if (W[pos - 1] == W[cnd])
      {
        ++cnd; result[pos] = cnd; ++pos;
      }
      else if (cnd > 0)
        cnd = result[cnd];
      else
      {
        result[pos] = 0; ++pos;
      }
    }
    return result;
  }

  public int Search(char[] S)
  {
    // look for this.W inside of S
    int m = 0;
    int i = 0;
    while (m + i < S.Length)
    {
      if (this.W[i] == S[m + i])
      {
        if (i == this.W.Length - 1)
          return m;
        ++i;
      }
      else
      {
        m = m + i - this.T[i];
        if (this.T[i] > -1)
          i = this.T[i];
        else
          i = 0;
      }
    }
    return -1;  // not found
  }
}

I didn’t thoroughly test this code but it seems to be correct.

Advertisements
This entry was posted in Software Test Automation. Bookmark the permalink.