Removing Extra Delimiters From A String

I ran into a surprisingly interesting little problem the other day. I had a text file that I needed to parse, and on each line the fields I was interested in were separated by runs of several ‘*’ characters. For example:
 
Smith***Bob*****27
Jones***Dan*****36
etc.
 
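The C# String.Split() method returns an empty string for every pair of adjacent delimiters, so before calling it I wanted to collapse each run of delimiters down to a single one (removing the extras, but not all of them). That way I’d end up with: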
 
Smith*Bob*27
Jones*Dan*36
etc.
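
To make the goal concrete, here is a minimal sketch of what String.Split() does with a cleaned line versus a raw one. The class and method names (SplitDemo, SplitClean, SplitRaw) are made up for illustration. The second helper uses StringSplitOptions.RemoveEmptyEntries, which is one built-in way to ignore repeated delimiters, with the caveat that it would also discard any field that is legitimately empty.

```csharp
using System;

public static class SplitDemo
{
  // Splits a line whose fields are separated by exactly one delimiter.
  public static string[] SplitClean(string line, char delimiter)
  {
    return line.Split(delimiter);
  }

  // Splits a raw line and discards the empty strings that
  // repeated delimiters would otherwise produce.
  public static string[] SplitRaw(string line, char delimiter)
  {
    return line.Split(new char[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
  }

  public static void Main()
  {
    // A cleaned line yields exactly the three fields.
    Console.WriteLine(SplitClean("Smith*Bob*27", '*').Length); // 3

    // A raw line yields the three fields plus six empty strings.
    Console.WriteLine("Smith***Bob*****27".Split('*').Length); // 9

    // RemoveEmptyEntries drops the empty strings.
    Console.WriteLine(SplitRaw("Smith***Bob*****27", '*').Length); // 3
  }
}
```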
 
I could have used a regular expression approach but instead I wrote a custom method I called StripExtraDelimitChars(string s, char c). The problem had lots of little nuances. Here’s a preliminary version of what I came up with:
 
public static string StripExtraDelimitChars(string s, char c)
{
  if (s == null) return null;
  if (s.Length < 2) return s; // nothing to collapse

  char[] temp = new char[s.Length]; // the result is never longer than s
  int count = 0; // number of characters copied
  for (int j = 0; j < s.Length - 1; ++j)
  {
    // a delimiter followed by another delimiter is an extra, so skip it;
    // only the last delimiter of each run gets copied
    if (s[j] == c && s[j + 1] == c)
      continue;
    temp[count] = s[j];
    ++count;
  }
  // the last character is always copied: it is either a non-delimiter
  // or the final delimiter of a run
  temp[count] = s[s.Length - 1];
  ++count;

  return new String(temp, 0, count);
} // StripExtraDelimitChars()
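
For comparison, the regular expression approach I mentioned might look something like the sketch below. This is not what I actually used; the names RegexDemo and CollapseRuns are just for illustration. It relies on Regex.Escape() so that delimiters like ‘*’ are treated literally in the pattern, and on a MatchEvaluator so that a delimiter such as ‘$’ is not interpreted as a substitution in the replacement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
  // Collapses each run of two or more 'c' characters into a single 'c'.
  public static string CollapseRuns(string s, char c)
  {
    if (s == null) return null;
    string one = c.ToString();
    // Escape the delimiter so characters like '*' are matched literally,
    // then match runs of length two or more.
    string pattern = Regex.Escape(one) + "{2,}";
    // The evaluator returns the single delimiter verbatim for every run.
    return Regex.Replace(s, pattern, m => one);
  }

  public static void Main()
  {
    Console.WriteLine(CollapseRuns("Smith***Bob*****27", '*')); // Smith*Bob*27
    Console.WriteLine(CollapseRuns("Jones***Dan*****36", '*')); // Jones*Dan*36
  }
}
```

The regex version is shorter, but the hand-written loop avoids building a pattern for every call and makes the edge cases (leading and trailing runs) explicit.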
 
It seems like every time I feel I’ve seen just about every possible string manipulation problem, something new and interesting comes up.
 

 