An Interesting Bug in a C# List Removal Function

I was writing some fairly complex code last week and ran across an interesting bug in a small C# method that I wrote. Let me explain by way of an example similar to the one I was dealing with. Examine the method below that accepts a C# List of int collection and removes all even numbers from the List. Can you spot the error(s)?

static void RemoveEvens(List<int> list)
{
  for (int i = 0; i < list.Count; ++i)
  {
    if (list[i] % 2 == 0) // even
      list.Remove(list[i]);
  }
}

In my defense, the actual method I wrote was much more complicated and was a helper method (and I was tired) but code like this is horrendous. Ignoring the lack of error-checking, first of all it just doesn’t work. Suppose the method encounters an even number (like 8) at index 5. The call to Remove will scan the entire list, and remove the first 8 from the list. But now the value at i = 5 will be a different value. The for loop increments after the removal and the new value in the list at i = 5 will not be checked.

Here’s a working (but not very good) version:

static void RemoveEvens(List<int> list)
{
  for (int i = 0; i < list.Count; ++i)
  {
    if (list[i] % 2 == 0)
    {
      list.RemoveAt(i);
      --i;
    }
  }
}

Here, when an even number is encountered, it is removed then the index i is backed up by 1 so that the new value at i will be checked. This version still has some problems. A better approach in most situations would be something along the lines of:

static List<int> RemoveEvens(List<int> list)
{
  List<int> result = new List<int>();
  for (int i = 0; i < list.Count; ++i)
  {
    if (list[i] % 2 != 0) // not even
      result.Add(list[i];
  }
  return result;
}

The interesting thing to me about the whole exercise was the fact that even very simple coding can have some nasty pitfalls.

Advertisements
This entry was posted in Software Test Automation. Bookmark the permalink.

2 Responses to An Interesting Bug in a C# List Removal Function

  1. How about using Linq instead of the tired old for-loops? – imho, it would have been a lot easier and a lot less error-prone. Something like this: “result = list.Where(p => p % 2 != 0)”. – Or in your last version, at least use “foreach” (you can’t be sure that accessing the list by index is efficient – it might be a LinkedList you get).

    • Very interesting suggestions. First, using Linq instead of an explicit for loop to iterate through a List: The Linq approach is appealing because it operates at a higher level of abstraction than a for-loop approach and as a consequence is just one line of code. Most of my colleagues would agree with you and prefer Linq. But I’m just not a big fan of Linq precisely because it does work at a high level of abstraction. I have a lot of experience writing assembly code and when I write procedural code using C# or Java or C/C++, I tend to mentally track how my code will map to assembly (and then to machine code). This gives me some understanding of performance implications. But with Linq, I really don’t know exactly what the compiler will do behind the scenes. At some point the Linq code will have to generate assembly instructions that correspond to a for-loop. Additionally, I try to write most of my code in such a way that it can be easily refactored to other languages. A for-loop easily translates to any of the languages I use but Linq limits easy translation to .NET languages. Ultimately, the error-proneness issue aside, I think the Linq vs. for-loop is mostly a matter of personal coding style preference.

      Now for your second suggestion of at least using a foreach-loop rather than a for-loop, I’m not so sure I agree with you. The issue is quite similar — foreach is a tiny notch up the abstraction level from a for-loop. I’ve done a few experiments comparing the performance of code that uses a for-loop vs. code that uses a foreach-loop, and although the results weren’t conclusive, the simple for-loop was faster than the foreach-loop. So again, most of my colleagues (who are among the world’s best programmers) would prefer the foreach-loop over the for-loop so you’re in good company with your comment.

Comments are closed.