A Custom Hash Function for a List of Integers

This past week I ran into an interesting problem. I had a program which used many lists of integers and I needed to track them in a hash table. To do that I needed a custom hash function. Let me be more specific. Suppose I had a class Nodes like this:

class Nodes
{
  public List<int> data;
  public int timeSeen;

  public Nodes(List<int> data, int timeSeen)
  {
    this.data = new List<int>();
    this.data.AddRange(data);
    //this.data.Sort();
    this.timeSeen = timeSeen;
  }

  // define GetHashCode() here
  . . .
}

The Nodes class is a wrapper around the essential data which is a List. I generate lots of these Nodes objects and want to determine if I’ve seen the newest one, something like this:

Hashtable history = new Hashtable();
Nodes n1 = // data is {3, 17, 25}
if (history.Contains(n1.GetHashCode()) == true)
  // n1 is already in the hash table
else
  history.Add(n1.GetHashCode(), n1); // add n1

The problem is to define a custom hash function. There is a lot of really bad information on the Internet about this topic. I first tried this:

public override int GetHashCode()
{
  int result = 0;
  for (int i = 0; i < this.data.Count; ++i)
    result = result ^ this.data[i];
  return result;
}

I XOR each value in the List together. This was a bad idea: many Lists which had different data generated the same hash code. For example, both

{ 18, 286, 577, 699, 719, 1385, 1423, 1452, 1515, 1592, 1673, 1967, 931 }

and

{ 3, 18, 19, 85, 103, 138, 149, 164, 286, 301, 370, 378, 427, 439, 520, 532, 577, 597, 677, 691, 699, 719, 815, 834, 843, 866, 881, 931, 944, 997, 1073, 1187, 1236, 1268, 1276, 1385, 1423, 1452, 1504, 1515, 1592, 1621, 1673, 1858, 1946, 1967, 1980 }

give a hash code of 1829. Ugly. Anyway, the technique I’m using now is to sort the List data (see the Nodes definition), create a string with a space between each value, and then hash the string:

public override int GetHashCode()
{
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < clique.Count; ++i)
  {
    sb.Append(clique[i]);
    sb.Append(” “);
  }
  return result;
}
I have to sort because I want { 2, 9, 5 } to be the same as { 9, 5, 2 }. I add spaces so that { 3, 5, 7 } is different from { 3, 57 }. I have no idea if my approach is mathematically sound but it’s working well for my purposes. The moral is that writing a custom hash code function is very tricky and the documentation on the Web is contradictory and often misleading.

This entry was posted in Software Test Automation. Bookmark the permalink.