Yet Another Way to Generate Distinct Array Indices

A common task in machine learning algorithms is to generate distinct, random array indices. For example, suppose you have an array of length 10 (so the array indices are 0 to 9) and you want 4 random indices, like (0, 3, 4, 8).

SeekRotateRandomIndices

The best approach is my opinion is to use what’s called the reservoir algorithm. The only minor downside is that the algorithm is a bit subtle, and so it’s very easy to make a coding mistake. That said, here’s an alternative approach I call seek-rotate. Basically, you start with a random index, then advance a random distance to another index, rotating around to the beginning of the array if necessary. If the new index hasn’t been used yet, you add it to the results.

Here’s an implementation in the C# language:

static int[] RandomIndices(int n, int k, int seed)
{
  // k distinct indices for array length n
  int[] result = new int[k];
  int numFound = 0;
  int[] used = new int[n]; // all 0
  Random r = new Random(seed);

  int idx = r.Next(0, n); // first is good
  result[numFound++] = idx;
  used[idx] = 1; // mark as used

  int attempt = 0;
  while (numFound < n-1)
  {
    int step = r.Next(0, n);
    idx = idx + step;
    if (idx > n-1) idx = idx - n;
    if (used[idx] == 0) // not yet used
    {
      result[numFound++] = idx;
      used[idx] = 1;
    }
    ++attempt;
  }
  Array.Sort(result);
  return result;
}

And for comparison, here’s the reservoir algorithm version:

static int[] Reservoir(int n, int k, int seed)
{
  int[] result = new int[k];
  Random r = new Random(seed);

  for (int i = 0; i < k; ++i)
    result[i] = i;

  for (int t = k; t < n; ++t)
  {
    int m = r.Next(0, t + 1);
    if (m < k)
      result[m] = t;
  }
  Array.Sort(result);
  return result;
}

The reservoir algorithm iterates exactly n times. The seek-rotate algorithm might get lucky and iterate only k times, but it might get unlucky and iterate many times. All in all, I think the seek-rotate distinct indices algorithm is more of a curiosity than a practical technique.

Advertisements
This entry was posted in Machine Learning. Bookmark the permalink.