Determining k Distinct Random Integers

Most of my work involves machine learning algorithms. A mini-task I often run into is to generate k distinct random integers from n integers. For example if n = 10 and k = 3, that means I need 3, different, random, integers that are all between 0 and 9 inclusive, such as {0, 2, 7} or (1, 3, 9} but not {3, 5, 5}.

One approach is to use brute force and just generate k numbers in the range [0,n-1] and then check to see if there are any duplicates. This approach works quite well when n is very large compared to k because the chance of accidentally generating duplicate values is small.

A second approach is to create an array that holds values from 0 through n-1, then shuffle the values in the array using the Knuth shuffle algorithm, and then selecting the first k values from the shuffled array. This technique does a lot of extra work when n is very large.

A third, very elegant and efficient, approach is to use reservoir sampling. First create a result array of size k. Seed the array with values 0 through k-1. Now iterate from t = k through n-1 inclusive. For each value of t, generate a random integer, m, between 0 and t+1 inclusive. If m is less than k, replace result[m] with t otherwise do not change the result[m] value.

This third approach is quite efficient at the expense of being a bit tricky. The algorithm operates in O(n) time and doesn’t need any extra storage like the shuffle technique. Here is a C# implementation:

public static int[] DistinctIndexes(int k, int n)
  // k random distinct ints between 0 and n-1
  // using reservoir sampling
  // ex: DistinctIndexes(3, 10) returns
  // 3 distinct numbers betweeen 0 and 9
  int[] result = new int[k];
  for (int i = 0; i < k; ++i)
    result[i] = i;

  for (int t = k; t < n; ++t)
    int m = rnd.Next(0, t+1); 
    if (m < k) { result[m] = t; }

  return result;

Calling the method could look something like this:

for (int i = 0; i < 8; ++i) // 8 times
  int[] ri = DistinctIndexes(3, 10);
  Console.Write(i + ": ");
  for (int j = 0; j < 3; ++j)
    Console.Write(ri[j] + " ");

The code implementation is surprisingly tricky, and I made several mistakes until I got a correct version.


This entry was posted in Software Test Automation. Bookmark the permalink.