When I’m writing machine learning code, a common task I need to perform is to generate random array indices. For example, if I have an array with 10 cells, I might want to generate 3 random indices, such as 4, 6, 5. In other words, for an array with length k, the indices are 0 through k-1, and I need to generate n indices that are all different.

There are several ways to do this. One approach is a brute force method:

*loop
pick n random values between 0 and k-1
if all values different
return the values
end loop
return null*

The brute force approach is very effective when n, the number of indices to generate, is much smaller than k, the array length, because the chance of getting duplicate values is very small. The problem is that there’s no guarantee you’ll ever get n different values.

A second approach is to use the Reservoir algorithm. It’s quite tricky and I described it in an earlier blog post. The Reservoir algorithm is guaranteed to succeed, but the downside is you have to iterate k times. If the indices-generating routine must be called many times, as is often the case with ML code, you can get bad performance.

So, the approach I sometimes use is this:

*set trial := 0
set maxTrials := k * 2
loop maxTrials
pick n random values between 0 and k-1
if all values different
return the values
trial := trial + 1
end loop
if didn't get good values
return Reservoir(n, k)*

In other words, I first try the quick, but not guaranteed brute force approach, and then if that fails, I resort to the slower, but guaranteed, Reservoir algorithm. Here’s a C# implementation.

private static int[] RandomIndices(int n,
int range, int seed)
{
Random rnd = new Random(seed);
int maxAttempts = range * 2;
int[] result;
result = new int[n];
for (int i = 0; i < maxAttempts; ++i)
{
for (int j = 0; j < n; ++j)
result[j] = rnd.Next(0, range);
Array.Sort(result);
if (HasDups(result) == false)
return result;
}
// OK, use Reservoir
for (int i = 0; i < n; ++i)
result[i] = i;
for (int t = n; t < range; ++t)
{
int m = rnd.Next(0, t + 1);
if (m < n)
result[m] = t;
}
Array.Sort(result);
return result;
}
private static bool HasDups(int[] arr)
{
for (int i = 0; i < arr.Length-1; ++i)
if (arr[i] == arr[i + 1])
return true;
return false;
}

One possible enhancement I’ve considered is to first check if n and k (“range” in the code) are “close to each other”, and if so, go directly to the Reservoir part of the code because the brute force approach will likely fail.

### Like this:

Like Loading...

*Related*