Suppose you have the numbers 1 through 9 and you want to randomly select four of those numbers. There are a surprising number of mini-algorithms to do this. In the R language, you can use the built-in sample() function or write a user-defined function that uses the reservoir algorithm.

I wrote a little demo script to illustrate. By default, the built-in sample() function samples without replacement, so the selected numbers are all distinct. If you supply the optional replace=TRUE argument (often abbreviated as replace=T), sampling is done with replacement and the same value can be selected more than once.
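Here is a minimal sketch of the two sample() calls described above. It is not the original demo script; the variable names (pool, picked, picked_dup) and the seed value are my own choices for illustration.

```r
# Illustrative sketch only; names and seed are assumptions, not from the original script.
set.seed(1)   # arbitrary seed so a run can be repeated
pool <- 1:9   # the parent set of numbers

# Default behavior: sampling without replacement, so the four values are distinct.
picked <- sample(pool, 4)
print(picked)

# With replace = TRUE the same value may be drawn more than once.
picked_dup <- sample(pool, 4, replace = TRUE)
print(picked_dup)
```

Duplicates in the second result are possible but not guaranteed on any particular run.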

The reservoir sampling algorithm is short but very tricky in the sense that it’s easy to make an off-by-one error in the implementation.
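To make the off-by-one risk concrete, here is one possible sketch of the basic reservoir algorithm in R. The function name reservoir_sample and its parameters are hypothetical, and this is not necessarily how the original demo was written. The key detail is that R vectors are 1-based, so the random index j must be drawn from 1 through i inclusive.

```r
# Hypothetical helper: one way to write basic reservoir sampling in R.
reservoir_sample <- function(items, k) {
  n <- length(items)
  stopifnot(k >= 0, k <= n)

  # Fill the reservoir with the first k items.
  reservoir <- items[seq_len(k)]

  if (k > 0 && n > k) {
    for (i in (k + 1):n) {
      # Draw j uniformly from 1..i (inclusive). Drawing from 1..(i - 1)
      # or 0..i here is the classic off-by-one mistake.
      j <- sample.int(i, 1)
      # Item i replaces a reservoir slot with probability k / i.
      if (j <= k) {
        reservoir[j] <- items[i]
      }
    }
  }
  reservoir
}

set.seed(2)   # arbitrary seed so a run can be repeated
result <- reservoir_sample(1:9, 4)
print(result)
```

The guard on n > k matters too: without it, (k + 1):n would count downward in R when k equals n, which is another easy place to slip.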
