R Language Vector Sampling

Suppose you have the numbers 1 through 9 and you want to randomly select four of those numbers. There are a surprising number of mini-algorithms to do this. In the R language, you can use the built-in sample() function or write a user-defined function that uses the reservoir sampling algorithm.

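I wrote a little demo script to illustrate. By default, the built-in sample() function selects distinct values from the parent set of numbers (sampling without replacement). If you supply the optional replace=TRUE argument, values are drawn with replacement, so you might get duplicate values. It's safer to spell out TRUE than to use T, because T is an ordinary variable that can be reassigned.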
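
Here is a minimal sketch of the two sample() calls. This is not the original demo script; the variable names and the seed value are just assumptions for illustration.

```r
# Illustrative only: names and seed are arbitrary choices.
set.seed(1)                                  # make the run repeatable

v <- 1:9                                     # parent set of numbers

# Default behavior: four distinct values (sampling without replacement).
without_repl <- sample(v, 4)

# With replace = TRUE the same value can be drawn more than once.
with_repl <- sample(v, 4, replace = TRUE)

print(without_repl)
print(with_repl)
```

One thing to watch for: if the first argument to sample() is a single number n that is 1 or greater, R samples from 1:n rather than from that one value.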


The reservoir sampling algorithm is short but very tricky in the sense that it’s easy to make an off-by-one error in the implementation.
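
To make the off-by-one risk concrete, here is one possible sketch of the standard reservoir technique (often called Algorithm R) using R's 1-based indexing. The function name reservoir_sample and its parameters are hypothetical, and this is not necessarily how the demo script implements it.

```r
# Hypothetical helper: select k items from vector v, each subset equally likely.
reservoir_sample <- function(v, k) {
  n <- length(v)
  stopifnot(k >= 0, k <= n)

  result <- v[seq_len(k)]                    # fill the reservoir with the first k items

  # Guard the edge cases: in R, (k + 1):n counts DOWN when k == n,
  # which is exactly the kind of off-by-one trap to avoid.
  if (k == 0 || k == n) return(result)

  for (i in (k + 1):n) {
    j <- sample.int(i, 1)                    # uniform random integer in 1..i, inclusive
    if (j <= k) {
      result[j] <- v[i]                      # item i replaces a slot with probability k/i
    }
  }
  result
}

set.seed(42)                                 # arbitrary seed for repeatability
picked <- reservoir_sample(1:9, 4)
print(picked)
```

The two places where off-by-one errors tend to creep in are the loop start (k + 1, not k) and the range of the random index (1 through i inclusive, not 1 through i - 1 or 0 through i).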
