The R language has a built-in kmeans() function for k-means data clustering. I was thinking about coding a custom k-means function in R from scratch. In most languages (C#, JavaScript, Python, etc.) I’d just start coding because I know the k-means algorithm.

But because R is intended mostly for interactive use, R has zillions of high-level functions. So whenever I am going to write some non-trivial code using R, I spend a few minutes recalling R techniques and functions related to the area I’m going to be working on.

Here’s how to select n random rows from a data frame:

_ # I issued an options(prompt="_ ") to change the prompt
_ mydf = read.table("DummyData8.txt", header=F, sep=",")
_ N = nrow(mydf) # number rows is 8
_ n = 3
_ set.seed(1)
_ ri = sample(N,n) # 3 random indices
_ rm = as.matrix(mydf[ri,]) # random subset
_ rm
V1 V2 V3
3 61 120 40
8 70 220 80
4 75 150 50
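
If you don't have the data file handy, the same idea can be tried on an in-memory data frame. This is just a minimal sketch: rows 3, 4 and 8 copy the values printed above, the other rows are invented, and the names `demo_df`, `idx` and `picked` are made up for illustration. Note that the exact indices drawn for a given seed can differ between R versions, because the default sampling method changed in R 3.6.0.

```r
# Hypothetical stand-in for the data file: 8 rows, 3 numeric columns.
# Only rows 3, 4 and 8 come from the output shown above; the rest are invented.
demo_df <- data.frame(
  V1 = c(65, 73, 61, 75, 68, 59, 72, 70),
  V2 = c(220, 160, 120, 150, 230, 110, 180, 220),
  V3 = c(80, 60, 40, 50, 90, 30, 70, 80)
)

set.seed(1)                          # make the draw reproducible
idx <- sample(nrow(demo_df), 3)      # 3 distinct row indices (no replacement)
picked <- as.matrix(demo_df[idx, ])  # selected rows as a numeric matrix

print(idx)
print(picked)
```

Because sample() draws without replacement by default, the three indices are always distinct, which is what you want when picking initial k-means centers.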

Here’s how to find the index of the smallest value in a vector:

_ ii = which.min(mydf[,1]) # index smallest val in col 1
_ ii
[1] 6
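
For k-means, which.min() is a natural fit for picking the nearest centroid from a vector of distances. Here's a small sketch with invented numbers (the vector `d` is hypothetical). One thing worth remembering is that on ties which.min() returns only the first minimum.

```r
# Hypothetical distances from one data item to 4 centroids
d <- c(5.2, 1.7, 9.0, 1.7)

nearest <- which.min(d)          # index of the first smallest value
print(nearest)                   # 2 (the tie at position 4 is not reported)

# which() with a comparison returns every position holding the minimum
all_min <- which(d == min(d))
print(all_min)                   # 2 4
```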

And here’s how to get the distance (Euclidean by default) between all pairs of a set of vectors:

_ dists = dist(rm) # gives a 'dist' object
_ dm = as.matrix(dists) # convert to a matrix
_ dm
1 2 3
1 0.00000 108.07868 34.58323
2 108.07868 0.00000 76.32169
3 34.58323 76.32169 0.00000
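
These pieces fit together into the assignment step of k-means: compute each item's distance to each center, then use which.min() to label the item. The following is only a rough sketch under my own assumptions, not how the built-in kmeans() works internally. The data and the names `x`, `centers` and `assign_clusters` are hypothetical, and the centers are fixed rows rather than a sample() draw so the result is predictable.

```r
# Hypothetical data: 6 items, 2 numeric features, two obvious groups
x <- matrix(c(1, 1,
              1, 2,
              2, 1,
              8, 8,
              9, 8,
              8, 9), ncol = 2, byrow = TRUE)

# Assign each row of x to the nearest row of centers (Euclidean distance)
assign_clusters <- function(x, centers) {
  k <- nrow(centers)
  # dist() on the stacked matrix gives all pairwise distances;
  # keep only the block of data rows versus center rows
  dm <- as.matrix(dist(rbind(centers, x)))
  d_to_centers <- dm[-(1:k), 1:k, drop = FALSE]
  apply(d_to_centers, 1, which.min)    # index of nearest center per data row
}

centers <- x[c(1, 4), , drop = FALSE]  # fixed picks, for a predictable demo
clusters <- assign_clusters(x, centers)
print(unname(clusters))                # 1 1 1 2 2 2
```

Stacking the centers on top of the data and calling dist() once is wasteful for large data, since it also computes item-to-item distances that are thrown away, but it keeps the sketch short.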

Anyway, the R language is really different from all the other programming languages I use: the syntax is different, and it also requires a very different mindset compared to other common procedural languages.
