I wrote an article titled “Data Clustering using R” in the February 2017 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2017/02/01/data-clustering-using-r.aspx.
Data clustering is the process of programmatically grouping data so that similar items belong together. For example, suppose you have the height (in inches) and weight (in lbs) of eight people:
(1) 65, 220
(2) 73, 160
(3) 59, 110
(4) 61, 120
(5) 75, 150
(6) 68, 230
(7) 62, 130
(8) 66, 210
Even with a ridiculously small dataset, it’s not easy to see if there’s any kind of pattern here. In my article I explain how to cluster data using the k-means algorithm and the R language. The result of clustering the data into three groups is:
73 160
75 150
=====
65 220
68 230
66 210
=====
59 110
61 120
62 130
A clear pattern has emerged. There are tall, medium-weight people; medium-height, heavy people; and short, light people.
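To give a feel for how little code this takes, here is a minimal sketch that clusters the eight height-weight pairs with the built-in kmeans() function. It is not the code from the article. The variable names, the seed value, and the nstart setting are my own assumptions for illustration. Because k-means starts from random centers, the cluster ID numbers (1, 2, 3) are arbitrary labels and can differ from run to run.

```r
# Height (inches) and weight (pounds) of the eight people listed above.
people <- data.frame(
  height = c(65, 73, 59, 61, 75, 68, 62, 66),
  weight = c(220, 160, 110, 120, 150, 230, 130, 210)
)

set.seed(1)  # arbitrary seed so the random starts are repeatable

# Ask for three clusters; nstart = 25 tries 25 random starts and keeps
# the result with the lowest total within-cluster sum of squares.
fit <- kmeans(people, centers = 3, nstart = 25)

# Show the members of each cluster in turn.
for (k in sort(unique(fit$cluster))) {
  cat("Cluster", k, "\n")
  print(people[fit$cluster == k, ])
}
```

Note that the raw weights span a much wider numeric range than the heights, so weight dominates the distance calculation here. With real data you would usually consider normalizing the columns first.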
In my article, I explain how to use R's built-in kmeans() function, which is applicable in most scenarios where you want to cluster data, and also how to write a custom clustering function using raw R. The custom approach is useful when you need some sort of custom behavior.
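For a rough idea of what a hand-written version can look like, here is a bare-bones sketch of the basic k-means loop (assign each item to its nearest center, then move each center to the mean of its members) in plain R. This is not the function from the article. The name my_kmeans, its parameters, and the seed are hypothetical and chosen only for illustration. The distance calculation is the natural place to plug in custom behavior.

```r
# Hypothetical helper, not the article's code: a basic k-means loop.
my_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Use k distinct rows, chosen at random, as the initial centers.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assign <- rep(0L, nrow(x))

  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every row to every center (n x k).
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assign <- max.col(-d, ties.method = "first")  # nearest center per row
    if (all(new_assign == assign)) break              # nothing moved: done
    assign <- new_assign

    # Move each center to the mean of its members; leave it alone if empty.
    for (j in seq_len(k)) {
      members <- x[assign == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assign, centers = centers)
}

set.seed(1)  # arbitrary seed so the random initial centers are repeatable
x <- cbind(height = c(65, 73, 59, 61, 75, 68, 62, 66),
           weight = c(220, 160, 110, 120, 150, 230, 130, 210))
result <- my_kmeans(x, k = 3)
print(result$cluster)
print(result$centers)
```

Unlike the kmeans() call with multiple starts, this sketch runs from a single random start, so it can settle on a poorer grouping. Running it several times and keeping the best result is a common workaround.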