Data Clustering using R

I wrote an article titled “Data Clustering using R” in the February 2017 issue of Visual Studio Magazine. See


Data clustering is the process of programmatically grouping data so that similar items belong together. For example, suppose you have the height (in inches) and weight (in lbs) of eight people:

(1) 65,220
(2) 73,160
(3) 59,110
(4) 61,120
(5) 75,150
(6) 68,230
(7) 62,130
(8) 66,210

Even with a ridiculously small dataset, it’s not easy to see if there’s any kind of pattern here. In my article I explain how to cluster data using the k-means algorithm and the R language. The result of clustering the data into three groups is:

73  160
75  150

65  220
68  230
66  210

59  110
61  120
62  130

A clear pattern has emerged. There are tall-height, medium-weight people; medium-height, heavy-weight people; and short-height, light-weight people.
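As a rough sketch of what that looks like in practice (this is not the code from the article), the eight data points above can be passed to R's built-in kmeans() function. The variable names people and fit, the seed value, and the nstart setting are my own choices for illustration:

```r
# Height (inches) and weight (lbs) of the eight people listed above.
people <- data.frame(
  height = c(65, 73, 59, 61, 75, 68, 62, 66),
  weight = c(220, 160, 110, 120, 150, 230, 130, 210)
)

# k-means starts from random initial centers, so fix a seed (arbitrary value)
# and try several random starts; kmeans() keeps the best result.
set.seed(1)
fit <- kmeans(people, centers = 3, nstart = 25)

# Cluster ID (1, 2 or 3) assigned to each person.
# The ID numbers themselves are arbitrary labels.
print(fit$cluster)

# Show the rows grouped by cluster, plus the mean of each cluster.
print(people[order(fit$cluster), ])
print(fit$centers)
```

Note that the data is used as-is here. Because the weight values are numerically much larger than the height values, weight has more influence on the distances, so on other datasets you may want to normalize the columns first.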


In my article, I explain how to use R's built-in kmeans() function, which is applicable in most scenarios where you want to cluster data, and also how to write a custom clustering function using raw R. The custom approach is useful when you need some sort of custom behavior.
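To give a feel for the custom route, here is a minimal sketch of a hand-written k-means function in raw R. It is not the implementation from the article; the function name my_kmeans, its parameters, and the initialization scheme (picking k random rows as starting means) are assumptions made for illustration:

```r
# Hypothetical helper: a bare-bones k-means loop (assign, then update).
my_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)

  # Initialize the means from k distinct, randomly chosen rows.
  means <- x[sample(n, k), , drop = FALSE]
  assignment <- rep(0L, n)

  for (iter in seq_len(max_iter)) {
    # n-by-k matrix of squared Euclidean distances from each row to each mean.
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, means[j, ])^2))

    # Assign each row to its closest mean.
    new_assignment <- max.col(-d, ties.method = "first")
    if (all(new_assignment == assignment)) break  # no change: converged
    assignment <- new_assignment

    # Recompute each mean from its members; keep the old mean if a cluster is empty.
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) means[j, ] <- colMeans(members)
    }
  }

  list(cluster = assignment, means = means)
}

# Usage with the height and weight data from earlier in this post.
people <- data.frame(
  height = c(65, 73, 59, 61, 75, 68, 62, 66),
  weight = c(220, 160, 110, 120, 150, 230, 130, 210)
)
set.seed(1)  # arbitrary seed so the run is repeatable
res <- my_kmeans(people, 3)
print(res$cluster)
print(res$means)
```

Because this sketch uses a single random start, it can settle on a poorer grouping than the built-in function run with multiple starts. The upside is that every step is exposed, so the distance calculation, the initialization, or the stopping rule can be swapped for whatever custom behavior you need.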
