Data Clustering using R

I wrote an article titled “Data Clustering using R” in the February 2017 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2017/02/01/data-clustering-using-r.aspx.


Data clustering is the process of programmatically grouping data so that similar items belong together. For example, suppose you have the height (in inches) and weight (in lbs) of eight people:

(1) 65,220
(2) 73,160
(3) 59,110
(4) 61,120
(5) 75,150
(6) 68,230
(7) 62,130
(8) 66,210

Even with a ridiculously small dataset, it’s not easy to see if there’s any kind of pattern here. In my article I explain how to cluster data using the k-means algorithm and the R language. The result of clustering the data into three groups is:

73  160  
75  150  
=====
65  220  
68  230  
66  210  
=====
59  110  
61  120  
62  130

A clear pattern has emerged. There are tall people of medium weight, medium-height people who are heavy, and short people who are light.
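The article's code is in R. As a language-neutral illustration only, here is a minimal Python sketch of basic k-means (Lloyd's algorithm) applied to the eight items above; it is not the article's implementation. To keep the run deterministic, it assumes the first k items serve as the initial means. That is a simplifying choice: when given a number of clusters, R's kmeans() picks its starting centers at random.

```python
# Minimal k-means sketch (Lloyd's algorithm) in Python, not the article's R code.
# Assumption: the first k items seed the means so the run is deterministic.

# The eight (height in inches, weight in lbs) items from the example.
DATA = [(65, 220), (73, 160), (59, 110), (61, 120),
        (75, 150), (68, 230), (62, 130), (66, 210)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(data, k, max_iter=100):
    """Cluster data into k groups; returns a cluster ID (0..k-1) per item."""
    means = list(data[:k])  # simplifying assumption: first k items seed the means
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster whose mean is closest.
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, means[j]))
                          for p in data]
        if new_assignment == assignment:
            break  # no item changed cluster, so the clustering is stable
        assignment = new_assignment
        # Update step: move each mean to the average of its current members.
        for j in range(k):
            members = [p for p, c in zip(data, assignment) if c == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = mean(members)
    return assignment


if __name__ == "__main__":
    labels = kmeans(DATA, 3)
    for j in range(3):
        print([p for p, c in zip(DATA, labels) if c == j])
    # [(65, 220), (68, 230), (66, 210)]
    # [(73, 160), (75, 150)]
    # [(59, 110), (61, 120), (62, 130)]
```

With this initialization the sketch reproduces the three groups shown above. Note that it uses the raw height and weight values, so weight, which spans a much larger range, dominates the distances.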


In my article, I explain how to use the built-in R kmeans() function, which is applicable in most scenarios where you want to cluster data. I also explain how to write a custom clustering function from scratch in raw R, an approach that is useful when you need some sort of custom behavior.
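As one hypothetical example of custom behavior, the following Python sketch (again, not the article's R code) runs k-means from several random starting points and keeps the run with the smallest within-cluster sum of squares (WCSS). This is similar in spirit to the nstart argument of R's built-in kmeans(). The names lloyd, wcss, and kmeans_restarts, and the n_starts and seed parameters, are illustrative assumptions.

```python
import random

# Hypothetical custom variant: k-means with random restarts, keeping the run
# with the lowest within-cluster sum of squares (WCSS). Not the article's code.

# The eight (height in inches, weight in lbs) items from the example.
DATA = [(65, 220), (73, 160), (59, 110), (61, 120),
        (75, 150), (68, 230), (62, 130), (66, 210)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(data, init_means, max_iter=100):
    """Lloyd's k-means iterations from the given starting means.
    Returns (assignment, means)."""
    means = list(init_means)
    k = len(means)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest mean wins.
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, means[j]))
                          for p in data]
        if new_assignment == assignment:
            break  # converged: no item changed cluster
        assignment = new_assignment
        # Update step: each mean becomes the average of its members.
        for j in range(k):
            members = [p for p, c in zip(data, assignment) if c == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, means


def wcss(data, assignment, means):
    """Within-cluster sum of squares: the cost k-means tries to minimize."""
    return sum(sq_dist(p, means[c]) for p, c in zip(data, assignment))


def kmeans_restarts(data, k, n_starts=10, seed=0):
    """Returns (best_wcss, best_assignment) over n_starts random starts."""
    rng = random.Random(seed)  # seeded so results are reproducible
    best = None
    for _ in range(n_starts):
        init = rng.sample(data, k)  # k distinct items as starting means
        assignment, means = lloyd(data, init)
        cost = wcss(data, assignment, means)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    cost, labels = kmeans_restarts(DATA, 3, n_starts=20)
    print(round(cost, 2), labels)
```

Because a single run can settle into a poor local optimum that depends on its starting means, keeping the lowest-WCSS run out of several makes the result less sensitive to one unlucky draw. The cluster IDs themselves are arbitrary and may be numbered differently from run to run.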
