I wrote an article titled “K-Means Data Clustering” in the August 2015 issue of Microsoft’s MSDN Magazine. See https://msdn.microsoft.com/en-us/magazine/mt185575.
Data clustering is the process of grouping data items so that similar items are placed together. Once grouped, the clusters can be examined for useful relationships. For example, if a large set of voting data were clustered, the items in each cluster might reveal patterns that could be used for targeted advertising.
There are many clustering algorithms; one of the most common is the k-means algorithm, which itself has several variations. My article explains a relatively recent (2007) variation called the k-means++ algorithm.
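To give a feel for the basic algorithm, here is a minimal sketch of the standard k-means loop (assign each item to its nearest mean, then recompute each mean). This is illustrative code, not the article's implementation; the data values and function names are my own, and the naive random initialization shown is exactly the step that k-means++ improves.

```python
import random

def k_means(data, k, max_iter=100):
    # Naive initialization: pick k distinct items as the starting means.
    # (k-means++ replaces this step with a smarter scheme.)
    means = random.sample(data, k)
    for _ in range(max_iter):
        # Assignment step: each item goes to its nearest mean.
        clusters = [[] for _ in range(k)]
        for x in data:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(x, means[i])))
            clusters[idx].append(x)
        # Update step: each mean becomes the centroid of its cluster.
        new_means = []
        for i, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_means.append(tuple(sum(x[d] for x in cluster) / len(cluster)
                                       for d in range(dim)))
            else:
                new_means.append(means[i])  # keep the old mean for an empty cluster
        if new_means == means:  # assignments can no longer change; converged
            break
        means = new_means
    return means

# Two well-separated groups; converges to centroids near (1.1, 0.9) and (8.9, 9.1).
data = [(1.0, 1.0), (1.2, 0.8), (9.0, 9.0), (8.8, 9.2)]
print(sorted(k_means(data, 2)))
```

Because the starting means are chosen at random, different runs can converge to different (and sometimes poor) clusterings on harder data, which is the sensitivity that motivates k-means++.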
The k-means++ algorithm uses a technique called proportional fitness selection to initialize the clustering process. There are several ways to implement proportional fitness selection; I used a technique called roulette wheel selection.
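The idea can be sketched as follows: each candidate item's "fitness" is its squared distance to the nearest mean already chosen, and the roulette wheel picks the next mean with probability proportional to that fitness, so far-away items are more likely to be selected. This is a hedged sketch under my own naming, not the article's code.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(data, k):
    # First mean: chosen uniformly at random.
    means = [random.choice(data)]
    while len(means) < k:
        # Fitness of each item = squared distance to its nearest chosen mean.
        d2 = [min(squared_distance(x, m) for m in means) for x in data]
        total = sum(d2)
        if total == 0.0:
            # Degenerate case: every item coincides with a chosen mean.
            means.append(random.choice(data))
            continue
        # Roulette wheel selection: spin a value in [0, total) and walk the
        # cumulative fitness values until the wheel "stops" on an item.
        spin = random.random() * total
        cum = 0.0
        for x, w in zip(data, d2):
            cum += w
            if spin < cum:
                means.append(x)
                break
    return means
```

Note that items already chosen as means have fitness zero, so the wheel can never select them again; the resulting means are then fed into the usual k-means loop.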
To summarize: there are many algorithms for grouping data, the most common being k-means. The k-means algorithm is very sensitive to how it is initialized. The k-means++ variant addresses this with a clever initialization scheme called proportional fitness selection, which can in turn be implemented in several ways, including roulette wheel selection.