A Quick Demo of the DBSCAN Clustering Algorithm

I was reading a research paper this morning and the paper used the DBSCAN (“density-based spatial clustering of applications with noise”) clustering algorithm. DBSCAN is somewhat similar to k-means clustering. Both work only with strictly numeric data.

In k-means you must specify the number of clusters. DBSCAN doesn’t require you to specify the number of clusters, but for DBSCAN you must specify an epsilon value (how close is “close”) and a minimum number of points that constitute a core cluster. These two DBSCAN parameters implicitly determine the number of clusters.

I hadn’t used DBSCAN in a long time so I coded up a quick demo to refresh my memory. Implementing DBSCAN from scratch isn’t too difficult (I’ve done so using the C# language), but the scikit-learn Python language code library has a built-in implementation that’s simple and easy to use. So my demo was based on the scikit documentation example.

I set up 20 items of dummy data. I used 2D data so that I could graph the results. Like most clustering algorithms, the source data must be normalized so that large magnitude items don’t swamp small magnitude items.

The clustering function assigns a cluster ID label to each data item: 0, 1, 2, etc. Items that don’t get assigned to a cluster get a label of -1 to indicate they are “noise”.

I used the documentation code to create a graph of the clustering results. The five red dots are class 0, six green dots are class 1, and four blue dots are class 2. Noise items are colored black.

Data items that belong to clusters can be “core points” or non-core points. The large dots are core points, the smaller dots are non-core points.

Good fun!



A cluster of three clever photographs with a playing card theme by Serge Lutens (1942-). Lutens is maybe best known for his work in the 1980s for Shiseido, a Japanese cosmetics company.


Code:

# dbscan_cluster.py

import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

print("\nBegin clustering demo with DBSCAN ")

X  = np.array([
 [ 0.325, -0.595],
 [ 0.507,  1.619],
 [ 0.817,  1.895],
 [ 1.147,  0.764],
 [-1.285, -0.95 ],
 [-1.237, -0.532],
 [ 1.108,  1.248],
 [-0.847, -0.722],
 [ 0.124, -1.346],
 [ 0.910, -0.227],
 [ 0.310, -0.756],
 [-1.384, -0.715],
 [ 0.736,  1.15 ],
 [ 0.511, -0.517],
 [-1.081, -0.91 ],
 [ 0.416,  1.252],
 [-2.349, -0.42 ],
 [-0.559, -1.161],
 [ 0.806,  1.054],
 [ 1.023, -0.133]])

print("\nX data: ")
print(X)

print("\nPerforming clustering ")
clustering = DBSCAN(eps=0.5, min_samples=4).fit(X)
print("Done ")

print("\nComputed cluster labels: ")
print(clustering.labels_)

print("\nIndices of core points: ")
print(clustering.core_sample_indices_)

n_clusters = np.max(clustering.labels_) + 1
counts = np.zeros(n_clusters+1, dtype=np.int64)  # noise
for i in range(len(X)):
  lbl = clustering.labels_[i]
  if lbl == -1:
    counts[n_clusters] += 1
  else:
    counts[lbl] += 1
print("\nCluster counts, noise count: ")
print(counts)

print("\nDisplaying clusering: " )

core_samples_mask = np.zeros_like(clustering.labels_, \
  dtype=bool)
core_samples_mask[clustering.core_sample_indices_] = True

unique_labels = set(clustering.labels_)
colors = [plt.cm.hsv(each) \
          for each in np.linspace(0, 1, \
            len(unique_labels))]
for k, col in zip(unique_labels, colors):
  if k == -1:
    col = [0, 0, 0, 1]  # noise = black

  class_member_mask = (clustering.labels_ == k)

  xy = X[class_member_mask & core_samples_mask]
  plt.plot(xy[:, 0], xy[:, 1], 'o', 
    markerfacecolor=tuple(col),
    markeredgecolor='k', markersize=14)

  xy = X[class_member_mask & ~core_samples_mask]
  plt.plot(xy[:, 0], xy[:, 1], 'o',
    markerfacecolor=tuple(col),
    markeredgecolor='k', markersize=6)
plt.show()

print("\nEnd demo ")
This entry was posted in Machine Learning. Bookmark the permalink.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s