Jupyter Interactions

Clustering using DBSCAN

Will Furnass

DBSCAN is a density-based clustering algorithm that works in any number of dimensions. Its advantages over many other clustering algorithms are that it

does not require the number of clusters to be specified in advance,
differentiates between clustered points and noise points, and
can find clusters that are non-spherical.

Here we use the DBSCAN implementation provided by the scikit-learn package to cluster a 2D dataset. The algorithm enumerates distinct clusters using integer labels (assigning -1 to noise points); here these labels are plotted in 2D using the matplotlib library.
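As a minimal sketch of this workflow (not the notebook's original cell), the snippet below clusters a synthetic 2D dataset and colours each point by its label. The make_moons dataset and the values eps=0.2 and min_samples=5 are assumptions chosen for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Assumed stand-in dataset: two interleaved half-circles with a little noise.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Assumed parameter values, picked only for illustration.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Cluster labels are 0, 1, 2, ...; noise points are labelled -1.
n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")

# Colour each point by its integer label.
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15)
ax.set_title("DBSCAN labels (-1 = noise)")
```

In a notebook the figure is normally displayed inline; in a plain script, call plt.show() after the last line.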

Run the cell below then use the two sliders to assess the impact of changing DBSCAN's two key parameters:

eps: The maximum distance between two samples for them to be considered to be in the same neighborhood.
min_pts: The number of samples (or total weight) in a neighborhood for a point to be considered a core point, including the point itself. scikit-learn's DBSCAN calls this parameter min_samples.
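Without the sliders, the effect of the two parameters can be approximated by sweeping a few values and counting clusters and noise points. This is a sketch under stated assumptions: the make_blobs dataset, the helper name summarise and the value grid are all hypothetical, and min_pts is passed to scikit-learn as min_samples.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumed stand-in dataset: three Gaussian blobs in 2D.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def summarise(eps, min_pts):
    """Return (number of clusters, number of noise points) for one setting."""
    # min_pts maps onto scikit-learn's min_samples argument.
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    n_clusters = len(set(labels) - {-1})
    n_noise = int((labels == -1).sum())
    return n_clusters, n_noise


# Hypothetical grid of values standing in for the two sliders.
for eps in (0.1, 0.3, 0.5, 1.0):
    for min_pts in (3, 10):
        n_clusters, n_noise = summarise(eps, min_pts)
        print(f"eps={eps}, min_pts={min_pts}: "
              f"{n_clusters} clusters, {n_noise} noise points")
```

At the extremes the behaviour is predictable: a very small eps leaves every point without enough neighbours, so all points are labelled as noise, while a very large eps places every point in a single cluster.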

References

Ester, M., Kriegel, H., Sander, J., Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Presented at the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, pp. 226–231.

Keywords

DBSCAN
clustering
