DBSCAN in Python

Another very useful clustering algorithm is DBSCAN (which stands for “density-based spatial clustering of applications with noise”). The main benefits of DBSCAN are that
###a) it does not require the user to set the number of clusters a priori,
###b) it can capture clusters of complex shapes, and
###c) it can identify points that are not part of any cluster.

The data are generated with the sklearn library, and the code was written in Sublime Text 3. Most of the code comes from the book Introduction to Machine Learning with Python: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###DBSCAN is somewhat slower than agglomerative clustering and k-Means, but still scales to relatively large datasets.
###It works by picking an arbitrary point to start with and growing a cluster outward from it.

###There are two parameters in DBSCAN, min_samples and eps. If there are at least min_samples
###data points within a distance of eps of a given data point (scikit-learn counts the point itself), it is called a core sample.
###Core samples that are closer to each other than the distance eps are put into the same cluster by DBSCAN.
###Points that are not core samples but lie within eps of a core sample are border points of that cluster;
###all remaining points are labeled as noise.
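The rules above can be turned into a few lines of NumPy. The following is only a minimal sketch for small datasets, not scikit-learn's actual implementation: the function name dbscan_sketch is made up for illustration, it uses a brute-force distance matrix, and it assumes Euclidean distance with a neighborhood that includes the point itself.

```python
from collections import deque

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def dbscan_sketch(X, eps, min_samples):
    """Hypothetical brute-force DBSCAN; returns (labels, core_indices)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighborhood of each point: everything within eps, the point itself included.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # -1 means noise until a cluster claims the point
    cluster = 0
    for start in range(n):
        # Only an unlabeled core sample can start a new cluster.
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Border points join the cluster but are not expanded further.
                    if is_core[q]:
                        queue.append(q)
        cluster += 1
    return labels, np.flatnonzero(is_core)


X_demo, _ = make_blobs(random_state=0, n_samples=12)
labels, core = dbscan_sketch(X_demo, eps=1.5, min_samples=3)
reference = DBSCAN(eps=1.5, min_samples=3).fit(X_demo)
print(labels)
print(reference.labels_)
```

Printing both label arrays side by side is a quick sanity check that the sketch follows the same rules as the library on this toy data.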

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs, make_moons

###A tiny synthetic dataset: 12 points drawn from three blobs
X, y = make_blobs(random_state=0, n_samples=12)

###Default parameters: eps=0.5, min_samples=5
dbscan = DBSCAN()
clusters = dbscan.fit_predict(X)
print(clusters)
#[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
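###All data points were assigned the label -1, which stands for noise. This is a consequence of the default
###settings eps=0.5 and min_samples=5, which are not tuned for such a small toy dataset.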
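To see why every point ends up as noise, it can help to look at the defaults and at how many neighbors each point actually has. This is a small sketch; the variable names labels, dist and neighbor_counts are introduced here for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(random_state=0, n_samples=12)
dbscan = DBSCAN()
labels = dbscan.fit_predict(X)
print("eps:", dbscan.eps, "min_samples:", dbscan.min_samples)

# Number of points within eps of each point (the point itself included).
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbor_counts = (dist <= dbscan.eps).sum(axis=1)
print("largest neighborhood:", neighbor_counts.max())
print("all noise:", bool((labels == -1).all()))
```

If the largest neighborhood is smaller than min_samples, there is no core sample at all, so no cluster can be started.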

###Try changing eps and min_samples to see the effect.
###subplot_kw removes the axis ticks from every subplot.
fig, axes = plt.subplots(3, 4, figsize=(12, 6), subplot_kw={'xticks': (), 'yticks': ()})

###Cluster labels index into this array; the noise label -1 picks the last entry, yellow.
colors = np.array(['r', 'b', 'g', 'y'])
for i, min_samples in enumerate([2, 3, 5]):
    for j, eps in enumerate([1, 1.5, 2, 3]):
        dbscan = DBSCAN(min_samples=min_samples, eps=eps)
        clusters = dbscan.fit_predict(X)
        print('min_samples: %d eps: %.1f clusters: %s' % (min_samples, eps, clusters))
        ###Draw core samples 8 times larger than the other points.
        sizes = 30 * np.ones(X.shape[0])
        sizes[dbscan.core_sample_indices_] *= 8
        axes[i, j].scatter(X[:, 0], X[:, 1], c=colors[clusters], s=sizes)
        axes[i, j].set_title('min_samples: %d eps: %.1f' % (min_samples, eps))
fig.tight_layout()
plt.show()
###min_samples: 2 eps: 1.0 clusters: [-1  0  0 -1  0 -1  1  1  0  1 -1 -1]
###min_samples: 2 eps: 1.5 clusters: [0 1 1 1 1 0 2 2 1 2 2 0]
###min_samples: 2 eps: 2.0 clusters: [0 1 1 1 1 0 0 0 1 0 0 0]
###min_samples: 2 eps: 3.0 clusters: [0 0 0 0 0 0 0 0 0 0 0 0]
###min_samples: 3 eps: 1.0 clusters: [-1  0  0 -1  0 -1  1  1  0  1 -1 -1]
###min_samples: 3 eps: 1.5 clusters: [0 1 1 1 1 0 2 2 1 2 2 0]
###min_samples: 3 eps: 2.0 clusters: [0 1 1 1 1 0 0 0 1 0 0 0]
###min_samples: 3 eps: 3.0 clusters: [0 0 0 0 0 0 0 0 0 0 0 0]
###min_samples: 5 eps: 1.0 clusters: [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
###min_samples: 5 eps: 1.5 clusters: [-1  0  0  0  0 -1 -1 -1  0 -1 -1 -1]
###min_samples: 5 eps: 2.0 clusters: [-1  0  0  0  0 -1 -1 -1  0 -1 -1 -1]
###min_samples: 5 eps: 3.0 clusters: [0 0 0 0 0 0 0 0 0 0 0 0]
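Reading twelve label arrays by eye is tedious, so a short summary per parameter setting may be easier to compare. The helper summarize below is a hypothetical name used only for this sketch; it counts clusters, core samples, border points and noise points from the fitted model's labels_ and core_sample_indices_ attributes.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def summarize(labels, core_indices):
    """Return (n_clusters, n_core, n_border, n_noise) for one DBSCAN fit."""
    labels = np.asarray(labels)
    # Every distinct label except -1 is a cluster.
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int((labels == -1).sum())
    n_core = len(core_indices)
    # Whatever is neither core nor noise is a border point.
    n_border = len(labels) - n_core - n_noise
    return n_clusters, n_core, n_border, n_noise


X, _ = make_blobs(random_state=0, n_samples=12)
for min_samples in [2, 3, 5]:
    for eps in [1, 1.5, 2, 3]:
        db = DBSCAN(min_samples=min_samples, eps=eps).fit(X)
        print(min_samples, eps, summarize(db.labels_, db.core_sample_indices_))
```

Increasing eps tends to merge clusters, while increasing min_samples tends to turn more points into noise, which matches the printed labels above.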

figure_1.png

###In the plot above, points that belong to clusters are colored, while the noise points are shown in yellow.
###Core samples are shown as large points, while border points are displayed as smaller points.

We can also use DBSCAN to cluster the data we used in the last article:
https://charleshsliao.wordpress.com/2017/05/30/quick-clustering-in-python/
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

###Rescale the data to zero mean and unit variance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X_s = scaler.transform(X)

###The default eps=0.5 and min_samples=5 are used here.
dbscan = DBSCAN()
clusters = dbscan.fit_predict(X_s)
plt.scatter(X_s[:, 0], X_s[:, 1], c=clusters, cmap="Paired", s=40)
plt.show()

figure_2.png
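The result on the scaled data still depends on eps. As a rough sketch, the helper cluster_counts (a hypothetical name, not part of sklearn) refits DBSCAN for a few eps values and reports the number of clusters and noise points; the eps values tried here are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X_s = StandardScaler().fit_transform(X)


def cluster_counts(eps, min_samples=5):
    """Return (n_clusters, n_noise) for DBSCAN on the scaled moons data."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_s)
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int((labels == -1).sum())
    return n_clusters, n_noise


for eps in [0.2, 0.5, 0.7]:
    print("eps=%.1f -> clusters, noise: %s" % (eps, cluster_counts(eps)))
```

At the extremes the behaviour is predictable: a very small eps leaves every point as noise, and a very large eps puts all points into a single cluster.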
