How Certain is This Classifier? Uncertainty Estimates in Python

We are not only interested in which class a classifier predicts for a certain test point, but also in how certain the classifier is that this is the right class. There are two functions that reveal this certainty.

The data sets come from the sklearn library, and the code was written in Sublime Text 3. Most of the code comes from the book Introduction to Machine Learning with Python: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###There are two functions in scikit-learn that can be used to obtain uncertainty estimates
###from classifiers: decision_function and predict_proba. Most (but not all) classifiers
###have at least one of them, and many classifiers, such as GradientBoostingClassifier (GBC),
###have both (the hasattr check right after the imports below illustrates this).

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_circles
import numpy as np
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split
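###A quick way to check which of these two interfaces a given classifier offers is
###hasattr (an illustrative sketch, not from the book; KNeighborsClassifier is just
###one example of a model that lacks decision_function):
from sklearn.neighbors import KNeighborsClassifier
for clf in [GradientBoostingClassifier(), KNeighborsClassifier()]:
	print(type(clf).__name__,
		'decision_function:', hasattr(clf, 'decision_function'),
		'predict_proba:', hasattr(clf, 'predict_proba'))
###GradientBoostingClassifier decision_function: True predict_proba: True
###KNeighborsClassifier decision_function: False predict_proba: True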
###############################################################
##############	  	In binary classification  		  #########
###############################################################

X, y = make_circles(noise=0.25, factor=0.5, random_state=1)
y_named = np.array(['blue', 'red'])[y]  ###rename the classes 0/1 to 'blue'/'red'
X_train, X_test, y_train_named, y_test_named, y_train, y_test = train_test_split(X, y_named, y, random_state=0)
###1. We train the GBC model here
gbrtmodel = GradientBoostingClassifier(random_state=0)
gbrtmodel.fit(X_train, y_train_named)
###GradientBoostingClassifier(init=None, learning_rate=0.1, loss='deviance',
###	max_depth=3, max_features=None, max_leaf_nodes=None,
###	min_samples_leaf=1, min_samples_split=2,
###	min_weight_fraction_leaf=0.0, n_estimators=100,
###	presort='auto', random_state=0, subsample=1.0, verbose=0,
###	warm_start=False)
###2. Use decision_function to evaluate the model's confidence in its predictions
###In the binary classification case, decision_function returns an array of shape
###(n_samples,): one floating-point number for each sample
print(X_test.shape)
print(gbrtmodel.decision_function(X_test).shape)
###(25, 2)
###(25,)
print(gbrtmodel.decision_function(X_test))
print(gbrtmodel.predict(X_test))
greater_0 = (gbrtmodel.decision_function(X_test) > 0).astype(int)
print(greater_0)  ###the boolean test converted to 0/1 class indicators
###the value returned by decision_function reveals how strongly the model believes a data
###point belongs to the "positive" class, i.e. the second entry of gbrtmodel.classes_
###(see the sketch after the outputs below)

###[ 4.13592629 -1.7016989  -3.95106099 -3.62599351  4.28986668  3.66166106
### -7.69097177  4.11001634  1.10753883  3.40782247 -6.46262729  4.28986668
###  3.90156371 -1.20031192  3.66166106 -4.17231209 -1.23010022 -3.91576275
###  4.03602808  4.11001634  4.11001634  0.65708962  2.69826291 -2.65673325
### -1.86776597]

###['red' 'blue' 'blue' 'blue' 'red' 'red' 'blue' 'red' 'red' 'red' 'blue'
### 'red' 'red' 'blue' 'red' 'blue' 'blue' 'blue' 'red' 'red' 'red' 'red'
### 'red' 'blue' 'blue']

###[1 0 0 0 1 1 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 0 0]
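###The mapping between the 0/1 indicators above and the actual labels is stored in the
###fitted model's classes_ attribute; the "positive" class is classes_[1]. A small
###sketch, using the gbrtmodel and greater_0 computed above:
print(gbrtmodel.classes_)
###['blue' 'red']
print(np.all(gbrtmodel.classes_[greater_0] == gbrtmodel.predict(X_test)))
###True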

###3. Use predict_proba. The output of predict_proba is a probability for each
###class, which is often easier to interpret
###First, limit the printed output to three decimal places
np.set_printoptions(suppress=True, precision=3)
print(gbrtmodel.predict_proba(X_test)[:6])
###[[ 0.016  0.984]
### [ 0.846  0.154]
### [ 0.981  0.019]
### [ 0.974  0.026]
### [ 0.014  0.986]
### [ 0.025  0.975]]
###Because the probabilities for the two classes sum to one, exactly one of the classes
###is above 50% certainty, and that class is the one that is predicted.
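###A quick sanity check (a small sketch using the fitted gbrtmodel): each row of
###predict_proba sums to one, and the argmax of each row, mapped through classes_,
###reproduces predict exactly
proba = gbrtmodel.predict_proba(X_test)
print(np.allclose(proba.sum(axis=1), 1))
###True
print(np.all(gbrtmodel.classes_[np.argmax(proba, axis=1)] == gbrtmodel.predict(X_test)))
###True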

###############################################################
###########	  	In multi-class classification  		  #########
###############################################################

from sklearn.datasets import load_iris
iris=load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)
###1. Fit the model
gbrtmc = GradientBoostingClassifier(random_state=0)
gbrtmc.fit(X_train, y_train)
###GradientBoostingClassifier(init=None, learning_rate=0.1, loss='deviance',
###	max_depth=3, max_features=None, max_leaf_nodes=None,
###	min_samples_leaf=1, min_samples_split=2,
###	min_weight_fraction_leaf=0.0, n_estimators=100,
###	presort='auto', random_state=0, subsample=1.0, verbose=0,
###	warm_start=False)
###2. use decision_function
print(gbrtmc.decision_function(X_test).shape)
###(38, 3)
print(gbrtmc.decision_function(X_test)[:6,:])
###[[-4.96   3.624 -4.461]
### [ 5.873 -2.412 -4.815]
### [-4.943 -3.989  4.903]
### [-4.961  3.943 -3.887]
### [-4.97   4.735 -3.556]
### [ 5.873 -2.771 -4.815]]
###we can also recover the prediction manually by finding, for each sample, the column
###with the largest decision_function score, or we can call predict directly
print(np.argmax(gbrtmc.decision_function(X_test),axis=1))
print(gbrtmc.predict(X_test))
###[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1 0]
###[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1 0]

###3. Use predict_proba
print(gbrtmc.predict_proba(X_test)[:6])
###[[ 0.     1.     0.   ]
### [ 1.     0.     0.   ]
### [ 0.     0.     1.   ]
### [ 0.     0.999  0.   ]
### [ 0.     1.     0.   ]
### [ 1.     0.     0.   ]]

###predict_proba and decision_function always have shape (n_samples, n_classes),
###apart from the special case of decision_function in the binary case.
###When there are n_classes many columns, we can recover the prediction by simply
###computing the argmax across columns, as the sketch below shows.
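###A sketch of that recovery on the iris model fitted above: the argmax column index
###goes through gbrtmc.classes_ (here simply the integers 0, 1, 2), and
###iris.target_names turns the integers into species names
argmax_pred = gbrtmc.classes_[np.argmax(gbrtmc.predict_proba(X_test), axis=1)]
print(np.all(argmax_pred == gbrtmc.predict(X_test)))
###True
print(iris.target_names[argmax_pred[:6]])
###['versicolor' 'setosa' 'virginica' 'versicolor' 'versicolor' 'setosa']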