Logistic Regression in Python to Tune Parameter C

The trade-off parameter of logistic regression that controls the strength of the regularization is called C: higher values of C correspond to less regularization. C is the inverse of the regularization strength lambda (C = 1/lambda), so a small C means strong regularization and a large C means weak regularization. We can also specify which regularization function to apply (the penalty parameter, L1 or L2).

We use the breast cancer dataset that ships with the sklearn library, and the editor is Sublime Text 3. Most of the code comes from the book Introduction to Machine Learning with Python: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset and split it, stratifying on the target
# so both sets keep the same class proportions
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
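
Since we stratify on cancer.target, both splits keep the full dataset's class balance (212 malignant vs. 357 benign). A quick sanity check, not part of the original code:

import numpy as np
print("full set class counts: %s" % np.bincount(cancer.target))  # [212 357]
print("training set class counts: %s" % np.bincount(y_train))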

###### default C=1 #####
lgr = LogisticRegression().fit(X_train, y_train)
print("training set score: %f" % lgr.score(X_train, y_train))
print("test set score: %f" % lgr.score(X_test, y_test))

###### increase C to 100 #####
lgr100 = LogisticRegression(C=100).fit(X_train, y_train)
print("training set score of lgr100: %f" % lgr100.score(X_train, y_train))
print("test set score of lgr100: %f" % lgr100.score(X_test, y_test))

###### decrease C to 0.01 #####
lgr001 = LogisticRegression(C=0.01).fit(X_train, y_train)
print("training set score of lgr001: %f" % lgr001.score(X_train, y_train))
print("test set score of lgr001: %f" % lgr001.score(X_test, y_test))

# Plot the coefficients learned for each value of C, one point per feature
import matplotlib.pyplot as plt
plt.plot(lgr.coef_.T, 'o', label='C=1')
plt.plot(lgr100.coef_.T, '+', label='C=100')
plt.plot(lgr001.coef_.T, '-', label='C=0.01')
plt.xticks(range(cancer.data.shape[1]), cancer.feature_names, rotation=90)
plt.ylim(-5, 5)
plt.legend()
plt.show()

###training set score: 0.955399###

###test set score: 0.951049###

###training set score of lgr100: 0.971831###

###test set score of lgr100: 0.965035###

###training set score of lgr001: 0.934272###

###test set score of lgr001: 0.930070###

[figure_1.png: coefficients learned with C=1, C=100, and C=0.01 across the 30 features]
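
Rather than comparing a handful of C values by hand, we could let cross-validation choose C. A small sketch using GridSearchCV (this part is my addition; the grid values are just the ones tried above):

from sklearn.model_selection import GridSearchCV
# search over the same C values using 5 cross-validation folds
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=5).fit(X_train, y_train)
print("best C: %s" % grid.best_params_["C"])
print("best cross-validation score: %f" % grid.best_score_)
print("test set score with best C: %f" % grid.score(X_test, y_test))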

###As LogisticRegression applies L2 regularization by default, the result
###looks similar to Ridge regression: stronger regularization pushes the
###coefficients more and more towards zero, though they never become
###exactly zero. If we desire a more interpretable model, using L1
###regularization might help, since it drives some coefficients to exactly zero.

# Compare L1-regularized models over a range of C values
import numpy as np
r = 10.0 ** np.arange(-2, 3)  # C = 0.01, 0.1, 1, 10, 100

for C in r:
    # liblinear supports the L1 penalty (newer scikit-learn versions require
    # naming a compatible solver explicitly)
    lr_l1 = LogisticRegression(C=C, penalty="l1", solver="liblinear").fit(X_train, y_train)
    print("Training Accuracy of L1 LogReg with C=%f: %f" % (C, lr_l1.score(X_train, y_train)))
    print("Test Accuracy of L1 LogReg with C=%f: %f" % (C, lr_l1.score(X_test, y_test)))
    plt.plot(lr_l1.coef_.T, 'o', label="C=%f" % C)
plt.xticks(range(cancer.data.shape[1]), cancer.feature_names, rotation=90)
plt.ylim(-5, 5)
plt.legend(loc='best')
plt.show()

###Training Accuracy of L1 LogReg with C=0.010000: 0.917840###

###Test Accuracy of L1 LogReg with C=0.010000: 0.930070###

###Training Accuracy of L1 LogReg with C=0.100000: 0.931925###

###Test Accuracy of L1 LogReg with C=0.100000: 0.930070###

###Training Accuracy of L1 LogReg with C=1.000000: 0.960094###

###Test Accuracy of L1 LogReg with C=1.000000: 0.958042###

###Training Accuracy of L1 LogReg with C=10.000000: 0.978873###

###Test Accuracy of L1 LogReg with C=10.000000: 0.972028###

###Training Accuracy of L1 LogReg with C=100.000000: 0.985915###

###Test Accuracy of L1 LogReg with C=100.000000: 0.979021###

[figure_2.png: coefficients of the L1-regularized models for C from 0.01 to 100]
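
To back up the interpretability point above, we can count how many coefficients each L1 model actually keeps nonzero. A short sketch reusing r and np from the loop above (exact counts will depend on your scikit-learn version):

for C in r:
    lr_l1 = LogisticRegression(C=C, penalty="l1", solver="liblinear").fit(X_train, y_train)
    # coef_ has one weight per feature; nonzero entries are the features kept
    print("C=%f uses %d of %d features" % (C, np.sum(lr_l1.coef_ != 0), cancer.data.shape[1]))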
