Ensemble with Random Forest in Python

We use datasets from the scikit-learn library, and the editor is Sublime Text 3. Most of the code comes from the book *Introduction to Machine Learning with Python*: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###1. Import the required libraries and models
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=46)
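A quick sketch to check what the split above produces: `train_test_split` holds out 25% of the data by default, and `stratify=y` keeps the class proportions the same in both splits (the `bincount` call here is my addition for illustration):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=46)

# default test_size is 0.25, so 75 training and 25 test samples
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)
# stratify=y keeps the 50/50 class ratio in both splits
print(np.bincount(y_train), np.bincount(y_test))
```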

###2. Identify the parameters of the model
###If we set max_features to n_features, each split can look at every feature in the
###dataset, so no randomness is injected into the feature selection and the trees in
###the ensemble will be very similar to one another.
###These are the parameters spelled out explicitly (this call is only for
###illustration; the instance is not assigned or used further):
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None,
	max_features='sqrt', max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2,
	min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1, oob_score=False, random_state=2,
	verbose=0, warm_start=False)
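To see the effect of max_features described above, a small sketch (reusing the same moons data; the variable names `full` and `single` are my own) comparing max_features=n_features with max_features=1:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=46)

# max_features=n_features: every split considers all features, trees look alike
full = RandomForestClassifier(n_estimators=30, max_features=X.shape[1], random_state=2)
# max_features=1: each split considers a single random feature, trees are more diverse
single = RandomForestClassifier(n_estimators=30, max_features=1, random_state=2)

for name, model in [("all features", full), ("one feature", single)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```

On a two-feature dataset like this the difference is small, but on wider datasets a lower max_features decorrelates the trees and usually helps the ensemble.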

###3. Build the model and fit the data
###To build a random forest, you need to decide on the number of trees to grow on
###bootstrap resamples of the original data (the n_estimators parameter).
forest=RandomForestClassifier()
forest.fit(X_train,y_train)

###4. print the result
print("accuracy on training set: %f" % forest.score(X_train, y_train))
print("accuracy on test set: %f" % forest.score(X_test, y_test))
###accuracy on training set: 1.000000
###accuracy on test set: 0.920000

###The trees built as part of the random forest are stored in the estimators_ attribute.
###To make a prediction using the random forest, the algorithm first makes a prediction with every
###tree in the forest. For regression, these results are averaged to get the final prediction.
###For classification, a "soft voting" strategy is used: each tree makes a
###"soft" prediction, providing a probability for each possible output label, and the
###class with the highest averaged probability wins.
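The soft-voting step can be reproduced by hand from the estimators_ attribute: average the per-tree class probabilities and take the argmax. A sketch (assuming the same moons data as above; `probs` and `manual` are my own names):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=46)

forest = RandomForestClassifier(n_estimators=30, random_state=2).fit(X_train, y_train)

# average the per-tree class probabilities: this is the "soft vote"
probs = np.mean([tree.predict_proba(X_test) for tree in forest.estimators_], axis=0)
manual = np.argmax(probs, axis=1)

# the forest's own predict_proba averages the trees the same way
print(np.allclose(probs, forest.predict_proba(X_test)))
```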

###We can also build a random forest model on the breast cancer dataset
from sklearn.datasets import load_breast_cancer
cancer=load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=66)
forest = RandomForestClassifier(n_estimators=80, random_state=0)
forest.fit(X_train, y_train)
print("accuracy on training set: %f" % forest.score(X_train, y_train))
print("accuracy on test set: %f" % forest.score(X_test, y_test))
###accuracy on training set: 1.000000
###accuracy on test set: 0.979021
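Beyond accuracy, a fitted forest exposes feature_importances_, which sum to 1 and show how much each feature contributed to the splits. A sketch plotting them for the breast cancer model (the Agg backend and the output filename are my choices, so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=66)
forest = RandomForestClassifier(n_estimators=80, random_state=0).fit(X_train, y_train)

# sort features by importance so the bar chart reads bottom-to-top
idx = np.argsort(forest.feature_importances_)
plt.barh(range(len(idx)), forest.feature_importances_[idx])
plt.yticks(range(len(idx)), cancer.feature_names[idx])
plt.xlabel("Feature importance")
plt.tight_layout()
plt.savefig("forest_importances.png")
```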