# Implementation of Random Forest for classification in Python

In the previous tutorial, I discussed the intuition behind the Random Forest algorithm, and you should be familiar with it before going through this post. In this post, I will discuss how to implement a random forest for classification in Python. Classification is used when we have to assign an unknown item to a class, often a yes/no decision but possibly one of several categories. Other algorithms, such as logistic regression and decision trees, can also classify, but a random forest often performs better in practice because averaging many decorrelated trees reduces the overfitting that a single decision tree is prone to.

Here is the link to the data set I have used: Social_Network_Ads.csv

You may also be interested in learning: Random forest for regression and its implementation

## Implementation of Random forest for classification

Here are the steps you can follow to perform classification with the algorithm. I will also give an example of the code for each step, so you have a better understanding of how to write it. Here is the link

• First of all, import the necessary libraries.
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```
• Now import the data set.
```python
dataset = pd.read_csv('Social_Network_Ads.csv')
```

This is what the data set looks like.

• After you have imported the data set, go through it thoroughly and keep only the columns you need. Here, columns 2 and 3 (Age and Estimated Salary) are used as features and column 4 as the class label.
```python
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
```
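The positional indices passed to `iloc` only work if the columns are in the expected order. A minimal sketch on a small hypothetical DataFrame, assuming the CSV holds User ID, Gender, Age, EstimatedSalary and a 0/1 purchase label in that order (these column names and all values are assumptions for illustration, not read from the real file), shows what the selection returns:

```python
import pandas as pd

# Hypothetical stand-in for Social_Network_Ads.csv: the column names and
# values are assumptions chosen only to mirror the positions used above.
dataset = pd.DataFrame({
    "User ID": [101, 102, 103, 104],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 47],
    "EstimatedSalary": [19000, 20000, 43000, 25000],
    "Purchased": [0, 0, 1, 1],
})

# Inspect the layout before hard-coding column positions.
print(dataset.head())
print(dataset.columns.tolist())

# Columns 2 and 3 (Age, EstimatedSalary) are the features; column 4 is the label.
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
print(X.shape, y.shape)  # (4, 2) (4,)
```

Selecting by name, e.g. `dataset[['Age', 'EstimatedSalary']].values`, gives the same array and is less brittle if the column order ever changes (again assuming those column names).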
• Now split your data set into training and test sets. Common ratios are 70:30 or 80:20; here `test_size = 0.25` gives a 75:25 split. Choose whichever ratio suits your data.
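```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
```
• This step scales your features so that a feature with values in the range 1000 to 20000 does not dominate one with values in the range 1 to 100. A random forest splits on one feature at a time, so the classifier itself is largely insensitive to feature scale; scaling is still useful here because the visualization code later builds its grid with a fixed step of 0.01, which is only practical on standardized features.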
```python
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std from the training set
X_test = sc.transform(X_test)        # reuse the training statistics
```
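The split and scaling steps can be checked on synthetic data. This sketch uses randomly generated age-like and salary-like columns and a made-up label (all hypothetical, not the tutorial's data set) to show the 75:25 split sizes and that `StandardScaler` learns its mean and standard deviation from the training set only, then applies them unchanged to the test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is age-like, column 1 is salary-like.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 100), rng.uniform(15000, 150000, 100)])
y = (X[:, 0] > 40).astype(int)  # made-up label rule

# test_size=0.25 keeps 75 rows for training and 25 for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)

# Fit the scaler on the training set only, then reuse it on the test set.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# Each training column now has mean ~0 and standard deviation ~1.
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))

# transform() is (x - mean_) / scale_, using the training statistics.
manual = (X_test - sc.mean_) / sc.scale_
print(np.allclose(manual, X_test_scaled))  # True
```

Fitting the scaler on the training rows only keeps information from the test set out of the preprocessing, so the test score stays an honest estimate.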
• Now comes the main task, i.e. fitting the classifier to the training set. First import the required class. Go through the documentation of `RandomForestClassifier` and understand the meaning and usage of each parameter. Here, for example, I have set the number of trees, `n_estimators`, to 10; you can use more or fewer as per your requirement (recent versions of scikit-learn default to 100). I have also set the split criterion, `criterion`, to `'entropy'` (the default is `'gini'`).
```python
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
```
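To see what `n_estimators` and `criterion` control, you can inspect a fitted forest. This sketch fits the same configuration on synthetic two-feature data from `make_classification` (a stand-in for the scaled Age and Estimated Salary columns, not the real data set) and looks at the fitted trees and the impurity-based feature importances:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature data standing in for the scaled Age/EstimatedSalary columns.
X_train, y_train = make_classification(n_samples=300, n_features=2, n_informative=2,
                                       n_redundant=0, random_state=0)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# One fitted decision tree per estimator.
print(len(classifier.estimators_))          # 10
# Every tree inherits the forest's split criterion.
print(classifier.estimators_[0].criterion)  # entropy
# Impurity-based importance of each feature; the values sum to 1.
print(classifier.feature_importances_)
```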
• Now apply the model to the test set and predict the test set results.
```python
y_pred = classifier.predict(X_test)
```
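In scikit-learn, `RandomForestClassifier.predict` does not take a plain majority vote: it averages the trees' class probabilities and returns the class with the highest mean probability. This sketch, on synthetic `make_classification` data rather than the tutorial's CSV, checks both facts:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split 75:25 as in the tutorial.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
proba = classifier.predict_proba(X_test)  # shape (n_test_samples, n_classes)

# The forest's probability is the mean of the individual trees' probabilities.
tree_mean = np.mean([tree.predict_proba(X_test) for tree in classifier.estimators_], axis=0)
print(np.allclose(proba, tree_mean))  # True

# predict() picks the class with the highest averaged probability.
print(np.array_equal(y_pred, classifier.classes_[proba.argmax(axis=1)]))  # True
```

`predict_proba` is also what you need for threshold-based metrics such as the ROC curve and AUC.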
• To evaluate the performance of your model, several metrics are available, such as the confusion matrix, the ROC curve and the area under it (AUC). I have used the confusion matrix here.
```python
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
```
Confusion matrix of the test set predictions
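The confusion matrix is easiest to read once you unpack it. A minimal sketch with small hand-made labels and scores (hypothetical values, chosen so the counts can be checked by eye) extracts the four counts, derives accuracy, precision and recall from them, and shows that AUC is computed from probability scores rather than hard labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Small hand-made labels (hypothetical) so the counts are easy to verify.
y_test = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[2 1]
#  [1 2]]

# Rows are true classes and columns are predicted classes, both ordered [0, 1].
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)

# AUC needs scores, not hard labels; with the forest you would pass
# classifier.predict_proba(X_test)[:, 1]. Hypothetical scores shown here.
auc = roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```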

### Visualizing the output – Random Forest Classification in Python

• The classification itself is complete at this point. Now we will visualize the training and test set results.
```python
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# Build a fine grid over the (scaled) feature space, padded by 1 on each side
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Predict a class for every grid point and shade the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the actual points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
```
Visualization of the training set result
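The plotting code relies on one trick: `np.meshgrid` builds a grid of points, `ravel()` plus `.T` turns it into a two-column array the classifier can predict on, and `reshape(X1.shape)` folds the predictions back onto the grid for `plt.contourf`. This sketch traces the shapes on a tiny grid, using a made-up threshold rule in place of the classifier (hypothetical, for illustration only). The number of grid points grows with the feature range divided by the step, so a step of 0.01 is practical only on scaled features; on raw salaries it would produce millions of values along that axis. For the same reason, the Age and Estimated Salary axes in the plots show standardized values.

```python
import numpy as np

# Hypothetical scaled feature ranges, kept tiny so the grid is easy to inspect.
X1, X2 = np.meshgrid(np.arange(-1.0, 1.0, 0.5),   # 4 values along feature 1
                     np.arange(-0.5, 0.5, 0.5))   # 2 values along feature 2
print(X1.shape)  # (2, 4)

# Flatten both grids and pair them: one row per grid point, one column per feature.
grid_points = np.array([X1.ravel(), X2.ravel()]).T
print(grid_points.shape)  # (8, 2)

# Stand-in "classifier" (a made-up rule) returning one label per row.
labels = (grid_points[:, 0] + grid_points[:, 1] > 0).astype(int)

# Fold the flat predictions back onto the grid, ready for plt.contourf.
Z = labels.reshape(X1.shape)
print(Z)
# [[0 0 0 0]
#  [0 0 0 1]]
```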

```python
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Same grid construction as for the training set, now over the test points
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Shade the regions predicted by the classifier trained on the training set
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the test points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
```
Visualization of the test set result
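The training and test plots differ only in the data and the title, so the duplicated code can be folded into one function. `plot_decision_regions` below is a hypothetical helper (not part of scikit-learn or matplotlib) that repeats the same steps with matplotlib's object-oriented API; the example call uses synthetic `make_classification` data and a coarser step to keep it fast:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def plot_decision_regions(classifier, X_set, y_set, title, step=0.01):
    """Hypothetical helper: shade the classifier's decision regions and plot the points."""
    cmap = ListedColormap(('red', 'green'))
    # Grid over both features, padded by 1 on each side.
    X1, X2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, step),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, step),
    )
    # Predict every grid point, then reshape back onto the grid.
    Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
    fig, ax = plt.subplots()
    ax.contourf(X1, X2, Z, alpha=0.75, cmap=cmap)
    ax.set_xlim(X1.min(), X1.max())
    ax.set_ylim(X2.min(), X2.max())
    # One scatter per class, coloured to match the regions.
    for i, j in enumerate(np.unique(y_set)):
        ax.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color=cmap(i), label=j)
    ax.set_title(title)
    ax.set_xlabel('Age')
    ax.set_ylabel('Estimated Salary')
    ax.legend()
    return fig, ax


# Example call on synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0).fit(X, y)
fig, ax = plot_decision_regions(clf, X, y, 'Random Forest Classification (Training set)', step=0.05)
```

In the tutorial you would call it as `plot_decision_regions(classifier, X_train, y_train, 'Random Forest Classification (Training set)')` and again with `X_test, y_test` and the test-set title, followed by `plt.show()`.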