Bagging in Machine Learning with Python

In this tutorial, we will learn about an important ensemble learning method that is used very often in machine learning. Ensemble modeling combines several machine learning models to improve the overall result: the models complement each other's weaknesses with their strengths. As the common phrase goes, "unity is strength."

Bagging

Bagging, short for Bootstrap Aggregating, is a popular ensemble method. It improves overall accuracy by reducing variance and overfitting. In this method, each model is trained independently and in parallel on a random subset of the data, and each subset is sampled with replacement; these subsets are called bootstrap samples. Each model then produces a prediction, and the predictions are aggregated, typically by voting (for classification) or averaging (for regression). In this way, the strengths of different models are combined while their errors are reduced.
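The two building blocks described above, sampling with replacement and aggregation by voting, can be sketched in a few lines of plain NumPy (this is a toy illustration, not part of the scikit-learn example that follows):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # indices of a 10-sample dataset

# A bootstrap sample: drawn with replacement and the same size as the
# original, so some points repeat and others are left out entirely
bootstrap = rng.choice(data, size=len(data), replace=True)
print("Bootstrap sample:", sorted(bootstrap))

# Aggregation by majority voting: each "model" casts one class vote
votes = np.array([0, 1, 1, 2, 1])
majority = np.bincount(votes).argmax()
print("Majority vote:", majority)  # -> 1
```

Each decision tree in the bagging ensemble sees a different bootstrap sample, which is what makes the trees diverse enough for voting to help.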

Please note that bagging is different from the boosting method. The former trains its models in parallel, whereas the latter trains them sequentially, meaning that each new model focuses on the errors made by the previous one. Boosting also aggregates predictions with weights rather than a plain vote.

Python Code

Let’s write the Python code for bagging. I will use the famous Iris dataset and combine 20 decision trees using bagging.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


iris = load_iris()
X, y = iris.data, iris.target

# Splitting the dataset in the ratio of 70:30
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2024)

# The base classifier is a decision tree
classifier = DecisionTreeClassifier()

# Bagging classifier combining 20 decision trees
bagging_classifier = BaggingClassifier(classifier, n_estimators=20, random_state=2024)

# Training
bagging_classifier.fit(X_train, y_train)

predictions = bagging_classifier.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
Accuracy: 0.8888888888888888

Comparing Bagging Accuracy with a Single Decision Tree

classifier.fit(X_train, y_train)

pred = classifier.predict(X_test)
accu = accuracy_score(y_test, pred)

print("Accuracy:", accu)
Accuracy: 0.8666666666666667

On this split, the accuracy improves with the bagging method (0.889 versus 0.867 for a single decision tree).
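A single 70:30 split can be lucky or unlucky, so a fairer comparison averages over several splits. A short sketch using scikit-learn's cross_val_score (5-fold cross-validation is an illustrative choice, not the only one):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=2024)
bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20, random_state=2024
)

# 5-fold cross-validation averages out the luck of any single split
tree_scores = cross_val_score(tree, X, y, cv=5)
bag_scores = cross_val_score(bagging, X, y, cv=5)
print("Single tree mean accuracy:", tree_scores.mean())
print("Bagging mean accuracy:   ", bag_scores.mean())
```

If the cross-validated means also favor the ensemble, that is stronger evidence that bagging genuinely reduces variance here rather than merely winning on one particular split.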
