Bagging in Machine Learning with Python
In this tutorial, we will learn an important ensemble learning method used very often in Machine Learning. Ensemble modeling is a method that combines several machine learning models to improve the overall results. The results improve because the models complement each other's weaknesses with their strengths. As the common phrase goes, unity is strength.
Bagging
Bagging, short for Bootstrap Aggregating, is a popular ensemble method. It improves overall accuracy by reducing variance and overfitting. In this method, each model is trained independently and in parallel on a random subset of the data, and each subset is sampled with replacement. These subsets are called bootstrap samples. Each model then produces predictions, which are aggregated using methods such as voting or averaging. In this way, the strengths of different models are combined while their errors are reduced.
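The two ingredients described above, sampling with replacement and aggregation by voting, can be sketched in a few lines of NumPy (the data and the vote array here are illustrative toy values, not part of the iris example below):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10)  # a toy dataset of 10 samples

# Draw 3 bootstrap samples: each is the same size as the original data but
# sampled WITH replacement, so it usually contains duplicates and omits
# some of the original rows.
bootstraps = [rng.choice(X, size=len(X), replace=True) for _ in range(3)]
for b in bootstraps:
    print(sorted(b))

# Aggregation by voting: each "model" casts a class label, majority wins.
votes = np.array([0, 1, 1, 0, 1])  # hypothetical predictions from 5 models
majority = np.bincount(votes).argmax()
print("Majority vote:", majority)
```

Each bootstrap sample trains one model; at prediction time, the ensemble simply returns the most common label among its members.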
Please note that bagging is different from the Boosting method. The former trains its models in parallel, whereas the latter trains them sequentially, meaning that each new model is fitted in accordance with the errors made by the previous models. In Boosting, the aggregation method is a weighted combination of the models' predictions.
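For contrast, here is a minimal boosting example using scikit-learn's AdaBoostClassifier on the same iris data (the `max_depth=1` stump and `n_estimators=20` are illustrative choices, not from the original tutorial):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Boosting: trees are built sequentially, each one reweighting the samples
# that the previous trees misclassified; the final prediction is a weighted
# vote of all the trees.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # a weak learner (decision stump)
    n_estimators=20,
    random_state=2024,
)
boosting.fit(X, y)
print("Training accuracy:", boosting.score(X, y))
```

Notice the design difference: bagging's trees never see each other's mistakes, while each boosting round depends on the rounds before it, which is why boosting cannot be parallelized across estimators.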
Python Code
Let’s write the Python code for Bagging. I will use the famous iris dataset and combine 20 decision trees using Bagging.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X, y = iris.data, iris.target
# Splitting the dataset in the ratio of 70:30
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2024)
# The base classifier we will use is a decision tree
classifier = DecisionTreeClassifier()
# Bagging classifier combining 20 decision trees
bagging_classifier = BaggingClassifier(classifier, n_estimators=20, random_state=2024)
# Training
bagging_classifier.fit(X_train, y_train)
predictions = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
Accuracy: 0.8888888888888888
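A useful side effect of sampling with replacement is that each tree leaves out roughly a third of the rows, so bagging can evaluate itself on these "out-of-bag" rows for free. A short sketch using scikit-learn's `oob_score` option (the setup mirrors the code above but fits on the full dataset for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each sample using only the trees that did NOT see
# it during training, giving a built-in validation estimate.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    oob_score=True,
    random_state=2024,
)
bagging.fit(X, y)
print("OOB accuracy:", bagging.oob_score_)
```

The out-of-bag accuracy is a handy sanity check when the dataset is too small to spare a separate validation split.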
Comparing the Bagging Accuracy with a Single Decision Tree
classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)
accu = accuracy_score(y_test, pred)
print("Accuracy:", accu)
Accuracy: 0.8666666666666667
It’s evident that the accuracy improves when the Bagging method is used.
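A single 70:30 split can be lucky or unlucky, so a fairer comparison averages accuracy over several splits. A sketch using 5-fold cross-validation (`cv=5` is an illustrative choice, not from the original tutorial):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=2024)
bagged = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20, random_state=2024
)

# cross_val_score fits and scores the model on 5 different train/test
# splits, so the averaged accuracy is less sensitive to any one split.
tree_scores = cross_val_score(tree, X, y, cv=5)
bagged_scores = cross_val_score(bagged, X, y, cv=5)
print("Single tree :", tree_scores.mean())
print("Bagged trees:", bagged_scores.mean())
```

If the bagged ensemble still wins on the cross-validated average, that is stronger evidence than a single split's accuracy numbers.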