Bagging in Machine Learning with Python
In this tutorial, we will learn about an important ensemble learning method used frequently in machine learning. Ensemble modeling combines several machine learning models to improve the overall result: the models complement one another's weaknesses with their strengths. As the saying goes, unity is strength.
Bagging
Bagging, short for Bootstrap Aggregating, is a popular ensemble method. It improves overall accuracy by reducing variance and overfitting. Each model is trained independently and in parallel on a random subset of the data, sampled with replacement; these subsets are called bootstrap samples. The predictions of the individual models are then aggregated, for example by voting (for classification) or averaging (for regression). In this way, the strengths of different models are combined while their errors are reduced.
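To make the mechanics concrete, here is a minimal hand-rolled sketch of the procedure above: draw bootstrap samples with replacement, train one decision tree per sample, and aggregate the predictions by majority vote. (The variable names and the choice of 20 trees are illustrative, not part of any library API.)

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2024)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=2024
)

n_models = 20
all_preds = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# Aggregate: majority vote across the 20 trees for each test row
all_preds = np.array(all_preds)
final = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
acc = (final == y_test).mean()
print("Hand-rolled bagging accuracy:", acc)
```

This is essentially what `BaggingClassifier` automates for us below.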
Please note that bagging is different from boosting. Bagging trains its models in parallel, whereas boosting trains them sequentially: each new model focuses on the errors made by the previous ones. Boosting also aggregates with a weighted vote rather than a simple one.
Python Code
Let’s write the Python code for bagging. I will use the famous iris dataset and combine 20 decision trees with scikit-learn’s BaggingClassifier.
```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset in a 70:30 ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2024
)

# The base classifier is a decision tree
classifier = DecisionTreeClassifier()

# Bagging classifier combining 20 decision trees
bagging_classifier = BaggingClassifier(
    classifier, n_estimators=20, random_state=2024
)

# Training and evaluation
bagging_classifier.fit(X_train, y_train)
predictions = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
```
Accuracy: 0.8888888888888888
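One convenient by-product of bootstrap sampling is the out-of-bag (OOB) estimate: each tree sees only part of the training data, so the rows it never sampled can serve as a small validation set. A sketch using BaggingClassifier's `oob_score` option (the same illustrative settings as above):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=2024
)

# oob_score=True scores each tree on the training rows left out of
# its bootstrap sample, giving a validation estimate for free
bagging = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20,
    oob_score=True, random_state=2024,
)
bagging.fit(X_train, y_train)
print("Out-of-bag score:", bagging.oob_score_)
```

The OOB score is often a useful sanity check before touching the held-out test set.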
Comparing Bagging Accuracy with a Single Decision Tree
```python
classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)
accu = accuracy_score(y_test, pred)
print("Accuracy:", accu)
```
Accuracy: 0.8666666666666667
On this split, bagging improves the accuracy from about 0.867 to 0.889 compared with a single decision tree. The exact numbers depend on the random seed and the split, but the reduction in variance typically helps.
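A single train/test split can be noisy, so a fairer check is to cross-validate while varying the number of trees. The sketch below (with an illustrative sweep over ensemble sizes) shows how the ensemble's mean accuracy behaves as estimators are added:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

iris = load_iris()
results = []
# 5-fold cross-validation gives a more stable comparison than one split
for n in (1, 5, 20, 50):
    model = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=n, random_state=2024
    )
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    results.append(scores.mean())
    print(f"n_estimators={n:2d}  mean accuracy={scores.mean():.3f}")
```

Gains usually flatten out after a few dozen estimators; beyond that, more trees mostly cost training time.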