Stacking Ensemble Machine Learning in Python

In this tutorial, we will learn about the stacking ensemble machine learning algorithm in Python. Stacking is an ensemble method that combines the predictions of multiple machine learning models, in the same family of techniques as bagging and boosting.

It involves two levels of models: level-0 (base) models and a level-1 model. The level-1 model is commonly known as the meta-model, and it learns how to best combine the predictions made by the base models.

  • Level-0 Models: The base models, fit on the training data; their predictions are collected.
  • Level-1 Models: The meta-model, which learns how to combine the forecasts made by the level-0 models.

The outputs of the level-0 models are used as the training inputs for the level-1 model, as the sketch below illustrates.
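
To make the two levels concrete, here is a minimal hand-rolled sketch of stacking. The helper name manual_stack and the choice of base models are ours for illustration only; scikit-learn's StackingClassifier, used later in this tutorial, automates all of this, including generating the out-of-fold predictions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

def manual_stack(X, Y):
  base_models = [DecisionTreeClassifier(), LogisticRegression()]
  # level-0: out-of-fold predicted probabilities become the meta-features
  meta_features = np.column_stack([
    cross_val_predict(m, X, Y, cv=5, method='predict_proba')[:, 1]
    for m in base_models])
  # level-1: the meta-model learns how to weigh the base predictions
  meta_model = LogisticRegression().fit(meta_features, Y)
  # refit the base models on all of the data for use at prediction time
  for m in base_models:
    m.fit(X, Y)
  return base_models, meta_model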

Stacking for Classification & Regression in Python

1. Classification

The make_classification() function is helpful here, as it creates a synthetic binary classification problem. We wrap it in a new_dataset() function, which the driver code calls later.

from sklearn.datasets import make_classification

def new_dataset():
  X, Y = make_classification(n_samples=1500, n_features=30, n_informative=20, n_redundant=5, random_state=1)
  return X, Y

Next, we define the standalone models whose predictions we will combine. The algorithms are:

  • Support Vector Machine.
  • k-Nearest Neighbors.
  • Decision Tree.
  • Logistic Regression.
  • Naive Bayes.
The new_models() function collects the models to evaluate, including the stacking ensemble:

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

def new_models():
  models = dict()
  models['lr'] = LogisticRegression()
  models['knn'] = KNeighborsClassifier()
  models['cart'] = DecisionTreeClassifier()
  models['svm'] = SVC()
  models['bayes'] = GaussianNB()
  models['stacking'] = new_stacking()
  return models

Next, we combine the models into one with the help of stacking: the predictions of the five base models are combined using logistic regression as the meta-model.

from sklearn.ensemble import StackingClassifier

def new_stacking():
  level0 = list()
  level0.append(('lr', LogisticRegression()))
  level0.append(('knn', KNeighborsClassifier()))
  level0.append(('cart', DecisionTreeClassifier()))
  level0.append(('svm', SVC()))
  level0.append(('bayes', GaussianNB()))
  level1 = LogisticRegression()
  model = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
  return model
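
A note on this design choice: logistic regression is a common default meta-model because it is simple and hard to overfit on the small space of base-model predictions. Once fitted, its coefficients also hint at how much weight each base model receives; a quick, hypothetical peek (fitting on the full dataset just for inspection):

X, Y = new_dataset()
model = new_stacking()
model.fit(X, Y)
# coefficients of the level-1 logistic regression over the base models' predictions
print(model.final_estimator_.coef_)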

Using the eval_model() function, we obtain a list of scores for each model via repeated stratified k-fold cross-validation.

from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def eval_model(model):
  cv = RepeatedStratifiedKFold(n_splits=20, n_repeats=4, random_state=2)
  scr = cross_val_score(model, X, Y, scoring='accuracy', cv=cv, n_jobs=-2, error_score='raise')
  return scr

Finally, the driver code evaluates each model and plots the score distributions.

from numpy import mean, std
from matplotlib import pyplot

X, Y = new_dataset()
models = new_models()
results, names = list(), list()
for name, model in models.items():
  scr = eval_model(model)
  results.append(scr)
  names.append(name)
  print('>%s %.4f (%.4f)' % (name, mean(scr), std(scr)))
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()

Output:

>lr 0.8353 (0.0383)
>knn 0.9307 (0.0297)
>cart 0.7583 (0.0430)
>svm 0.9435 (0.0306)
>bayes 0.7950 (0.0441)
>stacking 0.9583 (0.0240)
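
With evaluation done, the stacking ensemble can be fit on all available data and used like any other scikit-learn estimator. A small, hypothetical usage example (predicting the first row of our own dataset, just to show the API):

X, Y = new_dataset()
model = new_stacking()
model.fit(X, Y)
row = X[0].reshape(1, -1)
print('Predicted class: %d' % model.predict(row)[0])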


2. Regression

The make_regression() function is helpful here, as it creates a synthetic regression problem. Again we wrap it in a new_dataset() function for the driver code.

from sklearn.datasets import make_regression

def new_dataset():
  X, Y = make_regression(n_samples=1500, n_features=30, n_informative=20, noise=0.1, random_state=1)
  return X, Y

Next, we define the standalone models whose predictions we will combine. The algorithms are:

  • Support Vector Regression.
  • Decision Tree.
  • k-Nearest Neighbors.

The new_models() function creates the models we wish to evaluate:

from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

def new_models():
  models = dict()
  models['knn'] = KNeighborsRegressor()
  models['cart'] = DecisionTreeRegressor()
  models['svm'] = SVR()
  models['stacking'] = new_stacking()
  return models

Next, we combine the models into one with the help of stacking: the predictions of the three base models are combined using linear regression as the meta-model.

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import StackingRegressor

def new_stacking():
  level0 = list()
  level0.append(('knn', KNeighborsRegressor()))
  level0.append(('cart', DecisionTreeRegressor()))
  level0.append(('svm', SVR()))
  level1 = LinearRegression()
  model = StackingRegressor(estimators=level0, final_estimator=level1, cv=5)
  return model

Using the eval_model() function again, we obtain a list of scores; repeated k-fold cross-validation (without stratification) is appropriate here because this is a regression problem.

from sklearn.model_selection import RepeatedKFold, cross_val_score

def eval_model(model):
  cv = RepeatedKFold(n_splits=20, n_repeats=4, random_state=2)
  scr = cross_val_score(model, X, Y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-2, error_score='raise')
  return scr

We use the mean absolute error (MAE) as the metric. The scikit-learn library frames scoring as maximization, so it inverts the sign of the error: scores run from negative infinity up to 0, and values closer to 0 are better.
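
A small illustration of this sign convention (the model choice here is just for the example, assuming the regression dataset defined above):

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, Y = new_dataset()
scr = cross_val_score(DecisionTreeRegressor(), X, Y, scoring='neg_mean_absolute_error', cv=5)
print(scr)           # all values are negative; closer to 0 is better
print(-scr.mean())   # flip the sign to report a conventional MAE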

Finally, the driver code, identical in shape to the classification version:

from numpy import mean, std
from matplotlib import pyplot

X, Y = new_dataset()
models = new_models()
results, names = list(), list()
for name, model in models.items():
  scr = eval_model(model)
  results.append(scr)
  names.append(name)
  print('>%s %.4f (%.4f)' % (name, mean(scr), std(scr)))
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()

Output:

>knn -140.1542 (12.8824)
>cart -198.4270 (17.8205)
>svm -200.7247 (18.1583)
>stacking -71.0506 (6.6014)
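
As with classification, the fitted ensemble can then be used to make predictions on new data; a brief, hypothetical example:

X, Y = new_dataset()
model = new_stacking()
model.fit(X, Y)
row = X[0].reshape(1, -1)
print('Predicted value: %.3f' % model.predict(row)[0])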


Hope you had fun learning with me. Have a good day and happy learning.
