Sequential forward selection with Python and scikit-learn

In this article, we will learn how to perform sequential forward selection with Python and scikit-learn.

Introduction: Sequential forward selection

Today's datasets are often complex and very high-dimensional. Performing any machine learning task on such datasets is hard, but there is a key to improving the results: choosing the right features. Machine learning offers many helpful tools for selecting features so that algorithms give better results, and sequential feature selection is one of them. To understand it properly, let us first look at the wrapper method.

Wrapper Method:

In this method, feature selection is based on a greedy search. It looks for the combination of features that gives the best results for a given machine learning algorithm.

Working process:

  • Start with the set of all features
  • Consider a subset of the features
  • Apply the algorithm to that subset
  • Gauge the result
  • Repeat the process with another subset
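
The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the iris dataset, the choice of two-feature subsets, and 5-fold cross-validation are stand-ins that do not come from the article, and the sketch tries every subset instead of searching greedily.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: iris is only a stand-in dataset for illustration.
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

results = {}
# Consider each two-feature subset of the full feature set.
for subset in combinations(range(X.shape[1]), 2):
    # Apply the algorithm and gauge the result with 5-fold cross-validation.
    scores = cross_val_score(model, X[:, list(subset)], y, cv=5, scoring="accuracy")
    results[subset] = scores.mean()

# Keep the subset with the best mean accuracy.
best_subset = max(results, key=results.get)
print(best_subset, round(results[best_subset], 3))
```

Trying every subset quickly becomes too expensive as the number of features grows, which is why the wrapper techniques below search greedily.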

The three most commonly used wrapper techniques are:

  1. Forward selection
  2. Backward elimination
  3. Bi-directional elimination (also called step-wise selection)

Forward Selection:

Forward selection starts by fitting a separate model for each individual feature, one at a time, and keeps the feature with the minimum p-value. It then fits every two-feature model that contains the earlier selected feature and again keeps the addition with the minimum p-value. Next it fits three-feature models that contain the two previously selected features. The process repeats until the desired number of features is reached. These are the important steps.
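
The greedy procedure can be written out by hand as a minimal sketch. Note the swap: instead of p-values, this version ranks candidates by mean cross-validated accuracy, which is the criterion used in the code later in this article. The helper `forward_select`, the wine dataset, and the scaling pipeline are hypothetical choices made only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def forward_select(model, X, y, k, cv=5):
    """Greedily add the feature that gives the best mean CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    history = []
    while len(selected) < k:
        # Score every candidate: the features selected so far plus one new one.
        scored = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(model, X[:, cols], y, cv=cv, scoring="accuracy").mean()
            scored.append((score, j))
        # Keep the candidate with the highest score for this round.
        best_score, best_j = max(scored)
        selected.append(best_j)
        remaining.remove(best_j)
        history.append(best_score)
    return selected, history


# Assumption: wine is a stand-in dataset; KNN is scaled inside a pipeline.
X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
selected, history = forward_select(model, X, y, k=3)
print(selected, [round(s, 3) for s in history])
```

Each round only extends the current selection by one feature, so a feature chosen early is never reconsidered. That is what makes the search greedy.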

Let us move to the coding part:

First, I will show this with the help of “MLxtend”, a very popular Python library.

For this implementation I am using an ordinary classification dataset and the KNN (k-nearest neighbours) algorithm.

Step1: Import all the libraries and check the data frame.

Step2: Apply some cleaning and scaling if needed.

Step3: Divide the data into train and test sets with train_test_split.
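
Steps 1 and 2 might look roughly like the sketch below. The article's dataset is not shown, so synthetic data from `make_classification` stands in for it; the column names `feature_0` ... `feature_9` are hypothetical, while `df`, `df_feat`, and `'TARGET CLASS'` are chosen to match the names used in the article's code.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for the article's classifier dataset.
X_raw, y_raw = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=101
)
columns = [f"feature_{i}" for i in range(X_raw.shape[1])]
df = pd.DataFrame(X_raw, columns=columns)
df["TARGET CLASS"] = y_raw

# Step 1: check the data frame.
print(df.head())

# Step 2: scale the features, since KNN relies on distances.
scaler = StandardScaler()
scaled = scaler.fit_transform(df.drop("TARGET CLASS", axis=1))
df_feat = pd.DataFrame(scaled, columns=columns)
```

Scaling the whole frame before splitting keeps the sketch short; fitting the scaler on the training split only is the stricter option.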

Code: Sequential forward selection with Python and Scikit learn

# Import pandas and numpy to load and inspect the data frame (Step 1).
# After Steps 1 and 2, df holds the data and df_feat the prepared features.
from sklearn.model_selection import train_test_split

# Step 3: divide the data with train_test_split
X = df_feat
y = df['TARGET CLASS']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# Sequential forward selection (SFS)
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

knn = KNeighborsClassifier(n_neighbors=2)  # ML algorithm used: KNN
sfs1 = SFS(knn,
           k_features=3,
           forward=True,    # forward=True gives SFS, forward=False gives SBS
           floating=False,
           verbose=2,
           scoring='accuracy'
           )

# After configuring SFS, fit it on the training data
sfs1.fit(X_train, y_train)

# Names of the final set of selected features
print(sfs1.k_feature_names_)
# The SFS part is done here; next come the results

Let me define some keywords which we are using in SFS:

  1. knn: The estimator used throughout the selection process. You can pass any algorithm you plan to use.
  2. k_features: The number of features to select. You choose this value yourself, based on your dataset and the scores you get.
  3. forward: True performs forward selection; False performs backward selection (SBS).
  4. floating: False gives plain forward selection; True enables the floating variant, which can also drop a previously selected feature.
  5. scoring: Specifies the evaluation criterion (here, accuracy).
  6. verbose: Controls how much progress output is logged during fitting.

Step4: Print the results.

Two other methods are also available for this; you can use them according to your needs.
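
Since the title also mentions scikit-learn, here is a hedged sketch of the same idea with scikit-learn's own `SequentialFeatureSelector` (available in `sklearn.feature_selection` from version 0.24). The wine dataset and the 5-fold cross-validation are assumptions made for illustration; the KNN settings and the 70/30 split mirror the MLxtend code above.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: wine is a stand-in for the article's dataset.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=101
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=2)
selector = SequentialFeatureSelector(
    knn, n_features_to_select=3, direction="forward", scoring="accuracy", cv=5
)
selector.fit(X_train_s, y_train)

# Boolean mask over the columns; True marks a selected feature.
mask = selector.get_support()
chosen = [name for name, keep in zip(data.feature_names, mask) if keep]
print(chosen)

# Train on the selected features and score on the held-out test set.
knn.fit(selector.transform(X_train_s), y_train)
print(knn.score(selector.transform(X_test_s), y_test))
```

Setting `direction="backward"` switches the same selector to backward elimination.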
