Sequential forward selection with Python and scikit-learn
In this article, we will implement sequential forward selection with Python and scikit-learn.
Introduction: Sequential forward selection
Modern datasets are often complex and extremely high-dimensional. Performing machine learning tasks on such data is hard, but there is a key to improving the results: out of the many features available, select a useful subset before applying an algorithm. Sequential feature selection is one such technique. To understand it properly, let us first look at the wrapper method.
In this method, the feature selection process is based on a greedy search: it looks for the combination of features that gives optimal results for the machine learning algorithm.
- Start with the set of all features
- Consider a subset of features
- Apply the algorithm
- Gauge the result
- Repeat the process
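The greedy wrapper loop above can be sketched in plain Python. Here `score` is a hypothetical evaluation function (any routine that fits a model on a feature subset and returns a validation score); the names are illustrative, not part of any library:

```python
def forward_select(all_features, score, k):
    """Greedy forward selection: repeatedly add the feature that
    improves the score most, until k features are selected."""
    selected = []
    while len(selected) < k:
        best_feature, best_score = None, float("-inf")
        for f in all_features:
            if f in selected:
                continue
            s = score(selected + [f])  # fit/evaluate a model on this subset
            if s > best_score:
                best_feature, best_score = f, s
        selected.append(best_feature)
    return selected
```

Each pass through the loop is one round of "consider a subset, apply the algorithm, gauge the result" from the steps above.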
There are three most commonly used wrapper techniques:
- Forward selection
- Backward elimination
- Bi-directional elimination (also called stepwise selection)
Forward selection first fits each individual feature separately, one at a time, and keeps the single best feature (for example, the one with the minimum p-value). It then fits models with two features, pairing each remaining candidate with the feature already selected, and keeps the best pair. Next it fits three features, adding one candidate to the two previously selected, and the process repeats until the desired number of features is reached. These are the important steps.
Let us move to the coding part:
First I will show you this with the help of "MLxtend", a very popular Python library.
For the implementation I am using a standard classification dataset and the KNN (k-nearest neighbours) algorithm.
Step 1: Import all the libraries and check the data frame.
Step 2: Apply cleaning and scaling if needed.
Step 3: Divide the data into train and test sets with train_test_split.
Code: Sequential forward selection with Python and scikit-learn
# import pandas and numpy, then inspect the dataframe
# after steps 1 and 2, apply this method
from sklearn.model_selection import train_test_split

# divide the data with train_test_split
X = df_feat
y = df['TARGET CLASS']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# for SFS
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

knn = KNeighborsClassifier(n_neighbors=2)  # ML algorithm used = KNN
sfs1 = SFS(knn,
           k_features=3,
           forward=True,   # forward=True gives SFS, forward=False gives SBS
           floating=False,
           verbose=2,
           scoring='accuracy')

# after creating the selector, fit the data:
sfs1 = sfs1.fit(X_train, y_train)
sfs1.k_feature_names_  # to get the final set of features

# our SFS part is done here; now on to the results
Let me define some keywords which we are using in SFS:
- knn: the estimator for the entire process. You can plug in any algorithm you are going to use.
- k_features: the number of features to select. There is no single right value; tune it according to your dataset and the resulting scores.
- forward: True performs forward selection (SFS); False performs backward selection (SBS).
- floating: False disables the floating variants (SFFS/SBFS), which can conditionally drop previously selected features.
- scoring: specifies the evaluation criterion, e.g. 'accuracy'.
- verbose: controls how much progress output is printed during fitting.
Step 4: Print the results.
MLxtend's selector also exposes further results, such as the k_feature_idx_ and k_score_ attributes and the transform() and get_metric_dict() methods; use them according to your needs.
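Since the title mentions scikit-learn: as of scikit-learn 0.24, a similar selector ships in sklearn.feature_selection itself. A minimal sketch on synthetic data (direction='forward' plays the role of forward=True above):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# synthetic classification data: 5 features, 2 of them informative
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=101)

knn = KNeighborsClassifier(n_neighbors=2)
sfs = SequentialFeatureSelector(knn, n_features_to_select=3,
                                direction='forward', scoring='accuracy', cv=5)
sfs.fit(X, y)

print(sfs.get_support())       # boolean mask of the selected columns
X_selected = sfs.transform(X)  # reduced feature matrix with 3 columns
```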