GridSearchCV in Scikit-learn

In this article, we see how to implement a grid search using GridSearchCV from the scikit-learn (sklearn) library in Python. Grid search is a hyperparameter tuning technique, and we use it to find the hyperparameter values that give the most accurate predictions.

GridSearchCV

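Grid search is the process of tuning a model's hyperparameters by evaluating every combination of candidate values and keeping the one that performs best. Whenever we want to tune an ML model this way, we can use GridSearchCV to automate the process and make life a little bit easier for ML enthusiasts. As the "CV" in its name suggests, GridSearchCV scores each combination with cross-validation.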
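Under the hood, a grid search simply tries every combination in the Cartesian product of the candidate values and keeps the best-scoring one. The sketch below shows that idea in plain Python; `toy_score` is a made-up stand-in for a cross-validated model score and is not part of scikit-learn.

```python
from itertools import product

# Candidate values for each hyperparameter, in the same dict style GridSearchCV accepts
param_grid = {"C": [1, 10, 100], "kernel": ["linear", "rbf"]}


def toy_score(C, kernel):
    """Hypothetical scoring function standing in for a cross-validated accuracy."""
    return (1.0 if kernel == "rbf" else 0.9) - abs(C - 10) / 1000


def grid_search(grid, score_fn):
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # Exhaustively evaluate every combination of the candidate values
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)  # {'C': 10, 'kernel': 'rbf'} 1.0
```

With 3 values of `C` and 2 kernels there are 6 combinations. GridSearchCV performs the same enumeration but scores each combination with k-fold cross-validation, so `cv=10` means 6 × 10 = 60 fits, plus one final refit of the best combination on the whole training set (`refit=True` is the default).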

Model using GridSearchCV

Here’s a Python implementation of grid search on the Breast Cancer dataset.

Download the dataset required for our ML model.

  1.  Import the dataset and display the first 5 rows.
    import pandas as pd
    
    df = pd.read_csv('../DataSets/BreastCancer.csv')
    
    df.head()

    Output:
    [Image: the first five rows of the DataFrame]
    The ‘diagnosis’ column in the dataset holds one of two possible classes, benign or malignant, which we will encode as 0 and 1 respectively. The attributes shown above will be used for our predictions.
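If the CSV is not at hand, scikit-learn bundles the Breast Cancer Wisconsin (Diagnostic) dataset, which has the same 357 benign / 212 malignant split and needs no download. This is a sketch that assumes the bundled copy is the same data as the article's `BreastCancer.csv` (the article does not say so). Note that the bundled target is already numeric, with 0 = malignant and 1 = benign, the reverse of the encoding used in this article.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns the data as a pandas DataFrame (scikit-learn >= 0.23)
data = load_breast_cancer(as_frame=True)
df_sk = data.frame  # 30 feature columns plus a numeric 'target' column

print(df_sk.shape)
print(list(data.target_names))  # index 0 -> 'malignant', index 1 -> 'benign'
print(df_sk["target"].value_counts())
```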

  2. Encode the class values as ‘0’ (benign) and ‘1’ (malignant).
# Encoding categorical data values

from sklearn.preprocessing import LabelEncoder
labelencoder_Y = LabelEncoder()
# Convert the 'diagnosis' labels to integers (classes are numbered in sorted order)
df['diagnosis'] = labelencoder_Y.fit_transform(df['diagnosis'])

df['diagnosis'].value_counts()

Output:

There are 357 benign and 212 malignant cases.
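LabelEncoder assigns integer codes in the sorted order of the class labels. A minimal sketch, assuming the raw ‘diagnosis’ values are the strings 'B' and 'M' (the article does not show them): because 'B' sorts before 'M', benign becomes 0 and malignant becomes 1, matching the encoding described above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw labels in the assumed format: 'B' = benign, 'M' = malignant
diagnosis = pd.Series(["M", "B", "B", "M", "B"])

le = LabelEncoder()
encoded = le.fit_transform(diagnosis)

print(list(le.classes_))  # ['B', 'M']: code 0 -> 'B', code 1 -> 'M'
print(encoded)            # [1 0 0 1 0]
# inverse_transform maps the integer codes back to the original labels
print(list(le.inverse_transform([0, 1])))
```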

 

3. Let us now define our attributes and target variable, and save them to ‘X’ and ‘Y’ respectively.

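# Features: columns 2 to 30 (the end index 31 in iloc is exclusive)
X = df.iloc[:, 2:31].values
# Target: column 1, the 'diagnosis' labels
Y = df.iloc[:, 1].values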
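Positional slicing with `iloc` excludes the end index, so it is worth checking how many feature columns a slice actually keeps. The toy frame below uses a hypothetical layout (an `id` column, the `diagnosis` column, then 30 feature columns named `f0` to `f29`) and also shows a name-based alternative that does not depend on column positions.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: 'id', 'diagnosis', then 30 numeric feature columns
feature_names = [f"f{i}" for i in range(30)]
df_toy = pd.DataFrame(np.zeros((4, 32)), columns=["id", "diagnosis"] + feature_names)

print(df_toy.iloc[:, 2:31].shape[1])  # 29: positions 2..30, end index is exclusive
print(df_toy.iloc[:, 2:32].shape[1])  # 30: all feature columns in this layout

# Name-based selection: drop the non-feature columns instead of counting positions
X_named = df_toy.drop(columns=["id", "diagnosis"]).values
y_named = df_toy["diagnosis"].values
print(X_named.shape)  # (4, 30)
```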

 

4. Perform a train-test split, holding out 30% of the data as the test set.

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.3, random_state = 4)
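`train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both splits, which can help when the classes are imbalanced (357 vs. 212 here). A sketch using synthetic labels with that class balance; the feature matrix is random placeholder data, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data with the article's class balance: 357 benign (0), 212 malignant (1)
y_demo = np.array([0] * 357 + [1] * 212)
X_demo = rng.normal(size=(len(y_demo), 5))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=4,
    stratify=y_demo,  # preserve the 0/1 ratio in both splits
)

print(len(y_tr), len(y_te))  # 398 171
# Fraction of malignant cases in the full data, the training set and the test set
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```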

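5. Let us now standardize the features using StandardScaler. The scaler learns each feature's mean and standard deviation from the training set only, and the same transformation is then applied to the test set.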

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
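StandardScaler rescales each feature to zero mean and unit variance, z = (x - mean) / std, using statistics learned in `fit`; `transform` then reuses those same statistics on new data. A small check on made-up numbers that the manual formula, with the population standard deviation (`ddof=0`), matches the scaler's output:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test matrices with 2 features each
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_te = np.array([[2.5, 15.0]])

ss = StandardScaler()
Z_tr = ss.fit_transform(X_tr)  # learns mean_ and scale_ from the training data
Z_te = ss.transform(X_te)      # applies the training statistics to new data

# Manual standardization with the training mean and population std (ddof=0)
mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
print(np.allclose(Z_tr, (X_tr - mean) / std))  # True
print(np.allclose(Z_te, (X_te - mean) / std))  # True
print(ss.mean_)
```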

 

6. Apply GridSearchCV to find the best hyperparameters for the model.

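from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Assumed estimator: an SVM classifier ('C' and 'kernel' in the grid are SVC parameters)
classifier_df = SVC()
parameters = [{'C': [1, 10, 100], 'kernel': ['linear']}]
# Try each value of C with 10-fold cross-validation, scored by accuracy
grid_search = GridSearchCV(estimator=classifier_df,
                           param_grid=parameters, scoring='accuracy', cv=10)
grid_search = grid_search.fit(X_train, Y_train)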
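Because the scaler in step 5 is fitted on the whole training set before GridSearchCV splits it into folds, each validation fold has already influenced the scaling statistics. One common way to avoid this is to put the scaler and the classifier in a Pipeline, so scaling is refitted inside every fold. A sketch on scikit-learn's bundled copy of the dataset (assumed to match the article's CSV), with the SVC parameters prefixed by the step name `svc__`:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# The scaler is refitted on the training portion of each CV fold
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Parameter names take the form <step name>__<parameter>
param_grid = {"svc__C": [1, 10, 100], "svc__kernel": ["linear"]}

search = GridSearchCV(pipe, param_grid=param_grid, scoring="accuracy", cv=10)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best combination
```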

 

7. Calculate the accuracy score of the best model found by the grid search.

# best_score_: mean cross-validated accuracy (a fraction from 0 to 1) of the best parameters
accuracy = grid_search.best_score_

print("The cross-validated accuracy of our model is: {0:.3f}%".format(accuracy * 100))

Output:  The cross-validated accuracy of our model is: 94.234%
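`best_score_` is measured on the cross-validation folds of the training data, not on `X_test`. Since `refit=True` by default, the fitted GridSearchCV object can also score the held-out test set directly. A sketch that repeats the setup on scikit-learn's bundled copy of the dataset (assumed to match the article's CSV) so it runs on its own; the numbers it prints will not necessarily match the article's output.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

grid_search = GridSearchCV(SVC(), param_grid=[{"C": [1, 10, 100], "kernel": ["linear"]}],
                           scoring="accuracy", cv=10)
grid_search.fit(X_train, y_train)

cv_accuracy = grid_search.best_score_  # mean accuracy over the 10 validation folds
# best_estimator_ was refitted on all of X_train, so it can predict the unseen test set
test_accuracy = accuracy_score(y_test, grid_search.predict(X_test))

print(f"Cross-validated accuracy: {cv_accuracy * 100:.3f}%")
print(f"Test-set accuracy: {test_accuracy * 100:.3f}%")
```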

 
