GridSearchCV in Scikit-learn
In this article, we see how to implement a grid search using GridSearchCV from the Sklearn library in Python. Grid search is a form of hyperparameter tuning.
It is used to find the hyperparameter values that give the most accurate predictions.
GridSearchCV
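Grid search is the process of performing hyperparameter tuning to determine the optimal values for a given model: every combination of the candidate parameter values is evaluated, and the best-scoring one is kept. Whenever we want to tune an ML model, we can use GridSearchCV, which automates this process (cross-validating each combination) and makes life a little easier for ML enthusiasts.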
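To make the idea concrete, here is a minimal pure-Python sketch of what an exhaustive grid search does: it scores every combination of candidate values and keeps the best. The `grid_search` helper and the `toy_score` function are hypothetical illustrations, not part of scikit-learn. GridSearchCV additionally cross-validates each combination and refits the best model.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Evaluate every combination in param_grid and return the best one.

    score_fn: callable taking keyword arguments, returning a score (higher is better).
    param_grid: dict mapping parameter names to lists of candidate values.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() yields the Cartesian product of all candidate value lists
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for a cross-validated model score; it peaks at C=10, gamma=0.1
def toy_score(C, gamma):
    return -abs(C - 10) - abs(gamma - 0.1)

best, score = grid_search(toy_score, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]})
print(best, score)
```

With 3 values for each of 2 parameters, the loop evaluates 3 × 3 = 9 combinations; the cost grows multiplicatively with every parameter added to the grid.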
Model using GridSearchCV
Here’s a Python implementation of grid search on the Breast Cancer dataset.
Download the dataset required for our ML model.
1. Import the dataset and display the first 5 rows.
```python
import pandas as pd

# Load the Breast Cancer dataset and display the first 5 rows
df = pd.read_csv('../DataSets/BreastCancer.csv')
df.head()
```
Output:
The ‘diagnosis’ column in the dataset holds one of two classes: benign or malignant. The attributes shown above will be used as the features for our predictions.
2. Encode the class values as 0 (benign) and 1 (malignant).
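```python
# Encode the categorical class labels as integers
from sklearn.preprocessing import LabelEncoder

labelencoder_Y = LabelEncoder()
# LabelEncoder sorts the labels, so (assuming they are 'B' and 'M') B -> 0 and M -> 1
df['diagnosis'] = labelencoder_Y.fit_transform(df['diagnosis'])
df['diagnosis'].value_counts()
```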
Output:
There are 357 benign and 212 malignant cases.
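The 0/1 mapping comes from `LabelEncoder` sorting the distinct labels. A small sketch on hypothetical labels, assuming the CSV stores benign as 'B' and malignant as 'M', shows the mapping and how to reverse it:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw labels (assumption: 'B' = benign, 'M' = malignant)
raw = ["M", "B", "B", "M", "B"]
le = LabelEncoder()
encoded = le.fit_transform(raw)

# classes_ is sorted, so position 0 is 'B' and position 1 is 'M'
print(le.classes_)   # ['B' 'M']
print(encoded)       # [1 0 0 1 0]
# inverse_transform maps the integers back to the original labels
print(le.inverse_transform([0, 1]))  # ['B' 'M']
```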
3. Let us now define our attributes and target variable, and save them to ‘X’ and ‘Y’.
```python
# Features: the columns at positions 2 to 30; target: the encoded 'diagnosis' column
X = df.iloc[:, 2:31].values
Y = df.iloc[:, 1].values
```
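Because `iloc` slices are end-exclusive, `2:31` selects the columns at positions 2 through 30, while a single integer position returns one column as a 1-D array. The toy frame below uses hypothetical data to show the same behaviour on a smaller scale:

```python
import pandas as pd

# Hypothetical toy frame: an id column, the label, and four feature columns
df = pd.DataFrame({
    "id": [1, 2], "diagnosis": [1, 0],
    "f1": [0.1, 0.2], "f2": [1.0, 2.0], "f3": [5, 6], "f4": [7, 8],
})
# End-exclusive slice: 2:5 selects positions 2, 3 and 4 (f1, f2, f3)
X = df.iloc[:, 2:5].values
# A single integer position selects one column as a 1-D array
Y = df.iloc[:, 1].values
print(X.shape, Y.shape)  # (2, 3) (2,)
```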
4. Performing the train-test split.
```python
from sklearn.model_selection import train_test_split

# Hold out 30% of the samples as a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=4)
```
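If the CSV is not at hand, scikit-learn bundles the Wisconsin Diagnostic Breast Cancer data as `load_breast_cancer` (569 samples, 357 benign and 212 malignant, matching the counts above; we assume it is the same data as the article's CSV). The sketch below uses it to show the resulting split sizes and the optional `stratify` argument, which the step above does not use. Note that the bundled copy encodes the classes the other way round (0 = malignant, 1 = benign).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data
# Flip the bundled encoding to match this article's: 0 = benign, 1 = malignant
Y = 1 - data.target

# stratify=Y keeps the benign/malignant ratio the same in both splits
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=4, stratify=Y)

print(X_train.shape, X_test.shape)  # (398, 30) (171, 30)
print(np.bincount(Y))               # [357 212]
```

With `test_size=0.3`, scikit-learn rounds the test size up, so 569 samples give 171 test rows and 398 training rows.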
5. Let us now scale the features with StandardScaler, fitting it on the training data only.
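```python
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
# Learn each feature's mean and standard deviation from the training set...
X_train = ss.fit_transform(X_train)
# ...and apply the same transformation to the test set
X_test = ss.transform(X_test)
```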
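Fitting the scaler on the training split only keeps test-set statistics out of the preprocessing. The check below runs on randomly generated data (an illustrative assumption) and confirms that `transform` reuses the training `mean_` and `scale_` rather than the test set's own statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: features drawn with mean 5 and standard deviation 2
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(40, 3))

ss = StandardScaler()
X_train_s = ss.fit_transform(X_train)
X_test_s = ss.transform(X_test)

# The training set is exactly centred and scaled...
print(np.allclose(X_train_s.mean(axis=0), 0), np.allclose(X_train_s.std(axis=0), 1))
# ...while the test set is only approximately so
print(X_test_s.mean(axis=0))
# transform() is (x - training mean) / training standard deviation
print(np.allclose(X_test_s, (X_test - ss.mean_) / ss.scale_))
```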
6. Applying GridSearchCV to find the best model.
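```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed estimator: the grid below tunes SVC's 'C' and 'kernel' hyperparameters
classifier_df = SVC()
parameters = [{'C': [1, 10, 100], 'kernel': ['linear']}]
# Evaluate every combination in the grid with 10-fold cross-validation
grid_search = GridSearchCV(estimator=classifier_df, param_grid=parameters, scoring='accuracy', cv=10)
grid_search = grid_search.fit(X_train, Y_train)
```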
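Scaling the whole training set before cross-validation lets each validation fold influence the scaler. One common alternative, sketched here as an assumption rather than the article's method, puts the scaler and the SVC in a `Pipeline` so the scaler is refit inside every fold. The sketch also shows that `param_grid` may be a list of dicts, each searched separately; the extra `rbf`/`gamma` entry is illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# The scaler is refit on the training folds of every CV split
pipe = make_pipeline(StandardScaler(), SVC())
# make_pipeline names the SVC step 'svc', so its parameters take the 'svc__' prefix
param_grid = [
    {"svc__C": [1, 10, 100], "svc__kernel": ["linear"]},
    {"svc__C": [1, 10, 100], "svc__kernel": ["rbf"], "svc__gamma": [0.001, 0.01]},
]
grid = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=10)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(len(grid.cv_results_["params"]))  # 3 + 3 * 2 = 9 candidate combinations
```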
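7. Calculate the accuracy score for this model. Note that best_score_ is the mean cross-validated accuracy of the best parameter combination on the training folds, not the accuracy on the test data.
```python
# best_score_ is a fraction between 0 and 1, so scale it to a percentage
accuracy = grid_search.best_score_ * 100
print("The cross-validated accuracy of our best model is: {0:.3f}%".format(accuracy))
```
Output: The cross-validated accuracy of our best model is: 94.234%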
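Because `best_score_` is measured on the training folds, the held-out test split from step 4 shows how well the tuned model generalises. The sketch below again assumes scikit-learn's bundled copy of the data and a scaling pipeline, so its numbers will not necessarily match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

grid = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                    {"svc__C": [1, 10, 100], "svc__kernel": ["linear"]},
                    scoring="accuracy", cv=10)
grid.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is retrained on all of X_train
y_pred = grid.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred) * 100
print("Cross-validated accuracy: {0:.3f}%".format(grid.best_score_ * 100))
print("Test-set accuracy:        {0:.3f}%".format(test_accuracy))
# grid.score() gives the same value, using the grid's scoring metric
print(abs(grid.score(X_test, y_test) * 100 - test_accuracy) < 1e-9)
```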