SVM Parameter Tuning using GridSearchCV in Python

In this tutorial, we learn about the SVM model, its hyper-parameters, and how to tune those hyper-parameters with GridSearchCV to improve accuracy.

The Support Vector Machine algorithm is demonstrated with and without parameter tuning, using the Breast Cancer dataset as an example. We use the scikit-learn library to import GridSearchCV, which takes care of most of the hard work.

Also, a pandas DataFrame is used for loading the data, and train_test_split is used to split it into training and test sets.

SVM Parameter Tuning with GridSearchCV – scikit-learn

Firstly, to make predictions with an SVM on sparse data, the model must have been fit on such data.
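For illustration, here is a minimal sketch with made-up toy data (not the Breast Cancer dataset) showing that scikit-learn's SVC accepts scipy.sparse input, provided it was fit on sparse data first:

#Minimal sketch with made-up toy data (not the Breast Cancer dataset)
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

X_toy = csr_matrix(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]))  #sparse input
y_toy = np.array([0, 0, 1, 1])

clf = SVC(kernel='linear')
clf.fit(X_toy, y_toy)                         #fit on sparse data first
print(clf.predict(csr_matrix([[0.9, 0.2]])))  #then predict on sparse data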


Secondly, tuning, or hyper-parameter optimization, is the task of choosing the right set of optimal hyper-parameters. A kernel SVM has two main hyper-parameters, namely C and gamma.
The candidate values for these hyper-parameters are laid out as a parameter grid; scikit-learn's ParameterGrid can be used to enumerate the combinations, as sketched below.
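As a small sketch (the parameter values here are purely illustrative), ParameterGrid enumerates every combination of candidate C and gamma values that a grid search would try:

#Sketch: enumerating candidate C and gamma values (illustrative values only)
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.01, 0.1], 'kernel': ['rbf']}
for params in ParameterGrid(param_grid):
    print(params)   #each dict is one hyper-parameter combination to evaluate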

Our objective is to read the dataset and predict whether the cancer is ‘benign’ or ‘malignant’.

 

Example of SVM Parameter Tuning

First, download the required Breast Cancer dataset from Kaggle; it is the dataset used in the code below.

#Importing libraries and loading data into pandas dataframe

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


df = pd.read_csv('BreastCancer.csv')

Now we decide our feature variables and the target variable.

df.info()
X = df.iloc[:, 2:31].values   #feature columns (tumor measurements)
Y = df.iloc[:, 1].values      #target column: diagnosis ('B' or 'M')


From the df.info() output, we can see that our target variable ‘Y’ is of the ‘object’ data type. Before proceeding, we convert these categorical values to numeric using the LabelEncoder class, so that benign is encoded as ‘0’ and malignant as ‘1’.

#Encoding categorical data values
from sklearn.preprocessing import LabelEncoder
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
#splitting the data into training set and test set

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.3, random_state = 4)


#applying Support Vector Classifier 
#fitting kernel SVM to training dataset
from sklearn.svm import SVC
classifier_df = SVC(kernel = 'linear' , random_state = 0)
classifier_df.fit(X_train,Y_train)

#predicting test data result
Y_pred = classifier_df.predict(X_test)
#setting up accuracy score

acc = accuracy_score(Y_test,Y_pred) *100
print("Accuracy for our dataset in predicting test data is : {:.2f}%".format(acc))
Output : Accuracy for our dataset in predicting test data is: 94.73%

 

As a result, we obtain the accuracy on the test set without any tuning. Let us now tune the hyper-parameters with GridSearchCV.

#applying Gridsearchcv to find the best model

from sklearn.model_selection import GridSearchCV
parameters = [{'C': [1,10,100], 'kernel': ['linear']}]
grid_search = GridSearchCV(estimator = classifier_df,
                           param_grid = parameters, scoring = 'accuracy', cv = 10)
grid_search = grid_search.fit(X_train, Y_train)

The attribute best_score_ gives the mean cross-validated score (here, accuracy) of the best parameter combination found by the grid search.

accuracy = grid_search.best_score_ *100
print("Accuracy for our dataset with tuning is : {:.2f}%".format(accuracy) )
Output : Accuracy for our dataset with tuning is : 95.23%
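As a follow-up sketch (output not shown here), the chosen hyper-parameters can be inspected through best_params_, and the refitted best estimator can be evaluated on the held-out test set:

#Inspecting the best hyper-parameters and evaluating on the test set
print(grid_search.best_params_)            #the chosen value of C and kernel
best_model = grid_search.best_estimator_   #refit on the full training set by default
test_acc = accuracy_score(Y_test, best_model.predict(X_test)) * 100
print("Test accuracy of the tuned model : {:.2f}%".format(test_acc))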

 

Observation


Hence we can see an increase in accuracy after tuning the model with GridSearchCV, from 94.73% to 95.23%.

