SVM Parameter Tuning using GridSearchCV in Python
In this tutorial, we learn about the SVM model, its hyper-parameters, and how to tune those hyper-parameters using GridSearchCV for better accuracy.
The Support Vector Machine algorithm is explained with and without parameter tuning, using the Breast Cancer dataset as an example. We use the scikit-learn library to import GridSearchCV, which takes care of all the hard work.
A pandas DataFrame is used for loading the data, and train_test_split is used to split it into training and test sets.
SVM Parameter Tuning with GridSearchCV – scikit-learn
Firstly, to make predictions with an SVM on sparse data, the model must have been fit on such data.
Secondly, tuning, or hyper-parameter optimization, is the task of choosing the right set of optimal hyper-parameters. A kernel SVM has two main hyper-parameters, namely C and gamma.
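As a quick illustration (a minimal sketch with assumed values, not part of this tutorial's code), both hyper-parameters are passed directly to the SVC constructor:

# Minimal sketch with illustrative values (not tuned for any dataset):
# C controls the penalty for misclassified samples, and gamma controls
# how far the influence of a single training example reaches.
from sklearn.svm import SVC

rbf_clf = SVC(kernel='rbf', C=10, gamma=0.01)
# rbf_clf.fit(X_train, Y_train) would then train it like any other estimator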
GridSearchCV builds a ParameterGrid from the dictionary of candidate values and evaluates every combination; you can read more about the construction of ParameterGrid in the scikit-learn documentation.
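For example (a small sketch using the same candidate values we pass to GridSearchCV later), ParameterGrid simply expands the dictionary into every concrete parameter combination:

# Sketch: ParameterGrid enumerates every combination of candidate values.
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [1, 10, 100], 'kernel': ['linear']}
for params in ParameterGrid(param_grid):
    print(params)
# Expected output:
# {'C': 1, 'kernel': 'linear'}
# {'C': 10, 'kernel': 'linear'}
# {'C': 100, 'kernel': 'linear'}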
Our objective is to read the dataset and predict whether the cancer is ‘benign‘ or ‘malignant‘.
Example of SVM Parameter Tuning
Meanwhile, download the required Breast Cancer dataset from Kaggle; it is the dataset used in the code below: Dataset.
#Importing libraries and loading data into pandas dataframe
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('BreastCancer.csv')
Now we decide our feature variables and the target variable.
df.info()
X = df.iloc[:, 2:31].values
Y = df.iloc[:, 1].values
Here we can see that our target variable ‘Y’ is of the ‘object’ data type. Before proceeding, we convert the categorical data to numeric values using LabelEncoder, so that benign is encoded as ‘0’ and malignant as ‘1’.
#Encoding categorical data values
from sklearn.preprocessing import LabelEncoder

labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
#splitting the data into training set and test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=4)

#applying Support Vector Classifier
#fitting kernel SVM to training dataset
from sklearn.svm import SVC

classifier_df = SVC(kernel='linear', random_state=0)
classifier_df.fit(X_train, Y_train)

#predicting test data result
Y_pred = classifier_df.predict(X_test)
#setting up accuracy score
acc = accuracy_score(Y_test, Y_pred) * 100
print("Accuracy for our dataset in predicting test data is : {:.2f}%".format(acc))
Output : Accuracy for our dataset in predicting test data is: 94.73%
As a result, we obtain the accuracy of our model on the test dataset without tuning. Let us now tune the model's hyper-parameters with GridSearchCV.
#applying GridSearchCV to find the best model
from sklearn.model_selection import GridSearchCV

parameters = [{'C': [1, 10, 100], 'kernel': ['linear']}]
grid_search = GridSearchCV(estimator=classifier_df,
                           param_grid=parameters,
                           scoring='accuracy',
                           cv=10)
grid_search = grid_search.fit(X_train, Y_train)
The attribute best_score_ holds the mean cross-validated accuracy of the best parameter combination found by the search.
accuracy = grid_search.best_score_ * 100
print("Accuracy for our dataset with tuning is : {:.2f}%".format(accuracy))
Output : Accuracy for our dataset with tuning is : 95.23%
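The fitted search also exposes best_params_ and best_estimator_. As a quick follow-up (a minimal sketch not shown in the original output), we can inspect the winning parameters and evaluate the refitted estimator on the held-out test set:

# Hypothetical follow-up: inspect the best parameters and score the
# refitted estimator on the held-out test set.
print(grid_search.best_params_)          # prints the winning C and kernel

best_clf = grid_search.best_estimator_   # already refit on the full training set
Y_pred_tuned = best_clf.predict(X_test)
test_acc = accuracy_score(Y_test, Y_pred_tuned) * 100
print("Test accuracy of the tuned model is : {:.2f}%".format(test_acc))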
Observation
Hence we can see an increase in accuracy after model tuning with GridSearchCV, from 94.73% to 95.23%.