How to tune Hyperparameters with Python and scikit-learn

Introduction:

Whenever we train a machine learning model such as a classifier, we set certain values beforehand that control how the model learns, and tuning these values is what shapes the final result. These values are hyperparameters. We commonly tune them in KNN classification or in SVM. In other algorithms, too, any setting chosen before training whose value can change the whole result of the model just by changing it is a hyperparameter.

But that is the technical definition; let us start with the basics and the coding part as well. This blog explains hyperparameters with the KNN algorithm, where the number of neighbors is a hyperparameter, and it also covers two different hyperparameter search methods and which one to use.

Uses:

Hyperparameters are also defined in neural networks, where the number of filters is a hyperparameter. In the case of neural networks, we mostly tune hyperparameters in CNNs.

For example, if we are training a model on images of cats and dogs, or of cars and two-wheelers, then we use a CNN to train it, and there we tune hyperparameters such as the number of filters. Also, when we work with sound data or apply KNN, GMM, or SVM type algorithms, we prefer to tune hyperparameters.
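To give a taste of how similar this looks across models, here is a minimal sketch of tuning an SVM with scikit-learn; the built-in digits dataset is just a stand-in for your own features, and the C values and kernels listed are illustrative choices, not a recommendation:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# C (regularization strength) and kernel are the SVM's hyperparameters
# here, playing the same role that n_neighbors plays for KNN below.
digits = datasets.load_digits()
params = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), params)
search.fit(digits.data, digits.target)
print(search.best_params_)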

 

How it works?: Here this blog uses the KNN algorithm

  • Well, in the KNN algorithm the hyperparameters are the number of neighbors (k) and the similarity function, for example the distance metric.
  • There are two common ways to search over hyperparameters: grid search and randomized search.
  • We define the hyperparameters in a param dictionary, as shown in the code below, where we define n_neighbors and metric.
  • After that, we can use either grid search, which trains a model for each and every combination of values, or randomized search.
  • Most of the time grid search is computationally expensive, because the number of combinations grows quickly.
  • In randomized search we trade time for quality: the more iterations we allow, the longer we wait but the better the result tends to be, while reducing the iteration count saves time at the risk of missing the best combination (see the sketch after this list).
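
To see how the iteration count steers that trade-off, here is a minimal sketch of a randomized search on scikit-learn's built-in iris dataset (a stand-in for your own data); n_iter is the knob that trades waiting time against coverage of the search space:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)
params = {"n_neighbors": np.arange(1, 31, 2),
          "metric": ["euclidean", "manhattan"]}

# n_iter caps how many random combinations are tried; raising it
# usually improves the best score found, at the cost of more time.
search = RandomizedSearchCV(KNeighborsClassifier(), params,
                            n_iter=10, random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)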

Coding part:

Check out the code given below:

import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

# trainData/trainLabels and testData/testLabels are assumed to be
# your own train/test split, e.g. features extracted from images.

#first of all the param dictionary:
params = {"n_neighbors": np.arange(1, 31, 2),
  "metric": ["euclidean", "manhattan"]}

#second step is for grid search or randomized search:
model = KNeighborsClassifier(n_jobs=-1)  # -1 uses all CPU cores
grid = GridSearchCV(model, params)
start = time.time()
grid.fit(trainData, trainLabels)
accuracy = grid.score(testData, testLabels)
print("grid search took {:.2f} seconds".format(time.time() - start))
print("grid search accuracy: {:.2f}%".format(accuracy * 100))

#Always choose one between grid search and randomized search
#randomized search:
grid = RandomizedSearchCV(model, params)
start = time.time()
grid.fit(trainData, trainLabels)
accuracy = grid.score(testData, testLabels)
print("randomized search took {:.2f} seconds".format(time.time() - start))
print("randomized search accuracy: {:.2f}%".format(accuracy * 100))

In the above code:

–> First of all, we import the required libraries.

–> Here I am showing both search methods for comparison, but in the original training you are supposed to use only one of them, according to your need.

–> In the param dictionary we can see n_neighbors, which holds the candidate numbers of neighbors, and metric, which holds the candidate distance functions.

–> Then we fit this in grid search and wait for the output; it can take time depending on the hardware of your machine.

–> It is optional: you can also use the randomized search for hyperparameter tuning instead.
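
Whichever search you run, the fitted object reports the winning combination. A short follow-up sketch (the grid variable from the code above is assumed):

# Inspect the outcome of the search (works for both search types).
print("best accuracy: {:.2f}%".format(grid.best_score_ * 100))
print("best hyperparameters: {}".format(grid.best_params_))

# best_estimator_ is a model refit with the winning hyperparameters,
# ready to make predictions on new data.
best_model = grid.best_estimator_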

 

Conclusion:

After defining the param dictionary we have two choices: go for randomized search or grid search. It totally depends on what the user needs in that particular situation. Most of the time users prefer randomized search, unless the training data is small enough that a full grid search is affordable.
