k-fold Cross-Validation in Machine Learning

Performance estimation is crucial for any model. Cross-validation is one such estimation strategy: it gives a more reliable measure of how well a model generalizes than a single train/test split. In this tutorial, you will learn how to train and evaluate a model using k-fold cross-validation.

k-fold cross-validation:

Steps involved:

  1. Loading packages
  2. Understanding the data
  3. User input (value for k)
  4. k-fold cross-validation
  5. Training the model
  6. Accuracy estimation

Working:

  • In this method, the dataset is divided into k equal, mutually exclusive folds (D1, D2, ..., Dk).
  • A series of k runs is carried out: in the ith run, fold Di serves as the test data and the remaining k - 1 folds as the train data.
  • Accuracy is calculated for each run, and the overall accuracy is the average across all k runs (see the sketch below).
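
To make the decomposition concrete, here is a minimal, self-contained sketch (not part of the tutorial's pipeline) that partitions 8 toy sample indices into k = 4 folds with numpy and prints the train/test split for each run:

import numpy as np

indices = np.arange(8)               # toy dataset of 8 samples
folds = np.array_split(indices, 4)   # 4 equal, mutually exclusive folds
for i, test_idx in enumerate(folds):
    # run i: fold i is the test set, the remaining folds form the train set
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"Run {i + 1}: train={train_idx}, test={test_idx}")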

Loading packages:

import pandas as pd
from sklearn.model_selection import KFold 
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer

Loading the dataset: 
Here we use the breast cancer dataset, which can be loaded directly from sklearn.

cancer_data = load_breast_cancer(as_frame=True)
df = cancer_data.frame
X = df.iloc[:, :-1]  # all columns except the last are features
y = df.iloc[:, -1]   # the last column ('target') is the label
print(df.columns)
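
Optionally, before splitting, you can check the shape and class balance of the data (a quick sanity check, not required for the rest of the tutorial):

print(X.shape)           # (569, 30): 569 samples, 30 features
print(y.value_counts())  # how many samples fall in each target class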

User Input:
Here, the user needs to enter the value of k:

print("Enter the value of k")
k = int(input())
Enter the value of k
4

Let’s assume k to be 4

k-fold cross-validation:

kfold_val = KFold(n_splits=k)  # shuffle is False by default, so no random_state is needed

This divides the dataset into k (i.e., 4) equal, mutually exclusive folds.
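
To see what KFold actually produces, you can print the index arrays it yields before training (a quick sanity check; with 569 rows and k = 4, the first fold gets one extra row since 569 is not divisible by 4):

for fold, (train_index, test_index) in enumerate(kfold_val.split(X), start=1):
    print(f"Fold {fold}: {len(train_index)} train rows, {len(test_index)} test rows")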

Training and estimation: 

To classify the data, we use LogisticRegression, as shown:

lr = LogisticRegression()  # note: the default solver may warn about convergence on this data; raising max_iter silences it
accuracy_scores = []
for train_index, test_index in kfold_val.split(X):
    # the current fold is the test data; the remaining folds are the train data
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    lr.fit(X_train, y_train)
    pred = lr.predict(X_test)

    accuracy = accuracy_score(y_test, pred)
    accuracy_scores.append(accuracy)
print("Accuracy score for each fold:")
print(accuracy_scores)
Accuracy score for each fold:
[0.916083916083916, 0.9436619718309859, 0.9647887323943662, 0.9295774647887324]

We now have an accuracy for each fold. The final accuracy is the average of these values, as shown:

print("Overall accuracy:")
print(sum(accuracy_scores) / k)
Overall accuracy: 0.9385280212745001
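
As a side note, scikit-learn can perform this whole loop in a single call with cross_val_score; a minimal sketch that should produce comparable numbers (it clones the estimator internally and defaults to accuracy scoring for classifiers):

from sklearn.model_selection import cross_val_score

scores = cross_val_score(lr, X, y, cv=kfold_val)
print(scores)         # per-fold accuracies
print(scores.mean())  # overall accuracy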

In this way, we achieved an overall accuracy of about 94% using k-fold cross-validation. I hope you found this helpful. Thank you!
