# k-fold Cross-Validation in Machine Learning

Performance estimation is crucial for any machine learning model. Cross-validation is one such estimation strategy: it gives a more reliable measure of a model's accuracy than a single train/test split. In this tutorial, you will learn how to evaluate a model using k-fold cross-validation.

## k-fold cross-validation

Steps involved:

1. Understanding the data
2. User input (value for k)
3. k-fold cross-validation
4. Training the model
5. Accuracy estimation

Working:

- In this method, the dataset is divided into k equal, mutually exclusive folds (D1, D2, ..., Dk).
- A series of k runs is then carried out: in the ith run, Di serves as the test data and the remaining folds together form the training data.
- Accuracy is calculated for each run, and the overall accuracy is the average of the k per-fold accuracies.
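The fold-and-rotate idea above can be seen directly by printing the index splits that scikit-learn's `KFold` produces. This is a toy illustration on 8 sample indices with k = 4 (not the tutorial's dataset):

```python
# Illustrate k-fold splitting: each sample index appears in the
# test set exactly once across the k runs.
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(8)          # 8 toy samples
kf = KFold(n_splits=4)          # k = 4 folds
for fold, (train_idx, test_idx) in enumerate(kf.split(indices), start=1):
    print(f"Fold {fold}: test={test_idx}, train={train_idx}")
```

Each of the 4 runs holds out a different pair of indices as the test fold and trains on the remaining six.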

```
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```

Here we are using the breast cancer dataset, which can be loaded directly from sklearn:

```
cancer_data = load_breast_cancer(as_frame=True)
df = cancer_data.frame
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
print(df.columns)
```

User Input:

Here, the user needs to enter the value of k:

```
print("Enter the value of k")
k = int(input())
```

```
Enter the value of k
4
```

Let's take k = 4.

k-fold cross-validation:

`kfold_val = KFold(n_splits=k, random_state=None)`

This divides the dataset into k (i.e., 4) equal, mutually exclusive folds.
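As a quick sanity check, we can confirm that `KFold` carves the 569 breast cancer samples into 4 near-equal test folds (this is a self-contained sketch that reloads the data, assuming k = 4 as above):

```python
# Verify fold sizes: 569 samples / 4 folds gives folds of 143 or 142.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold

X = load_breast_cancer(as_frame=True).frame.iloc[:, :-1]
kf = KFold(n_splits=4)
sizes = [len(test_idx) for _, test_idx in kf.split(X)]
print(sizes)
```

When the number of samples is not divisible by k, scikit-learn gives the first few folds one extra sample each, so the fold sizes differ by at most one.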

Training and estimation:

To classify the data, we use LogisticRegression, as shown below:

```
lr = LogisticRegression()
accuracy_scores = []
for train_index, test_index in kfold_val.split(X):
    X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
    y_train, y_test = y[train_index], y[test_index]
    lr.fit(X_train, y_train)
    pred = lr.predict(X_test)

    accuracy = accuracy_score(y_test, pred)
    accuracy_scores.append(accuracy)
```
```
print("Accuracy score for each fold:")
print(accuracy_scores)
```

```
Accuracy score for each fold:
[0.916083916083916, 0.9436619718309859, 0.9647887323943662, 0.9295774647887324]
```

We now have the accuracy for each fold. The final accuracy is the average of these values, as shown:

```
print("Overall accuracy:")
print(sum(accuracy_scores) / k)
```

```
Overall accuracy:
0.9385280212745001
```

In this way, we achieved an overall accuracy of about 94% using k-fold cross-validation. I hope you found this helpful. Thank you!
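As a closing note, the whole train-and-score loop above can be reproduced with scikit-learn's `cross_val_score` helper, which returns the per-fold accuracies directly. This is a self-contained sketch; it sets `max_iter=5000` to avoid convergence warnings, so the exact scores may differ slightly from the loop above:

```python
# Equivalent k-fold evaluation using cross_val_score.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

data = load_breast_cancer(as_frame=True)
X, y = data.frame.iloc[:, :-1], data.frame.iloc[:, -1]

# cv=KFold(n_splits=4) reproduces the same 4-fold split scheme.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y,
                         cv=KFold(n_splits=4))
print("Per-fold accuracies:", scores)
print("Overall accuracy:", scores.mean())
```

This one-liner is handy once you understand what the manual loop is doing under the hood.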