K-nearest Neighbors (KNN) Classification Model in Python

K-nearest Neighbors (KNN) is a simple machine learning model. This article gives a detailed description of the KNN model, including a brief overview, the algorithm, an example in Python, and its uses.

K-Nearest Neighbors Model

The K-Nearest Neighbors algorithm is a supervised learning algorithm. KNN is both a lazy and a non-parametric algorithm. It is considered lazy because it has no specialized training phase: it simply stores the training data and defers all computation until prediction time. It is non-parametric because it makes no assumptions about the underlying data distribution. It classifies a new point on the basis of its proximity to stored data points, so the model can classify new data points as soon as they arrive.

USES: KNN is used in a variety of applications, such as statistical estimation, pattern recognition, economic forecasting, data compression, and genetics.

ALGORITHM:

  1. Pick a value of K.
  2. Find the K training points nearest to the new point whose class you want to predict; Euclidean distance is most commonly used.
  3. Among these K neighbors, count the number of data points belonging to each category and assign the new point to the category with the most neighbors (a majority vote).
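The three steps above can be sketched directly in a few lines of NumPy. This is a minimal from-scratch illustration on a tiny made-up dataset, not the scikit-learn implementation used later in this article:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among the k neighbors' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# toy data: two small clusters with labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.15, 0.1]), k=3))  # → 0
print(knn_predict(X, y, np.array([1.05, 1.0]), k=3))  # → 1
```

A point near the first cluster is assigned label 0 and a point near the second is assigned label 1, exactly as the algorithm describes.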

CODE:

The KNN model is already implemented in scikit-learn, so we can use it directly in machine learning and other classification projects. Here is an example of how KNN can be used.

Importing modules:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

Feature Selection: The breast cancer dataset is loaded from scikit-learn, and the features and target values are stored in separate variables.

data = load_breast_cancer()
X = data.data
Y = data.target

Split data into train and test sets: Most of the data (here 80%) is used as training data to build the model, and the rest is held out as test data.

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state=5)
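Because KNN is distance-based, features measured on larger scales can dominate the Euclidean distance. A common optional step, not used in the pipeline above, is to standardize the features after splitting. A self-contained sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=5)

scaler = StandardScaler()
# fit the scaler on the training split only, then apply the
# same transform to the test split to avoid data leakage
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# each training feature now has mean ≈ 0 and standard deviation ≈ 1
print(X_train_scaled.mean(), X_train_scaled.std())
```

The scaled arrays can then be passed to the classifier in place of X_train and X_test.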

Training Model: The model is trained by using the fit function of the KNeighborsClassifier class.

knn_model = KNeighborsClassifier()
knn_model.fit(X_train, Y_train)
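KNeighborsClassifier defaults to K=5 when no n_neighbors argument is given, as above. Since the choice of K affects the result, one common way to pick it is cross-validation. A short sketch (the candidate K values here are an arbitrary choice for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()

# 5-fold cross-validated accuracy for a few candidate values of K
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(model, data.data, data.target, cv=5).mean()
    print(f"K={k}: mean accuracy {score:.3f}")
```

The K with the best cross-validated accuracy can then be used for the final model.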

Prediction: We can predict the class labels for X_test by using the predict function.

predicted=knn_model.predict(X_test)

Accuracy: Evaluation of the model can be done by analyzing the confusion matrix or classification report.

print(confusion_matrix(Y_test, predicted))
print(classification_report(Y_test, predicted))

Output:

[[41  7]
 [ 0 66]]
              precision    recall  f1-score   support

           0       1.00      0.85      0.92        48
           1       0.90      1.00      0.95        66

   micro avg       0.94      0.94      0.94       114
   macro avg       0.95      0.93      0.94       114
weighted avg       0.94      0.94      0.94       114

From this output we can see that the model achieves good accuracy on the test set. KNN is used in various fields, such as recommender systems and pattern recognition.
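The report above can also be summarized in a single overall accuracy figure with scikit-learn's accuracy_score. A self-contained sketch repeating the same pipeline:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=5)

model = KNeighborsClassifier().fit(X_train, Y_train)
predicted = model.predict(X_test)

# overall fraction of correct predictions on the test set,
# i.e. the sum of the confusion matrix diagonal divided by its total
acc = accuracy_score(Y_test, predicted)
print(acc)
```

On this split the value matches the weighted averages shown in the classification report.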

I suggest you try this model on several datasets and check its accuracy with different attributes.
