K-Nearest Neighbors (KNN) in Machine Learning in Python, explained with examples

In this article, we will get an overview of the K-Nearest Neighbors (KNN) algorithm and then walk through a step-by-step implementation of it in Python.

K-Nearest Neighbors is an instance-based, lazy learning method for classification and one of the simplest machine learning algorithms. It classifies an unlabeled point based on its distances to the labeled points around it.

For measuring distances, KNN typically uses the Euclidean distance formula, i.e.,

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)

where p and q are two points described by n features each.
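To make this concrete, here is a minimal from-scratch sketch (plain NumPy with made-up toy data, not the article's iris example) of classifying one unlabeled point by computing Euclidean distances to all labeled points and taking a majority vote among the k nearest ones:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    #Euclidean distance from x_new to every labeled point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    #Indices of the k closest labeled points
    nearest = np.argsort(distances)[:k]
    #Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

#Toy labeled data: two features, two classes (0 and 1)
X_train = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))   #prints 0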

A larger k value produces smoother decision boundaries and a simpler model, which can under-fit the data. A smaller k value produces more complex boundaries and tends to over-fit the data.

Note: Choosing the right k value is very important when analyzing a data-set, to avoid both over-fitting and under-fitting.
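As a rough illustration of this point (a sketch only; it uses the same iris data-set that the rest of the article works with, and the exact numbers depend on the random split), you can loop over a few k values and watch the test accuracy change:

#Sketch: comparing test accuracy for several k values on the iris data-set
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

for k in [1, 3, 5, 7, 11, 25]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    print("k =", k, " test accuracy =", knn.score(x_test, y_test))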

Iris flower classification is a classic example of this algorithm.

#Importing the important libraries
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import numpy as np

#Load the iris data-set and take a quick look at it
iris = load_iris()
print(iris.keys())
print(iris.data)

#Transpose the data so that each row holds one feature across all samples
features = iris.data.T
sepal_length = features[0]
sepal_width = features[1]
petal_length = features[2]
petal_width = features[3]

#Feature names to use as axis labels
sepal_length_label = iris.feature_names[0]
sepal_width_label = iris.feature_names[1]
petal_length_label = iris.feature_names[2]
petal_width_label = iris.feature_names[3]

#Scatter plot of sepal length vs. sepal width, coloured by species
plt.scatter(sepal_length, sepal_width, c=iris.target)
plt.xlabel(sepal_length_label)
plt.ylabel(sepal_width_label)
plt.show()

Output: the scatter plot is shown below.

Scatter plot of sepal length vs. sepal width, coloured by iris species

Now that you know about the dataset, it is time to fit the model to the training data using the 'fit()' method.

After that, we will check the test accuracy with the 'score()' method (the 'accuracy_score()' helper from sklearn.metrics would give the same number). One thing that may catch your attention here is that we are using k = 1. You can vary the value of k and see how the result changes; for two-class problems an odd k is often preferred because it avoids ties in the vote.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

#Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris['data'], iris['target'], random_state=0)

#Fit a KNN classifier with k = 1 on the training data
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)

#Predict the class of a new, unseen flower
x_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(x_new)

print("Predicted value is ", prediction)
print("KNN Score will be")
print(knn.score(x_test, y_test))
Output:   Predicted value is [0]

The predicted value [0] means the new sample falls into class 0, which corresponds to Iris setosa.

KNN Score will be 0.9736842105263158

This corresponds to an accuracy of about 97.4%.
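If you prefer the 'accuracy_score()' helper mentioned earlier, a small sketch (reusing the knn model and the train/test split from the code above) looks like this; comparing training and test accuracy is also a quick way to spot over-fitting:

#Sketch: train vs. test accuracy with accuracy_score (assumes knn, x_train, x_test, y_train, y_test from above)
from sklearn.metrics import accuracy_score

print("Train accuracy:", accuracy_score(y_train, knn.predict(x_train)))
print("Test accuracy:", accuracy_score(y_test, knn.predict(x_test)))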

Also read: Classification of IRIS flower
