K – Nearest Neighbor Algorithm (KNN)
K – Nearest Neighbor Algorithm, commonly abbreviated as KNN, is an algorithm that assigns a new data point to the group or category of the data points nearest to it. It is a supervised learning algorithm, which means the existing data is already labeled, and those labels are used to decide the group or category of the new point.
Supervised learning is divided into the following two categories:
- Classification
- Regression
KNN, or K – Nearest Neighbor Algorithm, is most commonly used for Classification, although it can also be applied to Regression. Let us try to understand the concept with an example.
Example of K – Nearest Neighbor Algorithm
Suppose a company has to divide its members into four groups on the basis of their grade and the number of years they have given to the company. The company divides them into the four groups shown in the figure below:
In this figure, the X-axis shows the number of years and the Y-axis shows the grade (where A represents the lowest grade and G represents the highest grade).
Now there comes a new person who wants to know in which group he has been placed. The position of the person on the graph is shown in the figure below:
So, how will we decide which group this person belongs to?
We will use the KNN Algorithm: K – Nearest – Neighbor. As the name suggests, we have to find the nearest neighbors of this person, or in other words, the existing members whose performance is closest to his. We will go through the following steps to find them.
- Assign a number to K: We have to decide how many neighbors the person's distance will be measured from.
Suppose we choose K = 2. It means we want to decide the group of that person according to the two neighbors who are closest to him.
- Calculate the distance: In the figure below, looking at the two directions in which groups lie, the person is closer to the nearest person in Group 2 (yellow) than to the nearest person in Group 3 (blue).
Similarly, if we check the distance to the second neighbor (as we have assigned K = 2), we find that the second person in Group 2 (yellow) is much nearer than the second person in Group 3 (blue).
- Assign the group: As we saw in step 2, the person is nearer to the two persons in Group 2 (yellow) than to the two persons in Group 3 (blue), so he will be assigned to Group 2.
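The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function name `knn_classify`, the member coordinates, and the encoding of grades A to G as the numbers 1 to 7 are all assumptions made for the example, since the original figure is not available.

```python
import math
from collections import Counter


def knn_classify(points, labels, query, k):
    """Return the most common label among the k points closest to query."""
    # Step 2: Euclidean distance from the query to every labelled point.
    distances = [(math.dist(p, query), label) for p, label in zip(points, labels)]
    # Keep only the k nearest neighbours (step 1 fixed the value of k).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: the group that appears most often among the neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical members: (years with the company, grade encoded as A=1 ... G=7).
members = [(2, 1), (3, 2), (6, 4), (7, 5), (6, 6), (10, 6), (11, 7)]
groups = ["Group 1", "Group 1", "Group 2", "Group 2", "Group 2", "Group 3", "Group 3"]

new_person = (7, 4)  # hypothetical position of the new person
print(knn_classify(members, groups, new_person, k=2))
```

With these made-up coordinates, both of the two nearest neighbours belong to Group 2, so the new person is placed in Group 2. Note that with an even K the vote can end in a tie, which is one reason an odd K is often preferred in practice.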
Mathematical Calculation and Example:
In this section, we will see how a KNN classification is worked out mathematically.
Suppose we have been given the following table:
| Precipitation (in percent) [Denoted by P] | Temperature (in Celsius) [Denoted by T] | Day |
(Note: This is just imaginary data.)
We have to find out the type of day for a particular case, say Z, when Precipitation (P) was 20 percent and Temperature (T) was 40 degrees Celsius, and we have to decide on the basis of the 3 nearest cases.
So, here, K = 3.
We will use the Euclidean distance to measure how far each case in the table is from Z.
The formula for finding the Euclidean distance between two points $(P_1, T_1)$ and $(P_2, T_2)$ is:
$$d = \sqrt{(P_2 - P_1)^2 + (T_2 - T_1)^2}$$
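As a minimal sketch, the Euclidean distance between two (P, T) pairs is the square root of the sum of the squared differences, which can be written as a small Python function. The function name `euclidean_distance` and the second point (23, 44) are assumptions chosen for illustration; they do not come from the table.

```python
import math


def euclidean_distance(p1, t1, p2, t2):
    """Straight-line distance between the points (p1, t1) and (p2, t2)."""
    return math.sqrt((p2 - p1) ** 2 + (t2 - t1) ** 2)


# Z is (P = 20, T = 40); (23, 44) is a hypothetical second point.
# The differences are 3 and 4, so the distance is sqrt(9 + 16) = 5.0.
print(euclidean_distance(20, 40, 23, 44))
```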
Now, we will calculate the distance from Z to each row of the table, one by one.
Substituting the values for Z (P = 20, T = 40) into the formula, the distance from Z to case $i$ with values $(P_i, T_i)$ is:
$$d_i = \sqrt{(P_i - 20)^2 + (T_i - 40)^2}$$
As we can see from the calculations, the 3 smallest distances from Z belong to cases 4, 1 and 3, which represent a Sunny Day.
Therefore, the day Z will also be considered as a Sunny Day.
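The whole calculation can be sketched in Python as follows. The rows in `cases` are hypothetical placeholders and are not the values from the table above; they were picked only so that the result follows the same pattern as the text (cases 4, 1 and 3 nearest to Z). Only Z = (20, 40) and K = 3 come from the example itself.

```python
import math
from collections import Counter

# Hypothetical rows: (precipitation in percent, temperature in Celsius, day).
# These are NOT the article's data, only placeholders for illustration.
cases = [
    (25, 38, "Sunny"),  # case 1
    (70, 22, "Rainy"),  # case 2
    (15, 35, "Sunny"),  # case 3
    (22, 41, "Sunny"),  # case 4
    (80, 18, "Rainy"),  # case 5
]

z = (20, 40)  # P = 20, T = 40
k = 3

# Distance from Z to every case, stored with the case number and its label.
distances = [
    (math.dist(z, (p, t)), number, day)
    for number, (p, t, day) in enumerate(cases, start=1)
]

# Sort by distance and keep the k nearest cases.
nearest = sorted(distances)[:k]
for distance, number, day in nearest:
    print(f"case {number}: distance {distance:.2f} ({day})")

# Majority vote among the k nearest cases.
prediction = Counter(day for _, _, day in nearest).most_common(1)[0][0]
print("Z is classified as:", prediction)
```

With these placeholder rows, the three nearest cases are 4, 1 and 3, all of them Sunny, so Z is classified as a Sunny Day.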