Terrorism Detection and Classification using kNN Algorithm
The k-Nearest Neighbours (kNN) algorithm is one of the simplest and most popular machine learning algorithms. It is applied in fields such as image processing and web data mining.
Unlike some algorithms, it does not assume any particular distribution of the data. We take labelled training data and group the points by class based on their attributes.
The following graphs show an example of the kNN algorithm. Given a new reference point, we look for the k training points closest to it, its nearest neighbours, and assign the point to the class that is most common among them.
image source: https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-scikit-learn
Motivation
When we plot the data points, we observe groups or clusters forming in some regions. Points with similar attributes tend to lie close together, which reveals a relationship between the attributes and the class. The class of an unknown point can therefore be inferred from the classes of its nearest neighbours.
Process
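Step 1: Store the training data points in an array arr[].
Step 2: For i = 0 to m-1, where m is the number of training points:
Calculate the Euclidean distance d(arr[i], p) between the training point arr[i] and the unknown point p.
Step 3: Make a set of the k smallest distances. Each distance corresponds to an already classified data point.
Step 4: Return the majority class among these k points.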
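The four steps can also be written in a more general form that works for any number of attributes and class labels. This is only a minimal sketch: the function name knn_classify, the (features, label) layout and the toy points are assumptions made for illustration, and the last step is a majority vote over the k nearest points.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Return the majority label among the k training points nearest to query.

    training: list of (features, label) pairs, where features is a tuple of numbers.
    query: tuple of numbers with the same length as each features tuple.
    """
    # Steps 1-2: Euclidean distance from the query to every stored training point
    distances = [(math.dist(features, query), label) for features, label in training]
    # Step 3: keep the k smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote over the labels of those k neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy points invented for illustration only
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(knn_classify(training, (2, 2)))
```

math.dist requires Python 3.8 or later. An odd k is a common choice with two classes because it avoids tied votes.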
Let us implement this concept in a sample program:
import math

def classifyAPoint(points, p, k=3):
    # Collect (distance, group) pairs for every training point
    distance = []
    for group in points:
        for feature in points[group]:
            euclidean_distance = math.sqrt((feature[0]-p[0])**2 + (feature[1]-p[1])**2)
            distance.append((euclidean_distance, group))

    # Keep the k nearest training points
    distance = sorted(distance)[:k]

    freq1 = 0  # frequency of group 0
    freq2 = 0  # frequency of group 1
    for d in distance:
        if d[1] == 0:
            freq1 += 1
        elif d[1] == 1:
            freq2 += 1

    # Majority vote between the two groups
    return 0 if freq1 > freq2 else 1

def main():
    points = {0: [(1,12),(2,5),(3,6),(3,10),(3.5,8),(2,11),(2,9),(1,7)],
              1: [(5,3),(3,2),(1.5,9),(7,2),(6,1),(3.8,1),(5.6,4),(4,2),(2,5)]}
    p = (2.5, 7)
    k = 3
    print("The value classified to the unknown point is: {}".format(classifyAPoint(points, p, k)))

if __name__ == '__main__':
    main()
Output:
The value classified to the unknown point is: 0
To detect and classify terrorism incidents, implement this algorithm on the Global Terrorism Database (GTD).
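As a rough idea of how that might look, here is a hedged sketch. The attribute choice (year, latitude, longitude), the labels "type_a" and "type_b", and every row below are invented placeholders, not real GTD records or column names. The min-max scaling step is also an addition of this sketch, not part of the program above. It is included because attributes on very different scales would otherwise dominate the Euclidean distance.

```python
import math
from collections import Counter


def make_scaler(feature_rows):
    """Build a min-max scaler from the training rows (maps each column to [0, 1])."""
    columns = list(zip(*feature_rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        # Columns with a single distinct value carry no information, so map them to 0.0
        return tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        )

    return scale


def classify_incident(records, query, k=3):
    """Classify query by majority vote among its k nearest scaled training records."""
    scale = make_scaler([features for features, _ in records])
    scaled_query = scale(query)
    distances = [(math.dist(scale(features), scaled_query), label)
                 for features, label in records]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Placeholder rows: (year, latitude, longitude) -> hypothetical label
records = [
    ((1995, 10.0, 20.0), "type_a"),
    ((1996, 11.0, 21.0), "type_a"),
    ((1997, 10.5, 19.5), "type_a"),
    ((2010, 40.0, 60.0), "type_b"),
    ((2011, 41.0, 61.0), "type_b"),
    ((2012, 39.5, 60.5), "type_b"),
]
print(classify_incident(records, (1996, 10.2, 20.3)))
```

With the real database, the rows would be read from the GTD file and the attributes and target label chosen to suit the question being asked.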
I hope you have understood the concept. For any clarifications or suggestions, comment down below.