Clustering in Machine Learning
This article introduces clustering, a technique used in machine learning.
So, what is a cluster? Put simply, a cluster is a group.
Clustering groups data points so that similar points end up together, which can reveal actionable insights. The following example makes this concrete:
Suppose there are two variables, and plotting the data from those variables gives the following scatter plot:
Here we can clearly see three groups, as follows:
These three groups are the result of clustering, and we can pick them out by eye because we can judge the distance between points. For example, the distance between any two points in the green rectangle is much smaller than the distance from either of them to any point in the blue or red rectangle. The same applies to the points in the blue and red rectangles. Hence, we get three groups.
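The distance argument above can be checked numerically. The following is a minimal sketch, not the data behind the plot: the coordinates and the helper names `max_within` and `min_between` are assumptions made up for illustration. It compares the largest distance inside any group with the smallest distance between two different groups.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D points standing in for the three rectangles in the plot.
groups = {
    "green": [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8)],
    "blue": [(5.0, 5.0), (5.3, 4.8), (4.9, 5.4)],
    "red": [(9.0, 1.0), (8.7, 1.3), (9.2, 0.7)],
}

def max_within(points):
    """Largest distance between any two points of the same group."""
    return max(dist(p, q) for p, q in combinations(points, 2))

def min_between(points_a, points_b):
    """Smallest distance between a point of one group and a point of another."""
    return min(dist(p, q) for p, q in product(points_a, points_b))

largest_within = max(max_within(pts) for pts in groups.values())
smallest_between = min(
    min_between(groups[a], groups[b]) for a, b in combinations(groups, 2)
)

print(f"largest within-group distance:   {largest_within:.2f}")
print(f"smallest between-group distance: {smallest_between:.2f}")
```

When the first number is clearly smaller than the second, as it is for these made-up points, the groups are well separated and easy to see.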
In a centroid-based method (k-means is the best-known example), forming clusters from the data points involves the following steps:
- Selecting the number of clusters.
- Placing the centroids.
- Assigning each of the points to the nearest centroid.
- Moving each centroid to the center of the points assigned to it, which gives new groups/clusters.
- Repeating the previous two steps (assigning points, then moving centroids) and stopping when reassigning the points no longer changes the groups/clusters.
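The steps above can be sketched in a few lines of Python. This is a rough illustration of the k-means procedure, not a production implementation: the function name `k_means`, its parameters, and the sample points are assumptions, and the initial centroids are simply picked at random from the data.

```python
import random
from math import dist

def k_means(points, k, seed=0, max_iter=100):
    """Cluster 2-D points into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Placing the centroids: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assigning each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Stop when reassigning changes nothing.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Moving each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical data: three loose groups of points.
points = [
    (1.0, 1.0), (1.5, 1.2), (1.2, 0.8),
    (5.0, 5.0), (5.3, 4.8), (4.9, 5.4),
    (9.0, 1.0), (8.7, 1.3), (9.2, 0.7),
]
assignment, centroids = k_means(points, k=3)  # selecting the number of clusters
for p, a in zip(points, assignment):
    print(p, "-> cluster", a)
```

The result depends on where the centroids start, so a poor random start can leave the procedure stuck in a grouping that is stable but not the one a person would pick by eye. Libraries usually reduce this risk by trying several starting positions.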
Practical Example of Clustering:
Suppose you are playing a game in which, to win, you have to sort the items provided into two groups as quickly as possible. You have not been given any criteria for telling them apart, so the basis for grouping is up to you.
The items provided are as follows:
You may group them by shape, with objects that have corners in one group and objects without corners in the other, in the following way:
But you may also decide to group them according to the color of the object, as follows:
As you can see, how you group/cluster these items is up to you, because you have not been given any instruction on how to divide them. Because the groups are found without labels or rules supplied in advance, clustering is a kind of unsupervised learning.
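The game can be mimicked in code. In this small sketch the items, their attributes, and the helper `group_by` are all hypothetical, since the original pictures are not reproduced here. It shows that the same items fall into different groups depending on which feature is used.

```python
from collections import defaultdict

# Hypothetical items; each has a shape property and a color.
items = [
    {"name": "square", "has_corners": True, "color": "red"},
    {"name": "triangle", "has_corners": True, "color": "blue"},
    {"name": "circle", "has_corners": False, "color": "red"},
    {"name": "oval", "has_corners": False, "color": "blue"},
]

def group_by(items, key):
    """Collect item names into groups that share the same value for `key`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item["name"])
    return dict(groups)

by_shape = group_by(items, "has_corners")
by_color = group_by(items, "color")

print("by shape:", by_shape)
print("by color:", by_color)
```

Neither grouping is the correct one. Each is a valid answer for the feature chosen, which is why the choice of features matters so much in clustering.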