Binning or Bucketing of a column in pandas using Python

In this article, we will study binning (or bucketing) of a column in pandas using Python. Before starting, we should understand the concept of “binning”.

What is Binning?

Binning is grouping values together into bins, where each bin covers a range of values. Let’s understand this using an example. Suppose 10 students have the scores 35, 46, 89, 20, 58, 99, 74, 60, 18 and 81, and our task is to make 3 teams. Team 1 will have students with scores from 1 to 40, Team 2 will have students with scores from 41 to 80, and Team 3 will have students with scores from 81 to 100.


Hence, we are making groups of students based on their scores.
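The team example can be sketched in plain Python before bringing in pandas. This is only an illustration of the idea: the helper name team_for and the dictionary teams are made up for this sketch.

```python
# Scores of the 10 students from the example
scores = [35, 46, 89, 20, 58, 99, 74, 60, 18, 81]

def team_for(score):
    """Return the team (bin) whose score range contains `score`."""
    if 1 <= score <= 40:
        return 'Team 1'
    if 41 <= score <= 80:
        return 'Team 2'
    if 81 <= score <= 100:
        return 'Team 3'
    raise ValueError(f'score {score} is outside every bin')

# Collect the scores that land in each team
teams = {}
for score in scores:
    teams.setdefault(team_for(score), []).append(score)

for name, members in teams.items():
    print(name, members)
```

Each score lands in exactly one team because the three ranges do not overlap.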

Binning of a column in pandas

Let us now see how to bin a column in pandas. For this, we first create a DataFrame, which requires importing pandas. Look at the following code:

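import pandas as pd

# Sample data: each person's name and age
data = {
    'Name': ['Rani', 'Teju', 'Vihaan', 'Ritesh', 'Yash', 'Rupesh', 'Sneha', 'Smita', 'Roshan', 'Bhushan', 'Rupali'],
    'Age': [23, 56, 4, 17, 3, 67, 10, 13, 8, 52, 78],
}

# Build the DataFrame and display it
df = pd.DataFrame(data)
print(df)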

OUTPUT

       Name  Age
0      Rani   23
1      Teju   56
2    Vihaan    4
3    Ritesh   17
4      Yash    3
5    Rupesh   67
6     Sneha   10
7     Smita   13
8    Roshan    8
9   Bhushan   52
10   Rupali   78

We have created a DataFrame that contains the name of each person along with their age. Now we are going to classify each person into one of these categories: “Child”, “Adolescence”, “Adult” or “Senior Adult”, based on their age.

This can be done with the help of binning.

Let us first create “bins”. This list holds the edge values that we will use to categorize each person. Look at the following code:

bins = [0,12,18,59,100]

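Here, each pair of neighbouring edges defines one group: ages above 0 and up to 12 form one group, ages above 12 and up to 18 (that is, 13-18 for whole-number ages) form another, and so on. By default, pd.cut includes the right edge of each bin and excludes the left edge, so an age of exactly 0 would not fall into any of these bins.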
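To see which interval each edge value falls into, here is a small sketch. The ages in it are chosen only to probe the bin edges and are not part of the article's data; include_lowest is a real pd.cut parameter, shown here as one possible way to keep an age of 0.

```python
import pandas as pd

bins = [0, 12, 18, 59, 100]

# Hypothetical ages sitting on or next to the bin edges
ages = pd.Series([0, 12, 13, 18, 19])

# Without labels, pd.cut returns the interval each value falls into.
# Intervals are closed on the right by default, so 0 gets no bin (NaN).
intervals = pd.cut(ages, bins)
print(intervals)

# include_lowest=True makes the first interval include its left edge
with_zero = pd.cut(ages, bins, include_lowest=True)
print(with_zero)
```

The first call leaves the age 0 unassigned, while 12 and 18 each belong to the bin that ends at them.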

Let us now create “category”. Look at the following code:

category = ['Child','Adolescence','Adult','Senior Adult']

This means a person whose age falls in the first bin (up to 12) will be labeled “Child”, a person aged 13-18 will be labeled “Adolescence”, and so on. The list must contain exactly one label per bin, so five bin edges need four labels.
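The number of labels has to match the number of bins, which is one fewer than the number of edges. A minimal sketch of what happens otherwise, using a made-up ages Series and a deliberately shortened label list:

```python
import pandas as pd

bins = [0, 12, 18, 59, 100]
category = ['Child', 'Adolescence', 'Adult', 'Senior Adult']

# Five edges define four bins, so four labels are needed
print(len(bins) - 1, len(category))

# Hypothetical ages, used only for this illustration
ages = pd.Series([4, 17, 23, 67])

# Dropping one label makes pd.cut raise a ValueError
mismatch_error = None
try:
    pd.cut(ages, bins, labels=category[:-1])
except ValueError as err:
    mismatch_error = str(err)

print(mismatch_error)
```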

Let us now categorize our data. Look at the following code:

df['Category'] = pd.cut(df['Age'], bins, labels=category)

Here, pd is the alias under which we imported pandas.

The “cut” function segments the data into the bins.

Its first argument is the column of the DataFrame that we want to bin. In this case, df['Age'] is that column.

The second argument, bins, is the list of bin edges, and “labels = category” gives the names to assign to the ages that fall into each bin.

Since we want the result in a new column, we assign it to df['Category'].

Printing df now gives the following output:

       Name  Age      Category
0      Rani   23         Adult
1      Teju   56         Adult
2    Vihaan    4         Child
3    Ritesh   17   Adolescence
4      Yash    3         Child
5    Rupesh   67  Senior Adult
6     Sneha   10         Child
7     Smita   13   Adolescence
8    Roshan    8         Child
9   Bhushan   52         Adult
10   Rupali   78  Senior Adult

Hence, we have grouped the data using Binning.
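Once the column is binned, a common next step is to count how many rows fall into each category. The sketch below rebuilds the same DataFrame so that it runs on its own; the variable name counts is just for illustration.

```python
import pandas as pd

data = {
    'Name': ['Rani', 'Teju', 'Vihaan', 'Ritesh', 'Yash', 'Rupesh',
             'Sneha', 'Smita', 'Roshan', 'Bhushan', 'Rupali'],
    'Age': [23, 56, 4, 17, 3, 67, 10, 13, 8, 52, 78],
}
df = pd.DataFrame(data)

bins = [0, 12, 18, 59, 100]
category = ['Child', 'Adolescence', 'Adult', 'Senior Adult']
df['Category'] = pd.cut(df['Age'], bins, labels=category)

# Count the people in each category, keeping the categories in bin order
counts = df['Category'].value_counts().sort_index()
print(counts)
```

For this data, the counts come out as 4 children, 2 adolescents, 3 adults and 2 senior adults.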

Thank You.
