Pandas Groupby Sort In Python

In this tutorial, we are going to learn how to sort grouped data with groupby in the Python Pandas library. First, we need to install Pandas on our PC. To install Pandas, type the following command in your Command Prompt.

pip install pandas

To write this program, we need to import the Pandas module in our code. We also need a DataFrame to work on: we can either create one in the program or import one, for example from a CSV file. First, let us see what the groupby function does in Pandas.
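A minimal sketch of both options follows. The CSV text is a made-up stand-in for a real file, and the file name players.csv mentioned in the comment is hypothetical; in your own program you would pass the path of your file to pd.read_csv.

```python
import io

import pandas as pd

# Option 1: create a DataFrame directly from a dictionary of columns.
created = pd.DataFrame({
    'Name': ['Sachin', 'Kane'],
    'Country': ['India', 'New Zealand'],
    'Age': [46, 29],
    'Centuries': [100, 34],
})

# Option 2: import a DataFrame from CSV data. A real program would pass a
# file path such as 'players.csv' (hypothetical); io.StringIO stands in for
# that file here so the example runs on its own.
csv_text = """Name,Country,Age,Centuries
Sachin,India,46,100
Kane,New Zealand,29,34
"""
imported = pd.read_csv(io.StringIO(csv_text))

print(created)
print(imported)
```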

Groupby in Pandas

In Pandas, the groupby function groups rows that have the same value in one or more columns. We can then apply various functions, such as sum, mean, or count, to each of those groups. Grouping is a simple concept, so it is widely used in data science projects. It is also important because it keeps the code short and readable while aggregating the data efficiently. Let us see an example of the groupby function.
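As a short sketch of applying functions to groups, the snippet below uses a small subset of the cricket data from this tutorial and computes per-country totals with sum(), then several statistics at once with agg(). The choice of columns and functions is only for illustration.

```python
import pandas as pd

# A small subset of the cricket data used in this tutorial.
df = pd.DataFrame({
    'Country': ['India', 'India', 'Australia', 'Australia', 'England'],
    'Centuries': [100, 70, 14, 43, 12],
})

# Group rows by country, then add up the centuries in each group.
total = df.groupby('Country')['Centuries'].sum()
print(total)

# agg() applies several functions to every group in one call.
summary = df.groupby('Country')['Centuries'].agg(['sum', 'mean', 'count'])
print(summary)
```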

Example:-

import pandas as pd
df = pd.DataFrame(
    [['Sachin', 'India', 46, 100],
     ['Dhoni', 'India', 31, 16],
     ['Kohli', 'India', 31, 70],
     ['Kane', 'New Zealand', 29, 34],
     ['Watson', 'Australia', 38, 14],
     ['Warner', 'Australia', 33, 43],
     ['Ben Stokes', 'England', 28, 12],
     ['Kevin Pietersen', 'England', 39, 32],
     ['Dwayne Bravo', 'West Indies', 36, 5]],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8],
    columns=['Name', 'Country', 'Age', 'Centuries']
)
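# Group the rows by the value in the 'Country' column
a = df.groupby('Country')
# groups maps each country to the index labels of its rows
print(a.groups)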

In the above example, I’ve created a Pandas DataFrame, grouped the data by country, and printed the groups attribute, which maps each country to the index labels of its rows. As a result, we get the following output.

Output:-

{'Australia': Int64Index([4, 5], dtype='int64'), 
'England': Int64Index([6, 7], dtype='int64'), 
'India': Int64Index([0, 1, 2], dtype='int64'), 
'New Zealand': Int64Index([3], dtype='int64'), 
'West Indies': Int64Index([8], dtype='int64')}
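Depending on the installed pandas version, the index objects inside groups may print differently from the output above (for example, Int64Index was removed in pandas 2.0). A minimal sketch, assuming a smaller version of the same data, that converts each index to a plain list so the printed result does not depend on the version, and uses size() to count the rows in each group:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Dhoni', 'Kane', 'Watson', 'Warner'],
    'Country': ['India', 'India', 'New Zealand', 'Australia', 'Australia'],
})
grouped = df.groupby('Country')

# groups maps each group key to the index labels of its rows.
# tolist() turns each Index into a plain Python list.
labels = {key: idx.tolist() for key, idx in grouped.groups.items()}
print(labels)

# size() counts the rows in each group.
print(grouped.size())
```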

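The output contains only the index labels of the rows in each group (and their data type), not the data stored in the DataFrame. To get the actual rows, we iterate over the groupby object with a for loop, which gives us each group name together with its rows. We can also sort the data before grouping, as the next section shows.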
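A minimal sketch of that loop, assuming a smaller version of the players table: iterating over a groupby object yields (group name, rows) pairs, and get_group() returns the rows for one group name directly.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Dhoni', 'Kane', 'Watson', 'Warner'],
    'Country': ['India', 'India', 'New Zealand', 'Australia', 'Australia'],
    'Age': [46, 31, 29, 38, 33],
})
grouped = df.groupby('Country')

# Each iteration gives the group name and a DataFrame of that group's rows.
for country, rows in grouped:
    print(country, rows['Name'].tolist())

# get_group() pulls out the rows of a single group by its name.
india = grouped.get_group('India')
print(india)
```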

Sorting Groupby:-

Here we sort the data by age and then group it by age, so the groups come out in ascending order of age.

import pandas as pd
df = pd.DataFrame(
    [['Sachin', 'India', 46, 100],
     ['Dhoni', 'India', 31, 16],
     ['Kohli', 'India', 31, 70],
     ['Kane', 'New Zealand', 29, 34],
     ['Watson', 'Australia', 38, 14],
     ['Warner', 'Australia', 33, 43],
     ['Ben Stokes', 'England', 28, 12],
     ['Kevin Pietersen', 'England', 39, 32],
     ['Dwayne Bravo', 'West Indies', 36, 5]],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8],
    columns=['Name', 'Country', 'Age', 'Centuries']
)
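# Sort the rows by age, then group them by age. sort=False keeps the
# groups in the order they appear in the already-sorted DataFrame.
# Grouping by the column name (not a one-element list) keeps each group
# name a plain value such as 28; recent pandas versions turn it into a
# tuple like (28,) when a one-element list is passed.
a = df.sort_values(['Age']).groupby('Age', sort=False)
for name, group in a:
    print(name)   # the age shared by every row in this group
    print(group)  # the rows that belong to this group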

Output:-

28
         Name  Country  Age  Centuries
6  Ben Stokes  England   28         12
29
   Name      Country  Age  Centuries
3  Kane  New Zealand   29         34
31
    Name  Country  Age  Centuries
1  Dhoni    India   31         16
2  Kohli    India   31         70
33
     Name    Country  Age  Centuries
5  Warner  Australia   33         43
36
           Name      Country  Age  Centuries
8  Dwayne Bravo  West Indies   36          5
38
     Name    Country  Age  Centuries
4  Watson  Australia   38         14
39
              Name  Country  Age  Centuries
7  Kevin Pietersen  England   39         32
46
     Name  Country  Age  Centuries
0  Sachin    India   46        100

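As a result, we get the data grouped by age. In the above program, the sort_values function sorts the rows of the DataFrame by the column names passed to it (here ['Age']), so the rows are already in ascending order of age before grouping. Because sort=False is passed to groupby, the groups are kept in the order in which they first appear in this sorted DataFrame, so they are printed from the youngest to the oldest player. By default, groupby uses sort=True and sorts the group keys itself.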
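To see what the sort parameter of groupby does on its own, here is a minimal sketch on four rows that are deliberately not in ascending age order. It collects the group names in the order the loop visits them under three settings.

```python
import pandas as pd

# Four rows that are not in ascending age order.
df = pd.DataFrame({
    'Name': ['Sachin', 'Dhoni', 'Kane', 'Ben Stokes'],
    'Age': [46, 31, 29, 28],
})

# Default sort=True: groupby sorts the group keys itself.
default_order = [age for age, _ in df.groupby('Age')]

# sort=False on unsorted data: keys follow their first appearance.
unsorted_order = [age for age, _ in df.groupby('Age', sort=False)]

# sort_values() first, then sort=False: first appearance is now ascending.
presorted_order = [
    age for age, _ in df.sort_values('Age').groupby('Age', sort=False)
]

print(default_order)
print(unsorted_order)
print(presorted_order)
```

With the default sort=True, groupby already returns the keys in ascending order, so for that case the sort_values call mainly documents intent. Sorting first and passing sort=False is useful when you want an order that groupby would not choose on its own, such as descending.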

Finally, in the above output, the number printed before each block of rows is the name of that group, printed by print(name). Since we grouped by age, these numbers are the ages that define each group.
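The same pattern can list the age groups in descending order. A minimal sketch, assuming a smaller version of the data: ascending=False reverses the sort, sort=False keeps that order in groupby, and kind='stable' (an optional choice made here) keeps players of the same age in their original row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sachin', 'Dhoni', 'Kohli', 'Kane', 'Ben Stokes'],
    'Age': [46, 31, 31, 29, 28],
})

# Sort from oldest to youngest; kind='stable' keeps tied rows in their
# original order. sort=False then keeps the groups in that order.
grouped = (
    df.sort_values('Age', ascending=False, kind='stable')
      .groupby('Age', sort=False)
)

for age, group in grouped:
    # age is the group name; group holds the rows with that age
    print(age, group['Name'].tolist())
```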
