Sorting Data Frame objects in Python

In this tutorial, we will discuss how to sort data frames with the pandas library in Python. First, what is a data frame?

A data frame is a two-dimensional representation of data organized into rows and columns. You can create one with the pandas.DataFrame() constructor of the pandas package. For example:

import pandas 
my_data = {'Name':['Sachin', 'Sourabh', 'Subhojeet', 'Anirudh', 
            'Vedant', 'Abhishek', 'Shivam']}
df = pandas.DataFrame(my_data)
print(df)
print(type(df))

Output:

        Name
0     Sachin
1    Sourabh
2  Subhojeet
3    Anirudh
4     Vedant
5   Abhishek
6     Shivam
<class 'pandas.core.frame.DataFrame'>

Here we have created a data frame object that holds the names of a group of people. The last line of the output shows the type of the object that was created.

Sorting DataFrame object in Python

Now let’s take a look at how to sort a data frame object. To sort a data frame by its values we use the pandas.DataFrame.sort_values() method, which sorts the rows in the required order (either ascending or descending).

Syntax: DataFrame.sort_values(by, axis, ascending, inplace, kind, na_position)
  • by -> name of the column, or list of column names, to sort by.
  • axis -> determines the axis to be sorted. Default: 0
  • ascending -> boolean value, or a list of booleans with one entry per column in by. If True, sorts in ascending order, otherwise in descending order. Default: True
  • inplace -> boolean value. If True, the data frame itself is reordered and the method returns None; otherwise a new sorted data frame is returned and the original is left unchanged. Default: False
  • kind -> determines the sorting algorithm used. It can take quicksort, mergesort, or heapsort as the argument (recent pandas versions also accept stable). Default: quicksort
  • na_position -> if 'first', puts all NaN values at the beginning; if 'last', puts them at the end. Default: 'last'
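To make the parameters above concrete, here is a minimal sketch. The small `scores` data frame and its `Score` column are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data; one score is deliberately missing (NaN).
scores = pd.DataFrame({
    "Name": ["Sachin", "Sourabh", "Vedant", "Shivam"],
    "Score": [72.0, float("nan"), 91.0, 65.0],
})

# Descending order, with the missing value placed first.
desc_nan_first = scores.sort_values(by="Score", ascending=False, na_position="first")

# kind selects the algorithm; mergesort is stable, which only matters when rows tie.
stable_sorted = scores.sort_values(by="Score", kind="mergesort")

# inplace defaults to False, so both calls return new frames and `scores` is unchanged.
print(desc_nan_first)
print(stable_sorted)
print(scores)
```

Because inplace is left at its default, the original frame keeps its row order and each call hands back a separate sorted copy.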

Let’s first import our dataset into the program.

import pandas 
my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")  
my_data

Output:

    Year                                    Host  Venue for Final  Team-1  Team-2  Winner     Margin
0   1975                                 England           Lord's      WI     Aus      WI    17 runs
1   1979                                 England           Lord's      WI     Eng      WI    92 runs
2   1983                                 England           Lord's     Ind      WI     Ind    43 runs
3   1987                                   India          Kolkata     Aus     Eng     Aus     7 runs
4   1992                  Australia, New Zealand        Melbourne     Pak     Eng     Pak    22 runs
5   1996              India, Pakistan, Sri Lanka   Lahore (Gdffi)     Aus      SL      SL  7 wickets
6   1999                                 England           Lord's     Pak     Aus     Aus  8 wickets
7   2003                            South Africa        Wanderers     Aus     Ind     Aus   125 runs
8   2007                             West Indies       Bridgetown     Aus      SL     Aus    53 runs
9   2011  India, Pakistan, Sri Lanka, Bangladesh         Wankhede      SL     Ind     Ind  6 wickets
10  2015                  Australia, New Zealand        Melbourne      NZ     Aus     Aus  7 wickets

Here is our data set, which lists all the winners of the Cricket World Cup. It is loaded from the Excel file Cricket World Cup Winners.xlsx.
Now we can use the DataFrame.sort_values() method to sort the data frame by a particular column. For instance, here I have sorted it by the Host column in ascending order.

import pandas 
my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")  
my_data.sort_values("Host", axis=0, ascending=True, inplace=True, na_position="last")
print(my_data)

Output:

    Year                                    Host  Venue for Final  Team-1  Team-2  Winner     Margin
4   1992                  Australia, New Zealand        Melbourne     Pak     Eng     Pak    22 runs
10  2015                  Australia, New Zealand        Melbourne      NZ     Aus     Aus  7 wickets
0   1975                                 England           Lord's      WI     Aus      WI    17 runs
1   1979                                 England           Lord's      WI     Eng      WI    92 runs
2   1983                                 England           Lord's     Ind      WI     Ind    43 runs
6   1999                                 England           Lord's     Pak     Aus     Aus  8 wickets
3   1987                                   India          Kolkata     Aus     Eng     Aus     7 runs
5   1996              India, Pakistan, Sri Lanka   Lahore (Gdffi)     Aus      SL      SL  7 wickets
9   2011  India, Pakistan, Sri Lanka, Bangladesh         Wankhede      SL     Ind     Ind  6 wickets
7   2003                            South Africa        Wanderers     Aus     Ind     Aus   125 runs
8   2007                             West Indies       Bridgetown     Aus      SL     Aus    53 runs

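Here you can see that the rows are now ordered by the Host column in ascending order. Each row keeps its original index label, which is why the index on the left is no longer in sequence.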
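Two details of an in-place sort are easy to miss: the call returns None, and the rows keep their original index labels. The sketch below illustrates both on a small in-memory frame built from four Year/Host pairs of the table above, so that it runs without the Excel file. The names `hosts`, `result`, and `renumbered` are only illustrative.

```python
import pandas as pd

# A few Year/Host pairs taken from the table above, rebuilt in memory.
hosts = pd.DataFrame({
    "Year": [1975, 1987, 1992, 2003],
    "Host": ["England", "India", "Australia, New Zealand", "South Africa"],
})

# With inplace=True the frame itself is reordered and the call returns None.
result = hosts.sort_values("Host", inplace=True)
print(result)

# The original row labels travel with their rows.
print(hosts.index.tolist())

# reset_index(drop=True) renumbers the rows 0..n-1 if a clean index is wanted.
renumbered = hosts.reset_index(drop=True)
print(renumbered)
```

Assigning the result of an in-place sort to a variable is a common slip, since the variable ends up holding None rather than a data frame.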

You can also sort by multiple columns at once by passing a list of column names as the by argument.
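As a minimal sketch of a multi-column sort, the example below orders rows by Winner alphabetically and, within each winner, by Year from newest to oldest. It uses a handful of Year/Winner pairs from the table above, rebuilt in memory so it runs without the Excel file; the variable names are only illustrative.

```python
import pandas as pd

# Year/Winner pairs taken from the table above.
winners = pd.DataFrame({
    "Year": [1975, 1979, 1983, 1987, 1999, 2011],
    "Winner": ["WI", "WI", "Ind", "Aus", "Aus", "Ind"],
})

# Pass a list to `by`; `ascending` can be a matching list, one flag per column.
by_winner_then_year = winners.sort_values(
    by=["Winner", "Year"],
    ascending=[True, False],
)
print(by_winner_then_year)
```

The first column in the list takes priority, and later columns only break ties among rows that share the same value in the earlier ones.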
