Sorting Data Frame objects in Python

In this tutorial, we will discuss sorting data frames with the pandas library in Python. First, what is a data frame?

A data frame is a two-dimensional representation of data organized in rows and columns. A data frame can be created by calling the pandas.DataFrame() constructor of the pandas package. For example,

import pandas 
my_data = {'Name':['Sachin', 'Sourabh', 'Subhojeet', 'Anirudh', 
            'Vedant', 'Abhishek', 'Shivam']}
df = pandas.DataFrame(my_data)
print(df)
print(type(df))

Output:

        Name
0     Sachin
1    Sourabh
2  Subhojeet
3    Anirudh
4     Vedant
5   Abhishek
6     Shivam
<class 'pandas.core.frame.DataFrame'>

Here we have created a data frame object holding the names of a group of people. The last line of the output shows the type of the object created.

Sorting DataFrame object in Python

Now let’s take a look at how to sort a data frame object. To sort a data frame by the values in one or more columns, we use the pandas.DataFrame.sort_values() method, which sorts the values in the required order (either ascending or descending).

Syntax: DataFrame.sort_values(by, axis, ascending, inplace, kind, na_position)
  • by -> name, or list of names, of the column(s) to sort by.
  • axis -> determines the axis to be sorted. Default: 0
  • ascending -> boolean value. If True, sorts the data frame in ascending order, otherwise in descending order. Default: True
  • inplace -> boolean value. If True, sorts the data frame in place and returns None; otherwise returns a sorted copy. Default: False
  • kind -> determines the sorting algorithm used. It can take 'quicksort', 'mergesort' or 'heapsort' as the argument. Default: 'quicksort'
  • na_position -> 'first' puts all NaNs at the beginning; 'last' puts them at the end. Default: 'last'
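As a rough illustration of the ascending and na_position parameters, here is a minimal sketch. The scores data frame and its values are made up for this example and are not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the tutorial's dataset); one score is missing.
scores = pd.DataFrame({
    "Name": ["Sachin", "Sourabh", "Vedant", "Shivam"],
    "Score": [72.0, np.nan, 91.0, 64.0],
})

# Sort by Score in descending order and place the NaN row first.
# inplace defaults to False, so a new sorted data frame is returned
# and scores itself is left unchanged.
ranked = scores.sort_values(by="Score", ascending=False, na_position="first")
print(ranked)
```

With na_position left at its default, the row with the missing score would appear at the bottom instead.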

Let’s first import our dataset into the program.

import pandas 
my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")  
my_data

Output:

    Year                                    Host  Venue for Final  Team-1  Team-2  Winner     Margin
0   1975                                 England           Lord's      WI     Aus      WI    17 runs
1   1979                                 England           Lord's      WI     Eng      WI    92 runs
2   1983                                 England           Lord's     Ind      WI     Ind    43 runs
3   1987                                   India          Kolkata     Aus     Eng     Aus     7 runs
4   1992                  Australia, New Zealand        Melbourne     Pak     Eng     Pak    22 runs
5   1996              India, Pakistan, Sri Lanka   Lahore (Gdffi)     Aus      SL      SL  7 wickets
6   1999                                 England           Lord's     Pak     Aus     Aus  8 wickets
7   2003                            South Africa        Wanderers     Aus     Ind     Aus   125 runs
8   2007                             West Indies       Bridgetown     Aus      SL     Aus    53 runs
9   2011  India, Pakistan, Sri Lanka, Bangladesh         Wankhede      SL     Ind     Ind  6 wickets
10  2015                  Australia, New Zealand        Melbourne      NZ     Aus     Aus  7 wickets

Here is our data set, which lists all the Cricket World Cup winners.
Now we can use the DataFrame.sort_values() method to sort the data frame by a particular column. For instance, here I have sorted the rows by the Host column in ascending order.

import pandas 
my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")  
my_data.sort_values("Host", axis=0, ascending=True, inplace=True, na_position="last")
print(my_data)

Output:

    Year                                    Host  Venue for Final  Team-1  Team-2  Winner     Margin
4   1992                  Australia, New Zealand        Melbourne     Pak     Eng     Pak    22 runs
10  2015                  Australia, New Zealand        Melbourne      NZ     Aus     Aus  7 wickets
0   1975                                 England           Lord's      WI     Aus      WI    17 runs
1   1979                                 England           Lord's      WI     Eng      WI    92 runs
2   1983                                 England           Lord's     Ind      WI     Ind    43 runs
6   1999                                 England           Lord's     Pak     Aus     Aus  8 wickets
3   1987                                   India          Kolkata     Aus     Eng     Aus     7 runs
5   1996              India, Pakistan, Sri Lanka   Lahore (Gdffi)     Aus      SL      SL  7 wickets
9   2011  India, Pakistan, Sri Lanka, Bangladesh         Wankhede      SL     Ind     Ind  6 wickets
7   2003                            South Africa        Wanderers     Aus     Ind     Aus   125 runs
8   2007                             West Indies       Bridgetown     Aus      SL     Aus    53 runs

Here you can see that the rows are ordered by the Host column in ascending (alphabetical) order, and that each row keeps its original index label.
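Because inplace=True overwrites the original row order, it can be handy to keep the sorted result in a separate variable instead. The following is a minimal sketch of that approach, assuming a small hand-typed subset of the table above rather than the Excel file; the names finals and by_host are illustrative only.

```python
import pandas as pd

# A few rows typed in by hand from the table above (Year and Host only),
# so the example runs without the Excel file.
finals = pd.DataFrame({
    "Year": [1975, 1987, 1992, 2003],
    "Host": ["England", "India", "Australia, New Zealand", "South Africa"],
})

# Without inplace=True, sort_values returns a sorted copy.
# reset_index(drop=True) renumbers the rows 0, 1, 2, ... after sorting.
by_host = finals.sort_values("Host").reset_index(drop=True)

print(by_host)
print(finals)  # the original data frame keeps its row order
```

Whether to sort in place or keep a copy depends on whether the original order is still needed later in the program.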

You can also sort by multiple columns at once by passing a list of column names to the by parameter.
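As a minimal sketch of a multi-column sort, the snippet below orders a few rows by Winner and then, within each winner, by Year in descending order. The rows are typed in by hand from the table above, and the variable names winners and ordered are illustrative only.

```python
import pandas as pd

# A hand-typed subset of the World Cup table (Year and Winner only).
winners = pd.DataFrame({
    "Year": [1975, 1979, 1983, 1987, 1999],
    "Winner": ["WI", "WI", "Ind", "Aus", "Aus"],
})

# Pass a list to `by` to sort on several columns.
# `ascending` can also be a list, with one boolean per column:
# Winner is sorted A to Z, and ties are broken by Year, newest first.
ordered = winners.sort_values(by=["Winner", "Year"], ascending=[True, False])
print(ordered)
```

The first column in the list is the primary sort key; later columns only decide the order among rows that tie on the earlier ones.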
