Sorting Data Frame objects in Python
In this tutorial, we are going to discuss sorting data frames with the pandas library in Python. So first, what is a data frame?
A data frame is a two-dimensional data structure that organizes data into rows and columns. You can create one with the pandas.DataFrame() constructor of the pandas package. For example:
```python
import pandas

# Build a DataFrame from a dict that maps column names to lists of values
my_data = {'Name': ['Sachin', 'Sourabh', 'Subhojeet', 'Anirudh', 'Vedant', 'Abhishek', 'Shivam']}
df = pandas.DataFrame(my_data)
print(df)
print(type(df))
```
Output:
```
        Name
0     Sachin
1    Sourabh
2  Subhojeet
3    Anirudh
4     Vedant
5   Abhishek
6     Shivam
<class 'pandas.core.frame.DataFrame'>
```
Here we have created a data frame object from the names of a group of people. The last line of the output confirms that the object is a pandas DataFrame.
Sorting DataFrame object in Python
Now let's look at how to sort a data frame object. Older versions of pandas had a DataFrame.sort() method, but it was deprecated and has since been removed. Today you use the DataFrame.sort_values() method, which sorts the values in the required order (either ascending or descending).
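A minimal sketch, reusing the names from the first example: by default sort_values() returns a new, sorted DataFrame and leaves the original one untouched. The variable name `sorted_df` is just for illustration.

```python
import pandas

# Same names as in the first example
df = pandas.DataFrame({'Name': ['Sachin', 'Sourabh', 'Subhojeet', 'Anirudh',
                                'Vedant', 'Abhishek', 'Shivam']})

# sort_values() returns a new DataFrame; df itself is not modified
sorted_df = df.sort_values(by='Name')
print(sorted_df)

# Pass ascending=False to get descending order instead
print(df.sort_values(by='Name', ascending=False))
```

Notice that each row keeps its original index label, so the index of `sorted_df` is no longer 0, 1, 2, ...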
Syntax: DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last') (newer pandas versions also accept ignore_index and key)
- by -> name of the column, or list of column names, to sort by (row labels instead, when axis=1).
- axis -> axis to sort along: 0 ('index') sorts the rows, 1 ('columns') sorts the columns. Default: 0
- ascending -> boolean, or a list of booleans with one entry per column in by. If True, sorts in ascending order; otherwise in descending order. Default: True
- inplace -> boolean. If True, sorts the data frame in place and returns None; otherwise returns a new, sorted data frame. Default: False
- kind -> sorting algorithm to use: 'quicksort', 'mergesort', 'heapsort' or 'stable'. It only applies when sorting by a single column. Default: 'quicksort'
- na_position -> 'first' puts all NaN values at the beginning, 'last' puts them at the end. Default: 'last'
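To see ascending and na_position working together, here is a small sketch with a made-up Score column that contains a missing value. The data and the names `scores` and `result` are hypothetical, for illustration only.

```python
import numpy as np
import pandas

# Hypothetical scores; Vedant's value is missing (NaN)
scores = pandas.DataFrame({'Name': ['Sachin', 'Vedant', 'Shivam', 'Anirudh'],
                           'Score': [88.0, np.nan, 95.0, 72.0]})

# Highest score first, with the missing value placed at the top
result = scores.sort_values(by='Score', ascending=False, na_position='first')
print(result)
```

With the default na_position='last', the NaN row would end up at the bottom instead.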
Let’s first import our dataset into the program.
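```python
import pandas

# Reading .xlsx files needs an Excel engine such as openpyxl to be installed
my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")
my_data  # in a Jupyter notebook, this line displays the DataFrame
```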
| | Year | Host | Venue for Final | Team-1 | Team-2 | Winner | Margin |
|---|---|---|---|---|---|---|---|
| 0 | 1975 | England | Lord’s | WI | Aus | WI | 17 runs |
| 1 | 1979 | England | Lord’s | WI | Eng | WI | 92 runs |
| 2 | 1983 | England | Lord’s | Ind | WI | Ind | 43 runs |
| 3 | 1987 | India | Kolkata | Aus | Eng | Aus | 7 runs |
| 4 | 1992 | Australia, New Zealand | Melbourne | Pak | Eng | Pak | 22 runs |
| 5 | 1996 | India, Pakistan, Sri Lanka | Lahore (Gdffi) | Aus | SL | SL | 7 wickets |
| 6 | 1999 | England | Lord’s | Pak | Aus | Aus | 8 wickets |
| 7 | 2003 | South Africa | Wanderers | Aus | Ind | Aus | 125 runs |
| 8 | 2007 | West Indies | Bridgetown | Aus | SL | Aus | 53 runs |
| 9 | 2011 | India, Pakistan, Sri Lanka, Bangladesh | Wankhede | SL | Ind | Ind | 6 wickets |
| 10 | 2015 | Australia, New Zealand | Melbourne | NZ | Aus | Aus | 7 wickets |
Here is our dataset, which lists the finals of every Cricket World Cup from 1975 to 2015: the year, host countries, venue, the two finalists, the winner and the winning margin.
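If you don't have the Excel file at hand, a sketch like the following builds the same table directly with pandas.DataFrame, so you can follow along without read_excel(). The values are copied from the table above; only the straight apostrophe in "Lord's" differs.

```python
import pandas

# Values copied from the table above (Cricket World Cup finals, 1975-2015)
my_data = pandas.DataFrame({
    'Year': [1975, 1979, 1983, 1987, 1992, 1996, 1999, 2003, 2007, 2011, 2015],
    'Host': ['England', 'England', 'England', 'India', 'Australia, New Zealand',
             'India, Pakistan, Sri Lanka', 'England', 'South Africa', 'West Indies',
             'India, Pakistan, Sri Lanka, Bangladesh', 'Australia, New Zealand'],
    'Venue for Final': ["Lord's", "Lord's", "Lord's", 'Kolkata', 'Melbourne',
                        'Lahore (Gdffi)', "Lord's", 'Wanderers', 'Bridgetown',
                        'Wankhede', 'Melbourne'],
    'Team-1': ['WI', 'WI', 'Ind', 'Aus', 'Pak', 'Aus', 'Pak', 'Aus', 'Aus', 'SL', 'NZ'],
    'Team-2': ['Aus', 'Eng', 'WI', 'Eng', 'Eng', 'SL', 'Aus', 'Ind', 'SL', 'Ind', 'Aus'],
    'Winner': ['WI', 'WI', 'Ind', 'Aus', 'Pak', 'SL', 'Aus', 'Aus', 'Aus', 'Ind', 'Aus'],
    'Margin': ['17 runs', '92 runs', '43 runs', '7 runs', '22 runs', '7 wickets',
               '8 wickets', '125 runs', '53 runs', '6 wickets', '7 wickets'],
})
print(my_data)
```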
Now we can use the DataFrame.sort_values() method to sort by a particular column. For instance, here I have sorted the data by the Host column in ascending order.
```python
import pandas

my_data = pandas.read_excel("Cricket World Cup Winners.xlsx")
# Sort the rows by the Host column (A to Z), modifying my_data in place
my_data.sort_values("Host", axis=0, ascending=True, inplace=True, na_position='last')
print(my_data)
```
```
    Year                                    Host  Venue for Final  Team-1  Team-2  Winner     Margin
4   1992                  Australia, New Zealand        Melbourne     Pak     Eng     Pak    22 runs
10  2015                  Australia, New Zealand        Melbourne      NZ     Aus     Aus  7 wickets
0   1975                                 England           Lord's      WI     Aus      WI    17 runs
1   1979                                 England           Lord's      WI     Eng      WI    92 runs
2   1983                                 England           Lord's     Ind      WI     Ind    43 runs
6   1999                                 England           Lord's     Pak     Aus     Aus  8 wickets
3   1987                                   India          Kolkata     Aus     Eng     Aus     7 runs
5   1996              India, Pakistan, Sri Lanka   Lahore (Gdffi)     Aus      SL      SL  7 wickets
9   2011  India, Pakistan, Sri Lanka, Bangladesh         Wankhede      SL     Ind     Ind  6 wickets
7   2003                            South Africa        Wanderers     Aus     Ind     Aus   125 runs
8   2007                             West Indies       Bridgetown     Aus      SL     Aus    53 runs
```
As you can see, the rows are now ordered alphabetically by the Host column, and each row keeps its original index label.
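Because the example above passes inplace=True, sort_values() changes my_data directly and returns None. A small sketch of that behaviour, using a hypothetical three-row frame instead of the Excel file, and of sort_index() to get back to the original row order:

```python
import pandas

# Hypothetical three-row frame taken from the World Cup table
frame = pandas.DataFrame({'Year': [1975, 1987, 1992],
                          'Host': ['England', 'India', 'Australia, New Zealand']})

# With inplace=True the frame itself is reordered and the method returns None
returned = frame.sort_values('Host', inplace=True)
print(returned)   # None
print(frame)      # rows now ordered by Host, with index labels 2, 0, 1

# sort_index() restores the original row order (as a new DataFrame)
restored = frame.sort_index()
print(restored)
```

A common mistake is writing `frame = frame.sort_values(..., inplace=True)`, which replaces the data frame with None.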
You can also sort by multiple columns at once by passing a list of column names as the by argument.
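A minimal sketch of a multi-column sort, using a few rows from the World Cup table: pandas orders by the first column in the list and uses the next one to break ties, and ascending can take one flag per column. Here rows are sorted by Host (A to Z) and, within each host, by Year (newest first). The name `finals` is just for illustration.

```python
import pandas

# A subset of the World Cup table above
finals = pandas.DataFrame({
    'Year': [1975, 1979, 1983, 1992, 1999, 2015],
    'Host': ['England', 'England', 'England', 'Australia, New Zealand',
             'England', 'Australia, New Zealand'],
    'Winner': ['WI', 'WI', 'Ind', 'Pak', 'Aus', 'Aus'],
})

# Host ascending first; ties on Host are broken by Year, descending
by_host_then_year = finals.sort_values(by=['Host', 'Year'], ascending=[True, False])
print(by_host_then_year)
```

Both finals co-hosted by Australia and New Zealand come first (2015, then 1992), followed by the England finals from 1999 back to 1975.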