pandas.DataFrame.sort_values in Python

In data science, Pandas has become a popular tool for handling large amounts of data with ease. Data often needs to be sorted before it can be analysed. Although it is possible to iterate through the rows of a data set and sort them by hand, that is slow for large data sets. The Pandas DataFrame object has a method called sort_values that sorts the data by the values in one or more columns or rows.

The sort_values() method

The sort_values() method of the Pandas DataFrame has the signature:

DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)

Arguments


self
  • The DataFrame instance on which this method has to be invoked.

axis
  • The axis along which sorting is performed.
  • Possible values: {0, ‘index’, 1, ‘columns’}
  • If axis is 0 or ‘index’, Pandas reorders the rows based on the values in the column(s) named in by.
  • If axis is 1 or ‘columns’, Pandas reorders the columns based on the values in the row(s) named in by.
  • It defaults to 0.

by
  • The name(s) of the columns or rows whose values determine the sort order.
  • If axis is 0, by may be a column label or an index level (or a list of them).
  • If axis is 1, by may be an index label or a column level (or a list of them).
  • This argument is required.

ascending
  • Specifies whether the data is sorted in ascending or descending order.
  • Possible values: a boolean (True or False) or a list of booleans.
  • A single value applies to every entry in by.
  • If by is a list, ascending may also be a list of booleans, one per entry in by, so both lists must have the same length.
  • Defaults to True.

inplace
  • Boolean that controls how the sorted data is returned.
  • If True, the DataFrame is sorted in place and the method returns None.
  • If False, a sorted copy is returned and the original DataFrame is left unchanged.
  • Defaults to False.

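kind
  • The sorting algorithm to use.
  • Possible values: {‘quicksort’, ‘mergesort’, ‘heapsort’}. Recent Pandas versions also accept ‘stable’.
  • For DataFrames, this option is applied only when sorting on a single column or label.
  • Defaults to ‘quicksort’.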

na_position
  • The position of the NaN values, if any, after sorting.
  • Possible values: {‘first’, ‘last’}
  • If ‘first’, NaNs are placed at the beginning; if ‘last’, they are placed at the end.
  • Defaults to ‘last’.

ignore_index
  • Boolean that tells Pandas whether to discard the original index labels.
  • If True, the result is labelled 0, 1, …, n-1; otherwise each row keeps its original index label.
  • Defaults to False.
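The way by and ascending pair up when both are lists can be sketched as follows. The column names team and score and the values are made up for illustration; they are not part of the data set used in the rest of this article.

```python
import pandas as pd

# Hypothetical data: "team" contains repeated values, so the second key matters.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [10, 30, 20, 5],
})

# One boolean per entry in `by`:
# team is sorted ascending, and score is sorted descending within each team.
result = df.sort_values(by=["team", "score"], ascending=[True, False])
print(result)
```

Each boolean in ascending applies to the key at the same position in by, which is why the two lists must have the same length.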

Example

For example, consider the following data set, shown here transposed so that each row below becomes a column of the DataFrame:

              0       1      2       3        4       5      6
column1  Violet  Indigo   Blue   Green      NaN  Orange    Red
column2       0       1      2       3        4       5      6
column3   Table   Chair  Phone  Laptop  Desktop  Tablet  Bench
>>> import pandas as pd
>>> import numpy as np
>>> col3 = ['Table', 'Chair', 'Phone', 'Laptop', 'Desktop', 'Tablet', 'Bench']
>>> col2 = [0, 1, 2, 3, 4, 5, 6]
>>> col1 = ['Violet', 'Indigo', 'Blue', 'Green', np.nan, 'Orange', 'Red']
>>> df = pd.DataFrame({'column1': col1, 'column2': col2, 'column3': col3})
>>> df
  column1  column2  column3
0  Violet        0    Table
1  Indigo        1    Chair
2    Blue        2    Phone
3   Green        3   Laptop
4     NaN        4  Desktop
5  Orange        5   Tablet
6     Red        6    Bench
>>> df.sort_values(['column1'])
  column1  column2  column3
2    Blue        2    Phone
3   Green        3   Laptop
1  Indigo        1    Chair
5  Orange        5   Tablet
6     Red        6    Bench
0  Violet        0    Table
4     NaN        4  Desktop

The data are sorted in ascending order based on the values in ‘column1’, and the NaN value is placed last by default. If NaN has to appear at the top, set na_position='first':

>>> df.sort_values(['column1'],na_position='first')
  column1  column2  column3
4     NaN        4  Desktop
2    Blue        2    Phone
3   Green        3   Laptop
1  Indigo        1    Chair
5  Orange        5   Tablet
6     Red        6    Bench
0  Violet        0    Table

Setting ascending=False reverses the sort order:

>>> df.sort_values(['column1'],na_position='first',ascending=False)
  column1  column2  column3
4     NaN        4  Desktop
0  Violet        0    Table
6     Red        6    Bench
5  Orange        5   Tablet
1  Indigo        1    Chair
3   Green        3   Laptop
2    Blue        2    Phone

The data are sorted alphabetically in descending order based on the values in ‘column1’. Note that the NaN value stays at the top because na_position is set to ‘first’. Without it, the NaN value is placed at the bottom:

>>> df.sort_values(['column1'],ascending=False)
  column1  column2  column3
0  Violet        0    Table
6     Red        6    Bench
5  Orange        5   Tablet
1  Indigo        1    Chair
3   Green        3   Laptop
2    Blue        2    Phone
4     NaN        4  Desktop

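The argument kind selects the sorting algorithm, which affects speed and the relative order of equal values rather than the sort order itself. On this small data set, which has no repeated values in ‘column1’, every algorithm gives the same result as before: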

>>> df.sort_values(['column1'],kind='heapsort')
  column1  column2  column3
2    Blue        2    Phone
3   Green        3   Laptop
1  Indigo        1    Chair
5  Orange        5   Tablet
6     Red        6    Bench
0  Violet        0    Table
4     NaN        4  Desktop

>>> df.sort_values(['column1'],kind='mergesort')
  column1  column2  column3
2    Blue        2    Phone
3   Green        3   Laptop
1  Indigo        1    Chair
5  Orange        5   Tablet
6     Red        6    Bench
0  Violet        0    Table
4     NaN        4  Desktop
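The choice of kind can matter when the sort key contains repeated values. The sketch below uses a made-up DataFrame (the columns group and item are hypothetical) and kind='mergesort', which is a stable algorithm: rows with equal keys keep their original relative order. The other algorithms may or may not preserve that order, so the sketch makes no claim about them.

```python
import pandas as pd

# Hypothetical data with repeated keys in "group".
df = pd.DataFrame({
    "group": [2, 1, 2, 1, 1],
    "item": ["p", "q", "r", "s", "t"],
})

# mergesort is stable: rows that tie on "group" stay in their original order.
stable = df.sort_values(by="group", kind="mergesort")
print(stable)
```

The rows with group 1 appear in the order q, s, t, and the rows with group 2 in the order p, r, exactly as they were in the unsorted data.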

So far, the axis was left at its default (0 or ‘index’). To see the effect of changing the axis to 1, first make ‘column2’ the index with the set_index() method, which returns a new DataFrame that uses one of the columns as its index.

>>> df.set_index('column2')
        column1  column3
column2
0        Violet    Table
1        Indigo    Chair
2          Blue    Phone
3         Green   Laptop
4           NaN  Desktop
5        Orange   Tablet
6           Red    Bench

If the data are sorted by the row with index label 1 along axis 1,

>>> df.set_index('column2').sort_values([1],axis=1)
         column3 column1
column2
0          Table  Violet
1          Chair  Indigo
2          Phone    Blue
3         Laptop   Green
4        Desktop     NaN
5         Tablet  Orange
6          Bench     Red
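Sorting with axis=1 reorders the columns rather than the rows. A minimal sketch on a small numeric table may make this easier to see; the names scores, alice and bob are invented for illustration. It also checks that the result matches transposing, sorting the rows, and transposing back.

```python
import pandas as pd

# Hypothetical table: one row per person, one column per subject.
scores = pd.DataFrame(
    {"math": [90, 70], "art": [60, 95], "music": [75, 80]},
    index=["alice", "bob"],
)

# Reorder the columns by the values in the row labelled "alice".
by_alice = scores.sort_values(by="alice", axis=1)
print(by_alice)

# Same result: transpose, sort the rows by the "alice" column, transpose back.
via_transpose = scores.T.sort_values(by="alice").T
print(by_alice.equals(via_transpose))
```

With axis=1 the value passed to by is a row label, not a column label, and no row is moved.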

Previously, with axis=0, sorting reordered the rows. With axis=1, sorting reorders the columns based on the values in the given row, here the row with index label 1. In that row, ‘Chair’ sorts before ‘Indigo’, so column3 now comes before column1. Note the difference before and after the sort. This is similar to sorting the transpose of the data with axis=0.

In the examples above with axis=0, each row kept its original index label, so the index appears shuffled after sorting. Setting ignore_index to True discards those labels and renumbers the rows 0, 1, …, n-1.

>>> df.sort_values(['column1'],ignore_index=True)
  column1 column2  column3
0    Blue       2    Phone
1   Green       3   Laptop
2  Indigo       1    Chair
3  Orange       5   Tablet
4     Red       6    Bench
5  Violet       0    Table
6     NaN       4  Desktop

With ignore_index=False (the default), the original index labels are kept:

>>> df.sort_values(['column1'],ignore_index=False)
  column1 column2  column3
2    Blue       2    Phone
3   Green       3   Laptop
1  Indigo       1    Chair
5  Orange       5   Tablet
6     Red       6    Bench
0  Violet       0    Table
4     NaN       4  Desktop

Note the difference between the indices of the above two examples.
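As a rough check of what ignore_index does, the sketch below compares it with sorting first and then calling reset_index(drop=True). The small DataFrame with the columns name and value is made up for illustration.

```python
import pandas as pd

# Hypothetical data, deliberately out of order.
df = pd.DataFrame({"name": ["c", "a", "b"], "value": [3, 1, 2]})

kept = df.sort_values(by="name")                           # original labels travel with the rows
renumbered = df.sort_values(by="name", ignore_index=True)  # labels replaced by 0, 1, 2

# Dropping the old index after sorting gives the same labels and values.
reset = kept.reset_index(drop=True)

print(list(kept.index))
print(list(renumbered.index))
```

Both approaches leave the row contents in the same sorted order; only the index labels differ from the default behaviour.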

So far, inplace was left at its default value of False, so the Python interpreter printed the sorted DataFrame returned by sort_values. If inplace is set to True, the method returns None instead of the sorted data. It sorts the data and stores the result in the same object.

>>> df.sort_values(['column1'],inplace=True)
>>> df
  column1 column2  column3
2    Blue       2    Phone
3   Green       3   Laptop
1  Indigo       1    Chair
5  Orange       5   Tablet
6     Red       6    Bench
0  Violet       0    Table
4     NaN       4  Desktop

Notice that the sort_values statement itself prints nothing, because it returns None. The sorted DataFrame is displayed only when df is evaluated afterwards.
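The difference between sorting in place and reassigning the returned copy can be sketched as below. The DataFrames df and df2 and their columns are hypothetical, and which style to prefer is a matter of taste rather than something this article prescribes.

```python
import pandas as pd

# Hypothetical data, deliberately out of order.
df = pd.DataFrame({"name": ["c", "a", "b"], "value": [3, 1, 2]})

# In-place sort: df itself is modified and the call returns None.
returned = df.sort_values(by="name", inplace=True)
print(returned)
print(df)

# Alternative: keep the default inplace=False and rebind the name to the sorted copy.
df2 = pd.DataFrame({"name": ["c", "a", "b"], "value": [3, 1, 2]})
df2 = df2.sort_values(by="name")
print(df2)
```

A common mistake is writing df = df.sort_values(..., inplace=True), which replaces df with None.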
