pandas.DataFrame.sort_values in Python
In data science, pandas has become a standard tool for handling very large amounts of data as easily as an array. Data often needs to be sorted for analysis. Iterating through the rows of a data set and sorting them by hand is possible, but it is slow for large data sets. The pandas DataFrame object has a method called sort_values that sorts the data in the way it is needed.
The sort_values() method
The sort_values() method of a pandas DataFrame has the signature,
DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
Arguments
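self | The DataFrame on which the method is called; it is passed implicitly.
axis | 0 or 'index' to reorder rows, 1 or 'columns' to reorder columns. Default 0.
by | A label or list of labels to sort by: column labels when axis is 0, index labels when axis is 1.
ascending | True for ascending order, False for descending; a list of booleans gives one flag per label in by. Default True.
inplace | If True, sort the DataFrame itself and return None instead of a new DataFrame. Default False.
kind | Sorting algorithm: 'quicksort', 'mergesort' or 'heapsort'. Default 'quicksort'.
na_position | 'first' puts NaN values at the beginning, 'last' puts them at the end. Default 'last'.
ignore_index | If True, label the result 0, 1, ..., n-1 instead of keeping the original labels. Default False.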
Example
For example, consider the dataset,
         0       1       2      3       4        5       6
column1  Violet  Indigo  Blue   Green   NaN      Orange  Red
column2  0       1       2      3       4        5       6
column3  Table   Chair   Phone  Laptop  Desktop  Tablet  Bench
>>> import pandas as pd
>>> import numpy as np
>>> col3 = ['Table', 'Chair', 'Phone', 'Laptop', 'Desktop', 'Tablet', 'Bench']
>>> col2 = [0, 1, 2, 3, 4, 5, 6]
>>> col1 = ['Violet', 'Indigo', 'Blue', 'Green', np.nan, 'Orange', 'Red']
>>> df = pd.DataFrame({'column1': col1, 'column2': col2, 'column3': col3})
>>> df
  column1  column2 column3
0  Violet        0   Table
1  Indigo        1   Chair
2    Blue        2   Phone
3   Green        3  Laptop
4     NaN        4 Desktop
5  Orange        5  Tablet
6     Red        6   Bench
>>> df.sort_values(['column1'])
  column1  column2 column3
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
4     NaN        4 Desktop
The data are sorted in ascending order by the values in 'column1'. To make NaN appear at the top, set na_position='first',
>>> df.sort_values(['column1'], na_position='first')
  column1  column2 column3
4     NaN        4 Desktop
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
Setting ascending=False,
>>> df.sort_values(['column1'], na_position='first', ascending=False)
  column1  column2 column3
4     NaN        4 Desktop
0  Violet        0   Table
6     Red        6   Bench
5  Orange        5  Tablet
1  Indigo        1   Chair
3   Green        3  Laptop
2    Blue        2   Phone
The data are sorted alphabetically in descending order by the values in 'column1'. Note that the NaN value stays at the top because na_position is set to 'first'; otherwise, the NaN value appears at the bottom,
>>> df.sort_values(['column1'], ascending=False)
  column1  column2 column3
0  Violet        0   Table
6     Red        6   Bench
5  Orange        5  Tablet
1  Indigo        1   Chair
3   Green        3  Laptop
2    Blue        2   Phone
4     NaN        4 Desktop
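The by argument also accepts several labels, and ascending can then be a list with one flag per label. The sketch below uses a small made-up frame (the names stock, color, qty and by_color_qty are illustrative and not part of the data set above) so that ties in the first column are broken by the second.

```python
import pandas as pd

# Hypothetical data with repeated values in 'color', so the second key matters.
stock = pd.DataFrame({
    'color': ['Red', 'Blue', 'Red', 'Blue'],
    'qty': [1, 5, 3, 2],
})

# Sort by 'color' ascending, then by 'qty' descending within each color.
by_color_qty = stock.sort_values(['color', 'qty'], ascending=[True, False])
print(by_color_qty)
```

The original index labels travel with their rows, so they show which input row ended up where.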
The kind argument selects the sorting algorithm. On a small data set with no repeated values in the sort column, such as this one, every choice gives the same result as before,
>>> df.sort_values(['column1'], kind='heapsort')
  column1  column2 column3
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
4     NaN        4 Desktop
>>> df.sort_values(['column1'], kind='mergesort')
  column1  column2 column3
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
4     NaN        4 Desktop
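Where kind can matter is when the sort column contains repeated values. Mergesort is a stable algorithm, meaning tied rows keep their original relative order, while quicksort and heapsort make no such promise. A minimal sketch with a made-up frame (ties, key, tag and stable are illustrative names):

```python
import pandas as pd

# Hypothetical data: 'key' contains ties, 'tag' records the original order.
ties = pd.DataFrame({'key': [2, 1, 2, 1], 'tag': ['a', 'b', 'c', 'd']})

# A stable sort keeps tied rows in their original relative order,
# so 'b' stays ahead of 'd' and 'a' stays ahead of 'c'.
stable = ties.sort_values('key', kind='mergesort')
print(stable['tag'].tolist())  # ['b', 'd', 'a', 'c']
```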
So far, axis was left at its default (0 or 'index'). To see the effect of changing axis to 1, first make 'column2' the index with the set_index() method, which sets the index of the data set to one of its columns. With axis=1 the values within a row are compared with each other, so they must be of comparable types; moving the integer column into the index leaves only the string columns.
>>> df.set_index('column2')
        column1 column3
column2
0        Violet   Table
1        Indigo   Chair
2          Blue   Phone
3         Green  Laptop
4           NaN Desktop
5        Orange  Tablet
6           Red   Bench
If the data are then sorted by the row with index value 1, with axis set to 1,
>>> df.set_index('column2').sort_values([1], axis=1)
        column3 column1
column2
0         Table  Violet
1         Chair  Indigo
2         Phone    Blue
3        Laptop   Green
4       Desktop     NaN
5        Tablet  Orange
6         Bench     Red
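One way to read the axis=1 result is as a row sort on the transposed frame. The sketch below rebuilds the example data and checks that both routes give the same column order; the variable names indexed, by_row and via_transpose are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'column1': ['Violet', 'Indigo', 'Blue', 'Green', np.nan, 'Orange', 'Red'],
    'column2': [0, 1, 2, 3, 4, 5, 6],
    'column3': ['Table', 'Chair', 'Phone', 'Laptop', 'Desktop', 'Tablet', 'Bench'],
})
indexed = df.set_index('column2')

# Reorder the columns by the values in the row labelled 1.
by_row = indexed.sort_values(by=1, axis=1)

# Same idea via the transpose: the row labelled 1 becomes a column labelled 1,
# so an ordinary axis=0 sort reorders what used to be the columns.
via_transpose = indexed.T.sort_values(by=1).T

print(list(by_row.columns))
print(list(via_transpose.columns))
```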
Previously, with axis=0, sorting reordered the rows of the data. With axis=1, the columns are reordered instead, based on the values in the row with index 1: 'Chair' sorts before 'Indigo', so 'column3' now comes before 'column1'. Note the difference before and after the sort. This is similar to sorting the transpose of the data with axis=0.
In the examples above, when the data were sorted with axis=0, the index labels moved along with their rows. Setting ignore_index to True discards those labels and numbers the sorted rows 0, 1, ..., n-1 instead.
>>> df.sort_values(['column1'], ignore_index=True)
  column1  column2 column3
0    Blue        2   Phone
1   Green        3  Laptop
2  Indigo        1   Chair
3  Orange        5  Tablet
4     Red        6   Bench
5  Violet        0   Table
6     NaN        4 Desktop
Otherwise,
>>> df.sort_values(['column1'], ignore_index=False)
  column1  column2 column3
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
4     NaN        4 Desktop
Note the difference between the indices of the above two examples.
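The ignore_index=True result can also be obtained by sorting first and then dropping the old labels with reset_index(drop=True). A short sketch on the same data (relabelled and reset are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'column1': ['Violet', 'Indigo', 'Blue', 'Green', np.nan, 'Orange', 'Red'],
    'column2': [0, 1, 2, 3, 4, 5, 6],
    'column3': ['Table', 'Chair', 'Phone', 'Laptop', 'Desktop', 'Tablet', 'Bench'],
})

# Route 1: let sort_values renumber the rows.
relabelled = df.sort_values('column1', ignore_index=True)

# Route 2: sort with the original labels, then throw the labels away.
reset = df.sort_values('column1').reset_index(drop=True)

print(list(relabelled.index))
print(relabelled.equals(reset))
```

Because 'column2' holds the original row numbers in this data set, it still shows where each row came from after the index has been renumbered.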
So far, inplace was left at its default, False, so the Python interpreter printed the sorted DataFrame returned by sort_values. If inplace is set to True, the method no longer returns the sorted data (it returns None). Instead, it sorts the data and stores the result in the same object.
>>> df.sort_values(['column1'], inplace=True)
>>> df
  column1  column2 column3
2    Blue        2   Phone
3   Green        3  Laptop
1  Indigo        1   Chair
5  Orange        5  Tablet
6     Red        6   Bench
0  Violet        0   Table
4     NaN        4 Desktop
Notice that nothing is printed after the sort_values statement itself; the sorted DataFrame appears only when df is displayed.
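Because the in-place call returns None, assigning its result back to the variable is an easy slip that replaces the DataFrame with None. A minimal sketch (colors and returned are illustrative names, and the four-row frame is a shortened stand-in for the data above):

```python
import numpy as np
import pandas as pd

colors = pd.DataFrame({
    'column1': ['Violet', 'Indigo', np.nan, 'Blue'],
    'column2': [0, 1, 2, 3],
})

# With inplace=True the method sorts 'colors' itself and returns None.
returned = colors.sort_values('column1', inplace=True)

print(returned)            # None
print(list(colors.index))  # [3, 1, 0, 2]
```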