DataFrame.describe() in Pandas

The pandas describe() method returns summary statistics for a series or a data frame. It works with numeric data as well as with strings/objects. For numeric data it reports the count, mean, standard deviation, minimum, maximum and percentiles. For strings/objects it reports the count, the number of unique values, the most frequent value (top) and how often that value occurs (freq).
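
As a quick illustration of the two kinds of summary, the sketch below calls describe() on one numeric series and one string series. The sample values and the variable names are made up for this example.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
numbers = pd.Series([2, 4, 6, 8])
labels = pd.Series(["a", "b", "a", "c"])

# Numeric data: count, mean, std, min, percentiles, max
numeric_summary = numbers.describe()

# String/object data: count, unique, top, freq
label_summary = labels.describe()

print(numeric_summary)
print(label_summary)
```

The rows of each result are labelled with the statistic names, so a single value can be read with, for example, numeric_summary['mean'].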

describe() in Pandas

The method takes three arguments, all of them optional.

percentiles: A list of the percentiles to show in the output, each given as a fraction between 0 and 1. By default the 25th, 50th and 75th percentiles are returned.

include: A list of the data types of the columns to include in the result. To return all columns, use 'all'.

exclude: A list of the data types of the columns to leave out of the result.
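
The percentiles argument is the easiest one to get wrong, because the values are fractions and not whole numbers. The following sketch, using a made-up 'score' column, requests the 10th, 50th and 90th percentiles and then shows that a value such as 25 is rejected.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration
scores = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

# Percentiles are fractions: 0.1 means the 10th percentile
summary = scores.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)

# A value outside the 0-1 range, such as 25, raises a ValueError
try:
    scores.describe(percentiles=[25])
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```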

Creating a data frame with numeric data and using describe()

Import the pandas library and create a data frame from a list of numbers. Call the describe() method on the created data frame and observe the results.

import pandas as pd
# Build a single-column data frame from a list of numbers
data = [1, 20.54, 672, 333, -1.678]
d = pd.DataFrame(data)
print(d.describe())

OUTPUT:

      0
count 5.000000
mean  204.972400
std   296.997594
min   -1.678000
25%   1.000000
50%   20.540000
75%   333.000000
max   672.000000

Creating a data frame with string/object data and using describe()

Create a data frame with string data. Call the describe() method on the created data frame and observe the results.

import pandas as pd
# Build a single-column data frame from a list of strings
data1 = ['h', 'e', 'l', 'l', 'o']
d1 = pd.DataFrame(data1)
print(d1.describe())

OUTPUT:

       0
count  5
unique 4
top    l
freq   2
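
To see where 'top' and 'freq' come from, the same letters can be counted with value_counts(). This is only a cross-check sketch and not part of describe() itself; the variable names are chosen for this example.

```python
import pandas as pd

letters = pd.Series(['h', 'e', 'l', 'l', 'o'])

# Occurrences of each distinct value, most frequent first
counts = letters.value_counts()
summary = letters.describe()

print(counts)
print(summary)
```

The first entry of counts is 'l' with a count of 2, which matches the top and freq rows, and the number of entries in counts matches the unique row.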

Creating a data frame with string/object and numeric data and using describe()

Create a data frame with different types of data. Depending on the requirement, use different arguments to get statistical information from the data. When a data frame holds both numeric and string/object columns, describe() returns only the statistics of the numeric columns by default.

import pandas as pd
# One numeric column and one string column
data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
print(d3.describe(percentiles=[0.25, 0.5]))

OUTPUT:

       first
count  3.000000
mean  -84.293333
std    137.436742
min   -240.000000
25%   -136.500000
50%   -33.000000
max    20.120000

Here the percentiles argument is a list of the required percentiles, written as fractions. We asked for the 25th and 50th percentiles, so the output contains only those two and the 75th percentile is absent.
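
The percentile rows can be cross-checked against the quantile() and median() methods of the column. The sketch below assumes the same data as above and is meant only as a sanity check.

```python
import pandas as pd

d3 = pd.DataFrame({'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']})
summary = d3.describe(percentiles=[0.25, 0.5])

# The 25% row should agree with quantile(0.25) of the column
q25 = d3['first'].quantile(0.25)
# The 50% row is the median
median = d3['first'].median()

print(q25, summary.loc['25%', 'first'])
print(median, summary.loc['50%', 'first'])
```

With three sorted values (-240, -33, 20.12), the 25th percentile lies halfway between the first two: -240 + 0.5 * 207 = -136.5.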

Using the include argument

If we want the details of both numeric and string/object data, we should use the 'include' argument and set its value to 'all'.

import pandas as pd
data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
# include='all' describes every column, whatever its type
print(d3.describe(include='all'))

OUTPUT:

       first       second
count  3.000000    3
unique NaN         3
top    NaN         i
freq   NaN         1
mean   -84.293333  NaN
std    137.436742  NaN
min    -240.000000 NaN
25%    -136.500000 NaN
50%    -33.000000  NaN
75%    -6.440000   NaN
max    20.120000   NaN
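
The NaN entries mark statistics that do not apply to a column, for example the mean of a string column. Because the result of describe() is itself a data frame, single values can be looked up by row and column label. A minimal sketch, assuming the same data as above:

```python
import pandas as pd

d3 = pd.DataFrame({'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']})
summary = d3.describe(include='all')

# Look up single statistics by row label and column label
mean_first = summary.loc['mean', 'first']        # a numeric statistic
unique_second = summary.loc['unique', 'second']  # a string/object statistic

# Statistics that do not apply to a column are stored as missing values
missing = pd.isna(summary.loc['mean', 'second'])

print(mean_first, unique_second, missing)
```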

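To return only the columns of a specific data type, pass that type to the 'include' argument. For string/object data the type is Python's built-in object. Older examples pass numpy.object instead, but that alias was removed in NumPy 1.24 and now raises an AttributeError.

import pandas as pd
data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
# Describe only the string/object column
print(d3.describe(include=object))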

OUTPUT:

       second
count  3
unique 3
top    i
freq   1
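
The 'include' argument also accepts type names as strings, which avoids importing NumPy. The sketch below adds a hypothetical integer column 'third' to show that 'number' covers both integer and float columns.

```python
import pandas as pd

# 'third' is a made-up integer column added for this example
d4 = pd.DataFrame({
    'first': [20.12, -33, -240],
    'second': ['h', 'a', 'i'],
    'third': [1, 2, 3],
})

# 'number' selects every numeric column, integer or float
numeric_only = d4.describe(include=['number'])
print(numeric_only)
```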

Using the exclude argument

To leave columns of certain data types out of the result, use the 'exclude' argument. Here the numeric columns are excluded, so only the string/object column is described.

import pandas as pd
import numpy as np
data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
# np.number matches every numeric column
print(d3.describe(exclude=np.number))

OUTPUT:

       second
count  3
unique 3
top    i
freq   1
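
One way to think about 'exclude' is as a column filter applied before the summary is computed. The sketch below compares it with filtering the columns first through select_dtypes(); treat it as an illustration of the idea and not as a description of the pandas internals.

```python
import pandas as pd

d3 = pd.DataFrame({'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']})

# Drop numeric columns through the exclude argument
via_exclude = d3.describe(exclude=['number'])

# Drop numeric columns first, then describe what is left
via_select = d3.select_dtypes(exclude=['number']).describe()

print(via_exclude)
print(via_select)
```

Both calls describe only the 'second' column.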
