DataFrame.describe() in Pandas
The pandas describe() method returns summary statistics for the data in a Series or a DataFrame. It works with both numeric data and string/object data. For numeric data it reports the count, mean, standard deviation, minimum, maximum and percentiles. For string/object data it reports count, unique (the number of distinct values), top (the most frequent value) and freq (how often the top value occurs).
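Since describe() also works on a Series, a minimal sketch of that case may help. The values below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
prices = pd.Series([10.0, 20.0, 30.0, 40.0])

# On a Series, describe() returns another Series indexed by statistic name.
summary = prices.describe()
print(summary)

# Individual statistics can then be read by label.
print(summary["mean"])  # 25.0
```

Reading values by label like this is handy when only one or two of the statistics are needed later in a script.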
describe() in Pandas
This method takes three arguments, all of them optional.
percentiles: A list of numbers between 0 and 1 giving the percentiles to report. By default the 25th, 50th and 75th percentiles ([0.25, 0.5, 0.75]) are returned.
include: A list of the data types of the columns to include. Use ‘all’ to describe every column. By default only numeric columns are described.
exclude: A list of the data types of the columns to leave out. By default nothing is excluded.
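The three arguments above can be sketched side by side as follows. The column names and values are hypothetical and exist only to give each argument something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "age": [23, 35, 41, 52],
    "height": [1.62, 1.75, 1.80, 1.68],
    "city": ["Pune", "Delhi", "Pune", "Agra"],
})

# percentiles: fractions between 0 and 1, not whole-number percentages.
tails = df.describe(percentiles=[0.1, 0.9])
print(tails)

# include: a list of dtypes selecting which columns to summarise.
floats_only = df.describe(include=[np.float64])
print(floats_only)

# exclude: a list of dtypes to leave out of the summary.
non_numeric = df.describe(exclude=[np.number])
print(non_numeric)
```

Here only the float column "height" matches np.float64, and excluding np.number leaves just the string column "city".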
Creating a data frame with numeric data and using describe()
Import the pandas library and create a data frame from a list of numbers. Call the describe() method on the created data frame and observe the results.
import pandas as pd

data = [1, 20.54, 672, 333, -1.678]
d = pd.DataFrame(data)
print(d.describe())
OUTPUT:
                0
count    5.000000
mean   204.972400
std    296.997594
min     -1.678000
25%      1.000000
50%     20.540000
75%    333.000000
max    672.000000
Creating a data frame with string/object data and using describe()
Create a data frame with string data. Call the describe() method on the created data frame and observe the results.
import pandas as pd

data1 = ['h', 'e', 'l', 'l', 'o']
d1 = pd.DataFrame(data1)
print(d1.describe())
OUTPUT:
        0
count   5
unique  4
top     l
freq    2
Creating a data frame with string/object and numeric data and using describe()
Create a data frame with different types of data and pass arguments to describe() as required. When a data frame contains both numeric and string/object columns, describe() returns statistics only for the numeric columns by default.
import pandas as pd

data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
print(d3.describe(percentiles=[0.25, 0.5]))
OUTPUT:
            first
count    3.000000
mean   -84.293333
std    137.436742
min   -240.000000
25%   -136.500000
50%    -33.000000
max     20.120000
Here we have used the percentiles argument with a list of the required percentiles, given as fractions between 0 and 1. We asked for the 25th and 50th percentiles, so the output contains only those two and omits the 75th.
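As a quick check on what the percentile rows mean, the sketch below compares the 25% row with the column's quantile at 0.25. It reuses the d3 data from the example above; the variable names summary and q25 are introduced here only for illustration.

```python
import pandas as pd

d3 = pd.DataFrame({"first": [20.12, -33, -240], "second": ["h", "a", "i"]})

summary = d3.describe(percentiles=[0.25, 0.5])

# The 25% row should agree with the 0.25 quantile of the same column.
q25 = d3["first"].quantile(0.25)
print(summary.loc["25%", "first"], q25)

# The 75th percentile was not requested, so it has no row in the result.
print("75%" in summary.index)  # False
```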
Using the include argument
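If we want the details of both numeric and string/object data, we should use the ‘include’ argument with the value ‘all’. Statistics that do not apply to a column's type appear as NaN. When several values share the highest count, as in the ‘second’ column here, the one reported as top is chosen arbitrarily.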
import pandas as pd

data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
print(d3.describe(include='all'))
OUTPUT:
             first second
count     3.000000      3
unique         NaN      3
top            NaN      i
freq           NaN      1
mean    -84.293333    NaN
std     137.436742    NaN
min    -240.000000    NaN
25%    -136.500000    NaN
50%     -33.000000    NaN
75%      -6.440000    NaN
max      20.120000    NaN
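Because the result of describe() is itself a data frame, single cells can be read with .loc. The following sketch, which reuses the d3 data and introduces the name full only for illustration, shows how the NaN cells separate the numeric statistics from the string/object ones.

```python
import pandas as pd

d3 = pd.DataFrame({"first": [20.12, -33, -240], "second": ["h", "a", "i"]})
full = d3.describe(include="all")

# Numeric statistics are filled in for 'first' and missing for 'second'.
print(full.loc["mean", "first"])
print(pd.isna(full.loc["mean", "second"]))   # True

# String/object statistics are filled in for 'second' and missing for 'first'.
print(full.loc["unique", "second"])          # 3
print(pd.isna(full.loc["unique", "first"]))  # True
```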
To describe only columns of a specific type, pass that data type to the ‘include’ argument. Use the built-in object type to select string/object columns; the older numpy.object alias was removed in NumPy 1.24.
import pandas as pd

data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
# Select only the string/object columns
print(d3.describe(include=object))
OUTPUT:
       second
count       3
unique      3
top         i
freq        1
Using the exclude argument
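To exclude columns of particular data types from the results returned by describe(), use the ‘exclude’ argument. Here numpy.number excludes every numeric column, leaving only the string/object column.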
import pandas as pd
import numpy as np

data3 = {'first': [20.12, -33, -240], 'second': ['h', 'a', 'i']}
d3 = pd.DataFrame(data3)
print(d3.describe(exclude=np.number))
OUTPUT:
       second
count       3
unique      3
top         i
freq        1
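One way to think about ‘exclude’ is as a column filter applied before the statistics are computed. The sketch below compares it with filtering the columns first using select_dtypes(). That the two routes describe the same columns is what this small example shows; it is not a statement about how pandas implements describe().

```python
import numpy as np
import pandas as pd

d3 = pd.DataFrame({"first": [20.12, -33, -240], "second": ["h", "a", "i"]})

# Route 1: let describe() drop the numeric columns.
via_exclude = d3.describe(exclude=np.number)

# Route 2: drop the numeric columns first, then describe what is left.
via_select = d3.select_dtypes(exclude=np.number).describe()

print(list(via_exclude.columns))  # ['second']
print(list(via_select.columns))   # ['second']
```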