How to Calculate Skewness and Kurtosis in Python
In this tutorial, we will look at how to calculate skewness and kurtosis in Python.
When we plot the probability distribution of many datasets, we get a bell-shaped curve.
This shape is known as the normal distribution.
But not every distribution has this proper bell shape, so we need some way to measure the shape of a distribution.
This is where Skewness and Kurtosis come into the picture.
What is Skewness?
Skewness is a statistical measure of the asymmetry of a distribution. It tells us how far the bell curve is distorted to one side.
A distribution can be positively skewed, negatively skewed, or symmetric.
In a positive skew, the longer tail extends to the right and most of the data sits on the left side.
Here Skewness > 0.
In a negative skew, the longer tail extends to the left and most of the data sits on the right side.
Here Skewness < 0.
A symmetric distribution leans neither to the right nor to the left. Its two halves mirror each other around the center.
Here Skewness = 0.
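To make the three cases concrete, here is a minimal sketch using synthetic data. The samples, the seed, and the helper `moment_skewness` are illustrative assumptions, not part of the original tutorial. The helper computes the biased sample skewness by hand (third central moment divided by the second central moment raised to the power 1.5), which should agree with the default result of `scipy.stats.skew`.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability

# Hypothetical illustration data
right_tailed = rng.exponential(size=10_000)  # long tail on the right
left_tailed = -right_tailed                  # mirror image: long tail on the left
symmetric = rng.normal(size=10_000)          # no preferred side


def moment_skewness(x):
    """Biased sample skewness: m3 / m2**1.5 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m3 = np.mean(centered ** 3)  # third central moment
    return m3 / m2 ** 1.5


for name, sample in [
    ("right-tailed", right_tailed),
    ("left-tailed", left_tailed),
    ("symmetric", symmetric),
]:
    print(f"{name}: scipy = {skew(sample):.3f}, by hand = {moment_skewness(sample):.3f}")
```

The right-tailed sample should give a clearly positive value, the mirrored sample the same value with a negative sign, and the normal sample a value near zero.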
What is Kurtosis?
Kurtosis measures how heavy the tails of a distribution are compared with a normal distribution. There are three types:
- Mesokurtic – Normal distribution, kurtosis = 3 (excess kurtosis = 0).
- Platykurtic – Light-tailed, kurtosis < 3 (negative excess kurtosis). Such distributions often look flatter around the peak.
- Leptokurtic – Heavy-tailed, kurtosis > 3 (positive excess kurtosis). Such distributions often look more sharply peaked.
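The three categories can be illustrated with a short sketch. The distributions and the seed below are assumptions chosen for illustration. Note that `scipy.stats.kurtosis` uses the Fisher definition by default, which subtracts 3 so that a normal distribution scores 0; passing `fisher=False` gives the Pearson definition, where a normal distribution scores 3.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)  # arbitrary seed, used only for repeatability

# Hypothetical illustration data, one sample per category
samples = {
    "uniform (platykurtic)": rng.uniform(size=10_000),
    "normal (mesokurtic)": rng.normal(size=10_000),
    "laplace (leptokurtic)": rng.laplace(size=10_000),
}

for name, sample in samples.items():
    excess = kurtosis(sample)                 # Fisher definition (default): normal = 0
    pearson = kurtosis(sample, fisher=False)  # Pearson definition: normal = 3
    print(f"{name}: excess = {excess:.2f}, Pearson = {pearson:.2f}")
```

The two columns should always differ by exactly 3, so it matters which definition a library reports before comparing a result against 0 or 3.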
Implementation in Python
Let’s write Python code to find the skew and kurtosis of data.
First, we will install the necessary libraries.
pip install numpy scipy pandas
Now we will import important libraries.
import numpy as np
from scipy.stats import skew, kurtosis
Now, we need some data for calculation. Here I am generating some random sample data.
data = np.random.normal(size=1000)
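Because the sample is random, the results change on every run. If repeatable numbers are needed, one option is to draw the sample from a seeded generator, as in this small sketch. The seed value 42 is an arbitrary assumption.

```python
import numpy as np

# A seeded generator produces the same sample on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary choice
data = rng.normal(size=1000)

print(data[:5])
```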
Now we will calculate the skewness and kurtosis for this data.
skewness = skew(data)
# kurtosis() returns excess kurtosis by default (a normal distribution gives 0, not 3)
kurt = kurtosis(data)
print("The Skewness of the sample data is:", skewness)
print("The Kurtosis of the sample data is:", kurt)
Note: If the data is stored in a pandas DataFrame or Series, we can use pandas to find the skewness and kurtosis as shown below:
import pandas as pd

data_series = pd.Series(data)

# Calculate skewness and kurtosis
skewness_pandas = data_series.skew()
kurt_pandas = data_series.kurtosis()
print("The Skewness of data using pandas:", skewness_pandas)
print("The Kurtosis of data using pandas:", kurt_pandas)
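The pandas results will usually differ slightly from the SciPy ones. pandas applies a bias correction to both statistics, while SciPy leaves it off unless `bias=False` is passed. The sketch below, which uses an assumed seed and illustrative column names, shows that the two libraries agree once the correction is switched on in SciPy, and that the same methods work column by column on a DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)  # arbitrary seed, used only for repeatability
sample = rng.normal(size=1000)
series = pd.Series(sample)

# SciPy defaults to the biased estimators; bias=False applies the correction
unbiased_skew = skew(sample, bias=False)
unbiased_kurt = kurtosis(sample, bias=False)  # still excess kurtosis (normal = 0)

print("skew:     scipy =", unbiased_skew, " pandas =", series.skew())
print("kurtosis: scipy =", unbiased_kurt, " pandas =", series.kurtosis())

# Column-wise statistics on a DataFrame (column names are illustrative)
df = pd.DataFrame({"normal": sample, "exponential": rng.exponential(size=1000)})
print(df.skew())
print(df.kurtosis())
```

For a sample of 1000 points the correction is small, but it becomes noticeable on short series.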
To print the type of skewness and kurtosis, we can add the following code:
# Interpret the skewness
if skewness > 0:
    print("The distribution is positively skewed.")
elif skewness < 0:
    print("The distribution is negatively skewed.")
else:
    print("The distribution is approximately symmetric.")

# Interpret the kurtosis
# scipy's kurtosis() returns excess kurtosis by default (normal = 0),
# so we compare against 0 rather than 3.
if kurt > 0:
    print("The distribution is leptokurtic (heavy-tailed).")
elif kurt < 0:
    print("The distribution is platykurtic (light-tailed).")
else:
    print("The distribution has a normal kurtosis.")
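Output:
The exact values vary from run to run because the sample is random. For normally distributed data, both the skewness and the excess kurtosis reported by SciPy should be close to 0.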
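A statistic computed from a real sample is almost never exactly zero, so a strict comparison will rarely report a symmetric or mesokurtic result. One possible workaround is to treat values within a small tolerance of zero as "approximately normal". The helper `describe_shape` and its default tolerance of 0.1 are hypothetical choices for illustration, and the function expects excess kurtosis (normal = 0), which is what `scipy.stats.kurtosis` returns by default.

```python
def describe_shape(skewness, excess_kurtosis, tol=0.1):
    """Label skewness and excess kurtosis, treating |value| <= tol as zero.

    Hypothetical helper; the tolerance is an arbitrary illustrative choice.
    """
    if skewness > tol:
        skew_label = "positively skewed"
    elif skewness < -tol:
        skew_label = "negatively skewed"
    else:
        skew_label = "approximately symmetric"

    if excess_kurtosis > tol:
        kurt_label = "leptokurtic"
    elif excess_kurtosis < -tol:
        kurt_label = "platykurtic"
    else:
        kurt_label = "mesokurtic"

    return skew_label, kurt_label


print(describe_shape(0.02, -0.05))
print(describe_shape(1.8, 4.2))
print(describe_shape(-0.6, -1.2))
```

A sensible tolerance depends on the sample size, since smaller samples produce noisier estimates.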