Calculate Mean, Median and Mode in Pandas DataFrame – Python
In this tutorial, you will learn how to calculate the mean, median, and mode of a column in a Pandas DataFrame using Python. All three are measures of central tendency: each one summarizes the "typical" value of a column in a different way.
Let's first import the necessary library and load the dataset. Here I load a demo data file from my computer.
import pandas as pd

df = pd.read_csv("data/Data.csv")
df  # in a Jupyter notebook, a bare name on the last line displays the DataFrame
Take a look at my data file:
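The data file itself isn't reproduced in this post. If you want to follow along without it, here is a minimal stand-in, assuming only the Salary column matters for this tutorial. The nine salary values are taken from the outputs shown later in the post; the row order and any other columns of the real Data.csv are unknown, so treat this DataFrame as hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for data/Data.csv: only a Salary column,
# filled with the nine values that appear in this post's outputs.
# The real file's row order and other columns are not known.
df = pd.DataFrame(
    {"Salary": [48000.0, 52000.0, 54000.0, 58000.0, 61000.0,
                67000.0, 72000.0, 79000.0, 83000.0]}
)
print(df)
```

With this DataFrame, the mean, median, and mode calls below should reproduce the outputs shown in the post.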
Mean calculation in Pandas DataFrame
The mean, also known as the average, is obtained by adding up all the values in a column and dividing that sum by the number of values in the column.
df['Column Name'].mean() is the pandas method that calculates the mean of a column. Here, I calculate the mean of the Salary column.
mean_salary = df['Salary'].mean()
print(mean_salary)
63777.77777777778
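To check that .mean() matches the definition above, here is a short sketch that divides the sum by the count by hand, using the same hypothetical stand-in Salary values. It also shows one default worth knowing: mean() skips missing (NaN) values unless you pass skipna=False.

```python
import numpy as np
import pandas as pd

# Stand-in Salary values (an assumption, not the real data file)
salary = pd.Series([48000.0, 52000.0, 54000.0, 58000.0, 61000.0,
                    67000.0, 72000.0, 79000.0, 83000.0], name="Salary")

# The mean is the sum of the values divided by how many values there are
manual_mean = salary.sum() / salary.count()
print(manual_mean, salary.mean())

# mean() ignores NaN by default (skipna=True), and count() does not count it
with_missing = pd.Series(salary.tolist() + [np.nan])
print(with_missing.mean())              # same value as salary.mean()
print(with_missing.mean(skipna=False))  # nan
```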
Median calculation
The median is the middle value of the data once it is sorted. If the column has an even number of values, the median is the average of the two middle values.
df['Column Name'].median() is the pandas method that calculates the median of a column. Here, I calculate the median of the Salary column.
median_salary = df['Salary'].median()
print(median_salary)
61000.0
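The Salary column has nine values, so its median is simply the fifth value in sorted order. The sketch below, using the same stand-in values, contrasts that with an even-length column (a hypothetical one made by dropping the last value), where pandas averages the two middle values. median() handles the ordering itself, so you don't need to sort the column first.

```python
import pandas as pd

# Odd number of values (nine): the median is the single middle value
odd = pd.Series([48000.0, 52000.0, 54000.0, 58000.0, 61000.0,
                 67000.0, 72000.0, 79000.0, 83000.0])
print(odd.median())   # 61000.0

# Even number of values (hypothetical: last value dropped, eight left):
# the median is the average of the 4th and 5th values
even = odd.iloc[:-1]
print(even.median())  # (58000.0 + 61000.0) / 2 = 59500.0
```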
Mode calculation
The mode is the value that occurs most often in the data. A column can have more than one mode if several values tie for the highest frequency.
df['Column Name'].mode() is the pandas method that calculates the mode of a column. Here, I calculate the mode of the Salary column.
mode_salary = df['Salary'].mode()
print(mode_salary)
0    48000.0
1    52000.0
2    54000.0
3    58000.0
4    61000.0
5    67000.0
6    72000.0
7    79000.0
8    83000.0
If this output surprises you, look at how often each value appears in the Salary column: every value appears exactly once. Because they all share the highest frequency, all of them are modes. The numbers on the left are index labels; they appear because mode() returns a Series rather than a single value. Next, I replace one value in the Salary column so that one value appears twice, and then check the mode again. If you don't know how to replace values in a dataset, check out my post here.
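You can confirm this by counting the values yourself. Here is a minimal sketch with value_counts(), again on the hypothetical stand-in Salary values: the highest count is 1, so every value ties for the mode.

```python
import pandas as pd

# Stand-in Salary values (an assumption, not the real data file)
salary = pd.Series([48000.0, 52000.0, 54000.0, 58000.0, 61000.0,
                    67000.0, 72000.0, 79000.0, 83000.0], name="Salary")

# How often does each value occur?
counts = salary.value_counts()
print(counts.max())  # 1: every value occurs exactly once

# mode() returns every value that reaches that highest count,
# in sorted order, so here it returns all nine values
print(salary.mode().tolist())
```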
df['Salary'] = df['Salary'].replace(48000, 72000)
df
mode_salary = df['Salary'].mode()
print(mode_salary)
0 72000.0
Now 72000 appears twice, more often than any other value, so it is the only mode. The result is still a Series, but it contains just one entry.
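Because mode() always returns a Series, you may want a plain number instead, for example to use in further calculations. One way is to take the first element with .iloc[0]. The sketch below assumes the stand-in values after the replace step, with 72000.0 in the row that held 48000.0 (the row position doesn't affect the result). Since pandas returns the modes in sorted order, .iloc[0] picks the smallest mode when there is a tie, so check len(modes) if ties matter to you.

```python
import pandas as pd

# Stand-in Salary values after replace(48000, 72000) (an assumption)
salary = pd.Series([72000.0, 52000.0, 54000.0, 58000.0, 61000.0,
                    67000.0, 72000.0, 79000.0, 83000.0], name="Salary")

modes = salary.mode()
# mode() still returns a Series; take its first element for a plain number.
# With ties, .iloc[0] is the smallest mode, because the result is sorted.
mode_value = modes.iloc[0]
print(len(modes), mode_value)  # 1 72000.0
```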