Calculate Mean, Median and Mode in Pandas DataFrame – Python

In this tutorial, you will learn how to calculate the mean, median, and mode of a column in a Pandas DataFrame using Python. All three are measures of central tendency: each one summarizes a column with a single typical value.

Let’s first import pandas and load the dataset. Here, I am loading a demo data file from my computer.

import pandas as pd

df = pd.read_csv("data/Data.csv")
df

Take a look at my data file:

[Image: sample dataset]

Mean calculation in Pandas DataFrame

The mean, also known as the average, is obtained by adding all the values in a column and dividing the sum by the number of values in that column.
df['Column Name'].mean() is the pandas method that calculates the mean of a column. Here, I am calculating the mean of the Salary column.

mean_salary = df['Salary'].mean()
print(mean_salary)
63777.77777777778
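
To make the definition concrete, here is a minimal sketch that rebuilds the Salary column by hand and compares .mean() with sum divided by count. The nine salaries are the ones that appear in the mode output later in this tutorial; their order and the extra missing value (NaN) are assumptions, added only to show that pandas skips missing values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Salary column: the nine salaries listed in
# this tutorial plus one assumed missing value.
salary = pd.Series(
    [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000, 83000, 67000],
    name="Salary",
)

# count() excludes NaN, and mean() skips NaN by default (skipna=True),
# so the sum is divided by 9 values, not 10.
manual_mean = salary.sum() / salary.count()

print(salary.mean())
print(manual_mean)
```

Both lines should print the same value as the output above, since the missing entry is left out of both the sum and the count.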

Median calculation

When the data is sorted, the value at the middle position is called the median. If the number of values is even, the median is the average of the two middle values.
df['Column Name'].median() is the pandas method that calculates the median of a column. Here, I am calculating the median of the Salary column.

median_salary = df['Salary'].median()
print(median_salary)
61000.0
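
The "middle position" idea can be checked by hand. The sketch below uses the nine salaries listed in this tutorial (their order here is an assumption), sorts them, and picks the middle one. The even-count case is a hypothetical variation, created by dropping the largest value, to show how the median is then averaged from the two middle values.

```python
import pandas as pd

# The nine salaries listed in this tutorial (order is arbitrary here).
salary = pd.Series([72000, 48000, 54000, 61000, 58000, 52000, 79000, 83000, 67000])

# Odd count: the median is the single middle value of the sorted data.
ordered = salary.sort_values().reset_index(drop=True)
middle = ordered.iloc[len(ordered) // 2]
print(middle, salary.median())  # 61000 61000.0

# Even count (hypothetical): without the largest value, eight values remain
# and the median is the average of the two middle ones, 58000 and 61000.
even = ordered.iloc[:-1]
print(even.median())  # 59500.0
```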

Mode calculation

The value that occurs most often in the data is called the mode. A column can have more than one mode when several values share the highest frequency.
df['Column Name'].mode() is the pandas method that calculates the mode of a column. Here, I am calculating the mode of the Salary column.

mode_salary = df['Salary'].mode()
print(mode_salary)
0 48000.0
1 52000.0 
2 54000.0 
3 58000.0 
4 61000.0 
5 67000.0 
6 72000.0 
7 79000.0 
8 83000.0

If this output surprises you, look at the frequency of each value in the dataset: every value occurs exactly once, so every value is a mode. The indices on the left appear because mode() returns a Series, which lets it report more than one mode. Next, I replace one value in the Salary column so that one salary occurs twice, and then check the mode again. If you don’t know how to replace values in the dataset, check out my post here.

df['Salary'] = df['Salary'].replace(48000,72000)
df

[Image: updated sample dataset]

mode_salary = df['Salary'].mode()
print(mode_salary)
0 72000.0

Now that 72000 occurs twice, it is the only mode, so the output contains a single value.
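
To check the frequencies yourself, value_counts() is a convenient companion to mode(). The sketch below is only an illustration: it rebuilds the Salary values after the replacement (the order is an assumption) and shows one way to get the mode as a single number rather than a Series.

```python
import pandas as pd

# Salary values after the replacement above (48000 became 72000).
salary = pd.Series([72000, 72000, 54000, 61000, 58000, 52000, 79000, 83000, 67000])

# value_counts() lists how often each value occurs, most frequent first.
counts = salary.value_counts()
print(counts.head(3))

# mode() always returns a Series; take the first element for a single number.
top_mode = salary.mode().iloc[0]
print(top_mode)
```

Keep in mind that .iloc[0] returns only the smallest mode when several values tie, because mode() sorts its result.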
