Replace specific column values using pandas in Python

In this tutorial, I will show how to replace values in a DataFrame using pandas in Python. There are several ways to do this, but the simplest is usually the replace() method that pandas provides.

Using the df.replace() function

In its simplest form, this function takes two arguments: the first is the old value, and the second is the new value. To replace several values at once, pass a dictionary that maps each old value to its new value. Here, df is the DataFrame that pandas created from your data. Note that replace() returns a new object by default rather than modifying df in place, so assign the result back if you want to keep it.

import pandas as pd
import numpy as np

df = pd.read_csv("data/Data.csv")
df

Here I am loading a demo data file from my computer.

Replacing values in a column

Let’s replace Spain in the Country column with USA. This replaces every occurrence of Spain, so Spain no longer appears in the Country column.

df['Country'] = df['Country'].replace("Spain","USA")
df
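
Since the contents of Data.csv are not shown, here is a minimal self-contained sketch of the same step. The small DataFrame is a hypothetical stand-in for the demo file, not the actual data.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv (the real file's contents are not shown)
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "Yes"],
})

# Series.replace returns a new Series, so assign it back to the column
df["Country"] = df["Country"].replace("Spain", "USA")

print(df["Country"].tolist())
```

Only the Country column is touched here; any other column that happened to contain Spain would be left unchanged.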

Replacing multiple values simultaneously

Let’s swap the values in the Purchased column, replacing Yes with No and No with Yes in a single call. Because replace() is called on the whole DataFrame here, the mapping is applied to every column that contains these values, not only to Purchased.

df = df.replace({"No":"Yes", "Yes":"No"})
df
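
If the swap should affect only the Purchased column, one option is to call replace() on that column alone. The sketch below assumes a small made-up DataFrame; the Subscribed column is hypothetical and is there only to show that other columns stay untouched.

```python
import pandas as pd

# Hypothetical data; "Subscribed" is an invented column for illustration
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Purchased": ["No", "Yes", "No"],
    "Subscribed": ["Yes", "Yes", "No"],
})

# Swap Yes/No in the Purchased column only
df["Purchased"] = df["Purchased"].replace({"No": "Yes", "Yes": "No"})

print(df)
```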

Replacing a particular value at row and column – Pandas

Suppose you want to replace a value at one specific place, even though the same value also appears elsewhere in the dataframe. For this case, we use df.loc[], which takes the row index label and the column name.
Let’s replace France in the Country column at row index 8 with India.

df.loc[8,"Country"] = "India"
df
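
One detail worth keeping in mind is that df.loc[] looks rows up by index label, not by position. With the default index the two coincide, but they differ once the index is customised. A small sketch, using made-up data and a made-up index:

```python
import pandas as pd

# Hypothetical data with a non-default index to show label-based lookup
df = pd.DataFrame(
    {"Country": ["France", "Spain", "France"], "Age": [44, 27, 30]},
    index=[10, 20, 30],
)

# 30 is the index label of the third row, not a position
df.loc[30, "Country"] = "India"

print(df["Country"].tolist())
```

The first France, at label 10, is left as it is because only the cell at label 30 was assigned.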

Replacing throughout the dataframe

If you want to replace a value throughout the entire dataframe and not just in one column, call replace() on the dataframe itself without selecting a column. Let’s replace all the No values in the dataframe with Yes.

df = df.replace("No","Yes")
df
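
As a quick check of the behaviour described above, the following sketch applies the same call to a small hypothetical DataFrame (the Subscribed column is invented for illustration) and shows that every matching cell changes while non-matching columns are left alone.

```python
import pandas as pd

# Hypothetical data with "No" appearing in more than one column
df = pd.DataFrame({
    "Purchased": ["No", "Yes"],
    "Subscribed": ["No", "No"],
    "Age": [30, 41],
})

# Called on the DataFrame, replace() checks every column
df = df.replace("No", "Yes")

print(df)
```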

Additional tip: Replacing NaN values

As you can see, my dataset contains some NaN (Not a Number) values, which pandas uses to mark missing data. A common way to handle them in numeric columns is to replace them with the column mean. So, let’s calculate the mean of the Age and Salary columns first and then replace the NaN values with it. By default, mean() skips NaN values when computing the result. We use the .fillna() function to fill the NaN values.

mean_salary = df['Salary'].mean()
mean_age = df['Age'].mean()

df['Salary'] = df['Salary'].fillna(mean_salary)
df['Age'] = df['Age'].fillna(mean_age)
df
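
The same fill can also be written as a single fillna() call by passing a dictionary of column names to fill values. This is a minimal sketch on made-up numbers, not the demo file's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "Age": [40.0, np.nan, 30.0],
    "Salary": [50000.0, 70000.0, np.nan],
})

# mean() ignores NaN by default, so each mean uses only the known values
means = df[["Age", "Salary"]].mean()

# A dict maps each column to the value used to fill its NaN cells
df = df.fillna({"Age": means["Age"], "Salary": means["Salary"]})

print(df)
```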