Replace specific column values using pandas in Python
In this tutorial, I will discuss how to replace values in your DataFrame using pandas in Python. There are various ways to do it, but the simplest is the replace() function that the pandas library provides.
Using df.replace() function
This function takes two main parameters: the first is the old value, and the second is the new value. If you want to replace multiple values simultaneously, pass them as a dictionary that maps each old value to its new value. Here, df is the DataFrame that pandas created from your data.
import pandas as pd
import numpy as np

df = pd.read_csv("data/Data.csv")
df
Here, I am loading a demo data file from my computer.
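If you do not have the CSV file at hand, a small stand-in DataFrame is enough to follow along. This is only a sketch: the column names (Country, Age, Salary, Purchased) come from this tutorial, but every value below is made up for illustration and is not the content of Data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data/Data.csv: all values are invented.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, np.nan],
    "Purchased": ["No", "Yes", "No", "No"],
})

print(df)
```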
Replacing values from column
Let’s replace Spain in the Country column with USA. This replaces every occurrence of Spain, so Spain no longer appears in the Country column.
df['Country'] = df['Country'].replace("Spain", "USA")
df
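As a minimal sketch of what happens here, the example below runs replace() on a small, made-up Country column (the values are assumptions, not the tutorial's data). It also shows that replace() returns a new object, which is why the result is assigned back to the column.

```python
import pandas as pd

# Made-up values standing in for the Country column.
countries = pd.Series(["France", "Spain", "Germany", "Spain"], name="Country")

# replace() returns a new Series; the original is left untouched.
replaced = countries.replace("Spain", "USA")

print(replaced.tolist())   # ['France', 'USA', 'Germany', 'USA']
print(countries.tolist())  # ['France', 'Spain', 'Germany', 'Spain']
```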
Replacing multiple values simultaneously
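Let’s swap the values in the Purchased column by replacing Yes with No and No with Yes at the same time. Note that calling replace() on the whole DataFrame, as below, applies the mapping to every column; to limit the swap to one column, call replace() on df['Purchased'] instead.
df = df.replace({"No": "Yes", "Yes": "No"})
df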
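To restrict the swap to a single column, the same dictionary can be passed to replace() on that column only. The sketch below uses an invented DataFrame; the Subscribed column is hypothetical and exists only to show that other columns are left alone.

```python
import pandas as pd

# Invented data; "Subscribed" is a hypothetical second Yes/No column.
df_demo = pd.DataFrame({
    "Purchased": ["No", "Yes", "No"],
    "Subscribed": ["Yes", "Yes", "No"],
})

# Swap Yes and No in the Purchased column only.
swap = {"No": "Yes", "Yes": "No"}
df_demo["Purchased"] = df_demo["Purchased"].replace(swap)

print(df_demo)
```

Because the dictionary is applied in one pass, a value that has just become Yes is not turned back into No.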
Replacing a particular value at row and column – Pandas
Suppose you want to replace a value in one specific place, but the same value also appears elsewhere in the DataFrame. To handle this case, use df.loc[]. It takes the row label (here, the row index) and the column name.
Let’s replace France in the Country column at row index 8 with India.
df.loc[8, "Country"] = "India"
df
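The sketch below illustrates the same idea on a tiny, made-up DataFrame (the rows and the index 2 are assumptions for illustration). Only the addressed cell changes, even though France also appears in another row.

```python
import pandas as pd

# Made-up rows; France appears twice on purpose.
df_demo = pd.DataFrame({
    "Country": ["France", "Spain", "France"],
    "Age": [44, 27, 35],
})

# Change only the cell at row label 2, column "Country".
df_demo.loc[2, "Country"] = "India"

print(df_demo["Country"].tolist())  # ['France', 'Spain', 'India']
```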
Replacing throughout the dataframe
If you want to replace values throughout the entire DataFrame and not just in one column, call replace() on the DataFrame itself without selecting a column. Let’s replace all the No values in the DataFrame with Yes.
df = df.replace("No", "Yes")
df
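Here is a small sketch of a DataFrame-wide replacement on invented data (the Subscribed column is hypothetical). Every column that contains No is affected, while columns without a match, such as Age, stay as they are.

```python
import pandas as pd

# Invented data with two Yes/No columns and one numeric column.
df_demo = pd.DataFrame({
    "Purchased": ["No", "Yes"],
    "Subscribed": ["No", "No"],
    "Age": [30, 41],
})

# Replace "No" wherever it occurs in the DataFrame.
result = df_demo.replace("No", "Yes")

print(result)
```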
Additional tip: Replacing NaN values
As you can see, some NaN (Not a Number) values are present in my dataset. A common approach for numeric columns is to replace them with the column mean. So, let’s calculate the mean of the Age and Salary columns first and then replace the NaN values with those means. We use the .fillna() function to fill the NaN values.
mean_salary = df['Salary'].mean()
mean_age = df['Age'].mean()

df['Salary'] = df['Salary'].fillna(mean_salary)
df['Age'] = df['Age'].fillna(mean_age)
df
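As a variation on the steps above, both columns can be filled in one call by passing the column means to fillna(). This is a sketch on made-up numbers, not the tutorial's dataset; mean() skips NaN values by default, so the means are computed from the values that are present.

```python
import numpy as np
import pandas as pd

# Made-up numbers with one missing value per column.
df_demo = pd.DataFrame({
    "Age": [40.0, np.nan, 30.0],
    "Salary": [50000.0, 70000.0, np.nan],
})

cols = ["Age", "Salary"]

# Per-column means; NaN values are ignored when averaging.
means = df_demo[cols].mean()

# Each column's NaN is filled with that column's own mean.
df_demo[cols] = df_demo[cols].fillna(means)

print(df_demo)
```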