Transform Pandas columns using map and apply
In this tutorial, we will use the map() and apply() methods to transform Pandas columns. When working with datasets, you will often need to transform and manipulate data. These methods are helpful when you need to convert one set of values into another, either through a lookup or by using a function.
Using map() method
The map() method is handy because it accepts three kinds of input: dictionaries, functions, and Series. Refer to Mapping values in a Pandas Dataframe for a step-by-step walkthrough of the mapping process. Let’s look at each of these scenarios.
Using the map method to map a dictionary
When you pass a dictionary to the map() method, it looks up each column value among the dictionary's keys and returns the corresponding dictionary value. Values with no matching key become NaN. Here, the result is added to our data frame as a new column.
Example:
import pandas as pd

data = pd.DataFrame({
    'e_name': ['Janu', 'Yash', 'Misande', 'Edward', 'Kate'],
    'e_age': [30, 40, 32, 67, 43],
    'e_score': [900, 950, 970, 820, 870],
    'e_income': [100000, 80000, 550000, 62000, 50000]})
print(data)

# Look up each name in the dictionary and store the result in a new column
genders = {'Janu': 'Female', 'Yash': 'Male', 'Misande': 'Female',
           'Edward': 'Male', 'Kate': 'Female'}
data['gender'] = data['e_name'].map(genders)
print(data)
Output:
    e_name  e_age  e_score  e_income
0     Janu     30      900    100000
1     Yash     40      950     80000
2  Misande     32      970    550000
3   Edward     67      820     62000
4     Kate     43      870     50000

    e_name  e_age  e_score  e_income  gender
0     Janu     30      900    100000  Female
1     Yash     40      950     80000    Male
2  Misande     32      970    550000  Female
3   Edward     67      820     62000    Male
4     Kate     43      870     50000  Female
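The example above has a dictionary entry for every name. As a minimal sketch of what happens when a key is missing, the snippet below uses a deliberately incomplete dictionary (partial_genders is a hypothetical name, not part of the original example) and fills the resulting gaps with fillna():

```python
import pandas as pd

names = pd.Series(['Janu', 'Yash', 'Kate'])

# Hypothetical partial lookup: 'Kate' is deliberately left out.
partial_genders = {'Janu': 'Female', 'Yash': 'Male'}

# Names without a matching key come back as NaN.
mapped = names.map(partial_genders)

# One possible way to handle the gaps: replace NaN with a placeholder.
filled = mapped.fillna('Unknown')
print(filled.tolist())  # ['Female', 'Male', 'Unknown']
```

Whether a placeholder such as 'Unknown' is appropriate depends on the dataset; leaving the NaN in place is equally valid.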
Using the map method to map a Function
Here, we pass a function to the map() method. The function receives each value from the series in turn and returns a new value, and those returned values make up a new series.
Example:
mean_score = data['e_score'].mean()

def high_score(x):
    # True when the score is above the column mean
    return x > mean_score

data['higher_score'] = data['e_score'].map(high_score)
print(data)
Output:
    e_name  e_age  e_score  e_income  gender  higher_score
0     Janu     30      900    100000  Female         False
1     Yash     40      950     80000    Male          True
2  Misande     32      970    550000  Female          True
3   Edward     67      820     62000    Male         False
4     Kate     43      870     50000  Female         False
- Because the function is used only once, we can pass an anonymous lambda function instead, which makes the code shorter:
data['higher_score'] = data['e_score'].map(lambda x: x > mean_score)
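For a simple comparison like this one, the same result can also be obtained by comparing the whole column directly, without map(). The sketch below is an aside rather than part of the original example; it rebuilds the scores as a standalone series and checks that both approaches agree:

```python
import pandas as pd

scores = pd.Series([900, 950, 970, 820, 870])
mean_score = scores.mean()

# Element by element, using map() with a lambda
via_map = scores.map(lambda x: x > mean_score)

# Whole column at once, using a direct comparison
via_comparison = scores > mean_score

print(via_map.tolist() == via_comparison.tolist())  # True
```

map() remains the more general tool when the transformation cannot be written as a single column-wide expression.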
Using the map method to map an Indexed Series
Finally, we are going to learn how to pass a Pandas Series to the map() method. Each value in the series that map() is called on is looked up in the index of the series passed in, and is replaced with the corresponding value from that series.
Example:
# The index of last_names holds the first names used for the lookup
last_names = pd.Series(
    ['Smith', 'Taylor', 'Jones', 'Harris', 'Parker'],
    index=data['e_name'])
data['last_name'] = data['e_name'].map(last_names)
print(data)
Output:
Using apply() method
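The apply() method can be used on either a Pandas series or a data frame. Unlike map(), it does not look values up in a dictionary or a series; it takes a function. On a data frame, passing axis=1 calls the function once per row.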
Example:
def project(row):
    # True for employees younger than 45 who earn more than 75000
    return row['e_age'] < 45 and row['e_income'] > 75000

data['project'] = data.apply(project, axis=1)
print(data)
Output:
    e_name  e_age  e_score  e_income  project
0     Janu     30      900    100000     True
1     Yash     40      950     80000     True
2  Misande     32      970    550000     True
3   Edward     67      820     62000    False
4     Kate     43      870     50000    False
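The example above calls apply() on a data frame. As a small sketch of the other case, apply() on a single series, the snippet below labels each income value; the income_band function and its 75000 threshold are hypothetical and chosen only for illustration:

```python
import pandas as pd

incomes = pd.Series([100000, 80000, 550000, 62000, 50000])

def income_band(income):
    # Hypothetical rule: anything above 75000 counts as 'high'
    return 'high' if income > 75000 else 'standard'

# On a series, the function receives one value at a time
bands = incomes.apply(income_band)
print(bands.tolist())  # ['high', 'high', 'high', 'standard', 'standard']
```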
You can also pass additional arguments to the function through apply(). This is done with the args parameter of apply(), which takes a tuple of extra positional arguments.
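As a rough sketch of how this might look, the version of project() below takes the age and income limits as parameters instead of hard-coding them. The parameter names max_age and min_income are assumptions made for this illustration:

```python
import pandas as pd

data = pd.DataFrame({
    'e_age': [30, 40, 32, 67, 43],
    'e_income': [100000, 80000, 550000, 62000, 50000],
})

def project(row, max_age, min_income):
    # max_age and min_income are supplied through the args tuple
    return row['e_age'] < max_age and row['e_income'] > min_income

# Each row is passed first, followed by the values in args
data['project'] = data.apply(project, axis=1, args=(45, 75000))
print(data['project'].tolist())  # [True, True, True, False, False]
```

Passing the limits this way lets the same function be reused with different thresholds without redefining it.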