Intersection of two DataFrames in Pandas Python

In this tutorial, we will learn how to perform the intersection of two DataFrames in Pandas. By the end, you will know how to create DataFrames and how to find the rows that two of them have in common.

This is useful in data analysis for understanding how two data frames relate to each other. So, let’s begin the tutorial.

Install Pandas

Pandas must be installed before you can use it. If you have not installed it yet, run the following command in the command prompt.

pip install pandas
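
To confirm that the installation worked, a quick check like the one below can help. It is only a sketch: the version string it prints depends on your environment.

```python
import pandas

# __version__ is a string; its exact value depends on the installed release.
version = pandas.__version__
print(version)
```

If the import fails with ModuleNotFoundError, Pandas was most likely installed for a different Python interpreter than the one you are running.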

Creating Data Frame in Pandas

Here are some of the most common ways to create a data frame in Pandas.

To create a data frame, we first have to import the Pandas library. This is done in the following way:

import pandas as p

Creating a DataFrame using a dictionary of lists

After importing pandas, collect the attributes (column names) and their values in a dictionary of lists and assign it to a variable.

data1 = { '0':[1,2,3,4,5], '1':['Hyderabad','Delhi','Mumbai','Chennai','Kerela'] }

The next step is to create the data frame. For this purpose, we use the statement:

d1 = p.DataFrame(data1)

Putting everything together we have,

import pandas as p
data1 = { '0':[1,2,3,4,5], '1':['Hyderabad','Delhi','Mumbai','Chennai','Kerela'] }
d1 = p.DataFrame(data1) 
print(d1)

The output is:

   0          1
0  1  Hyderabad
1  2      Delhi
2  3     Mumbai
3  4    Chennai
4  5     Kerela
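
The dictionary keys become the column labels, so descriptive keys give a more readable data frame. The sketch below assumes the hypothetical column names 'Id' and 'City', which are not part of the example above:

```python
import pandas as p

# 'Id' and 'City' are illustrative names; each dictionary key becomes a column label.
data1 = {'Id': [1, 2, 3], 'City': ['Hyderabad', 'Delhi', 'Mumbai']}
d1 = p.DataFrame(data1)

print(list(d1.columns))  # the column labels, in dictionary order
print(d1.shape)          # (number of rows, number of columns)
```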

Creating a Data Frame using a list of lists

Here, the process is the same as above; the only difference is how the data is defined. We use a list of lists, where each inner list is one row, instead of a dictionary of lists. Since no column names are given, Pandas labels the columns 0 and 1.

data1 = [ [1,'Hyderabad'], [2,'Delhi'], [3,'Mumbai'], [4,'Chennai'], [5,'Kerela'] ]

The code for data frame creation is,

import pandas as p
data1 = [ [1,'Hyderabad'], [2,'Delhi'], [3,'Mumbai'], [4,'Chennai'], [5,'Kerela'] ]
d1 = p.DataFrame(data1) 
print(d1)

The output is:

   0          1
0  1  Hyderabad
1  2      Delhi
2  3     Mumbai
3  4    Chennai
4  5     Kerela
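
With a list of lists, column names can be supplied through the columns argument of DataFrame. A minimal sketch, again assuming the hypothetical names 'Id' and 'City':

```python
import pandas as p

# Each inner list is one row; 'Id' and 'City' are illustrative column names.
data1 = [[1, 'Hyderabad'], [2, 'Delhi'], [3, 'Mumbai']]
d1 = p.DataFrame(data1, columns=['Id', 'City'])
print(d1)
```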

The intersection of two DataFrames

To get the intersection of two DataFrames in Pandas, we use the merge() function. Its 'how' argument selects the type of merge to perform; with how='inner', only the rows whose key values appear in both data frames are kept, which gives the intersection. The 'on' argument specifies the columns on which the intersection is computed. Let us demonstrate this with an example:

import pandas as p
dat1 = {'Person': [1, 2, 3, 4],
         'Place': ['Hyderabad', 'Delhi', 'Mumbai', 'Chennai']} 
dat2 = {'Person': [1, 2, 3, 4 ],
         'Place': ['Delhi', 'America', 'Mumbai', 'Chennai'],
         'Name':['Ravi', 'Raju', 'Ram', 'Sham']} 
d1 = p.DataFrame(dat1)
d2 = p.DataFrame(dat2) 
res = p.merge(d1, d2, how='inner', on=['Place', 'Person'])
print(res)

The final output is:

   Person    Place  Name
0       3   Mumbai   Ram
1       4  Chennai  Sham

From the above output, we can observe that only the rows (3, Mumbai) and (4, Chennai) have the same Person and Place values in both data frames, so the intersection consists of these two rows. The Name column comes from the second data frame.
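
The choice of columns passed to 'on' changes the result. The sketch below reuses the same data and compares matching on 'Place' alone with leaving 'on' out, in which case merge() uses every column name the two frames share. The variable names by_place and default_on are only illustrative.

```python
import pandas as p

d1 = p.DataFrame({'Person': [1, 2, 3, 4],
                  'Place': ['Hyderabad', 'Delhi', 'Mumbai', 'Chennai']})
d2 = p.DataFrame({'Person': [1, 2, 3, 4],
                  'Place': ['Delhi', 'America', 'Mumbai', 'Chennai'],
                  'Name': ['Ravi', 'Raju', 'Ram', 'Sham']})

# Match on 'Place' only: 'Delhi' now matches even though its Person values differ.
# The non-key 'Person' column exists in both frames, so it gets the _x / _y suffixes.
by_place = p.merge(d1, d2, how='inner', on='Place')
print(by_place)

# Without `on`, merge() joins on all shared column names ('Person' and 'Place' here).
default_on = p.merge(d1, d2, how='inner')
print(default_on)
```

Matching on fewer columns is a looser test, so it can return more rows than the intersection computed on both columns.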

Also, read: Join Two DataFrames in Pandas with Python
