pandas.query() method with examples

We know how the pandas library written in Python plays an important role in Data manipulation and Analysis. In this tutorial, We will understand Dataframe.query(),one of the filtering operations of pandas DataFrame.

The main uses of this filtering query() function to query the columns of a DataFrame with a boolean expression like |, &, <, >, ==.

Syntax: pandas.query() method

DataFrame.query(self, expr, inplace=False, **kwargs)
expr: The query string to evaluate
inplace: Whether the query should modify the data in place to not
**kwargs: allows you to pass the keyworded variable length of arguments to a function.
return: It returns DataFrame for the query.

 

Examples:

Here we have custumer.csv data which we use in our example.
We import custumer.csv data using pandas and see the preview of customer dataset.

import pandas as pd
data = pd.read_csv("./Custumer_Segmentation.csv")
data.head()

Output:

 

Let’s say we want to know all informations about customers who have income > $200k and DebtIncomeRatio<5. for this, we now will filter using pandas.query() using boolean & operation on these two.

Below is the query for above conditions:

data.query("DebtIncomeRatio<5 & Income>200")

Output:

 

What actually it does is that it takes the result of boolean values for given conditions. Here we have and of two operations, first for DebtIncomeRatio<5 conditions and second for Income>200.

Ok let me give a clear explanation say if we have below expression:

data["DebtIncomeRatio"]<5

It will return either true or false like this for each customer id:

We will get either true or false for each condition in query.
Like this, every query will give their own result within quotes inside query(). If all conditions satisfy we will get the corresponding output as Dataframe.

 

Example 2:

Now we will use the more complex expression.
Say, we want to know the information of customers whose DebtIncomeRatio<10 and whose Income>210 or Education>7 and Address==’NBA014′.

We will use the following query in which we put all the above conditions within the query:

data.query("DebtIncomeRatio<10 & (Income>210 | Edu>7) & Address=='NBA014'")

Output:

Customer_Id Age Edu Years_Employed Income Debt Address DebtIncomeRatio
443    444   51   2     31   249  19.422   NBA014    7.8
532    533   43   4     18   446  28.990   NBA014    6.5

 

We get only two customers for the given conditions in the query.

Now we can see how easy and clean to write such code using DataFrame.query() filter or select rows datagram with the specified conditions.

Also read: DataFrame, date_range(), slice() in Python Pandas library

Leave a Reply

Your email address will not be published. Required fields are marked *