Understanding Boolean Indexing in Pandas

In this post, we discuss Boolean indexing in Pandas. Pandas is a Python package commonly used for data manipulation and analysis, and it provides fast, flexible tools for working with data. It is built around two labeled data structures: Series, which is 1-dimensional, and DataFrame, which is 2-dimensional and can hold columns of different types.

Boolean Indexing in Pandas – Python

Boolean indexing is a way of filtering data with boolean vectors: only the elements where the vector is True are kept. We can use it to filter a Series as well as a DataFrame.
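
To make the idea of a boolean vector concrete, here is a minimal sketch. It uses a small Series of fixed values (the name temps and its numbers are made up for illustration, not taken from the examples in this post) so that the result is reproducible.

```python
import pandas as pd

# Hypothetical fixed values, chosen only so the output is predictable.
temps = pd.Series([3.5, -1.2, 0.0, 7.8, -4.1])

# The comparison is evaluated element by element and returns a boolean Series.
mask = temps > 0
print(mask.tolist())      # [True, False, False, True, False]

# Indexing with the mask keeps only the rows where the mask is True.
positive = temps[mask]
print(positive.tolist())  # [3.5, 7.8]
```

Note that the filtered result keeps the original index labels (0 and 3 here) rather than renumbering them.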

Boolean Indexing on Series

1.

import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(7))
s

Output-

0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64

2.

s[s>0]

Output-

0    1.833026
2    0.820638
3    0.596972
6    0.238722
dtype: float64

3.

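In boolean indexing, we use the operators | for or, & for and, and ~ for not. Each condition must be wrapped in parentheses, so we write (s>0) | (s<0) rather than s>0 | s<0. Because | binds more tightly than the comparison operators, Python would parse the unparenthesized form as the chained comparison s > (0 | s) < 0, which raises an error.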

cond = (s>0) | (s<0)
s[cond]

Output-

0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64
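
The | example above can be complemented with & and ~. The following is a small sketch on a Series of fixed values (s2 is a made-up name used only here). It also shows that the unparenthesized form fails instead of silently giving a wrong answer.

```python
import pandas as pd

# Hypothetical fixed values so the results are reproducible.
s2 = pd.Series([1.5, -0.5, 2.5, -2.0, 0.25])

# & keeps rows where both conditions hold: values strictly between 0 and 2.
between = s2[(s2 > 0) & (s2 < 2)]
print(between.tolist())       # [1.5, 0.25]

# ~ inverts a mask: every value that is NOT greater than 0.
not_positive = s2[~(s2 > 0)]
print(not_positive.tolist())  # [-0.5, -2.0]

# Without parentheses, Python evaluates 0 | s2 first and then a chained
# comparison; depending on the dtype this raises TypeError or ValueError.
try:
    s2[s2 > 0 | s2 < 0]
    raised = False
except (TypeError, ValueError):
    raised = True
print(raised)                 # True
```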

Boolean Indexing on DataFrame

To perform boolean indexing on a DataFrame, we use a boolean vector with the same length as the DataFrame’s index. Most often, this vector is derived from one or more of the DataFrame’s columns.

# Named `names` rather than `list` so the built-in list type is not shadowed.
names = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
indexList = [1, 2, 3, 4, 5, 6, 7]
df = pd.DataFrame({'a': names,
                   'b': np.random.randn(7)}, index=indexList)
df

Output-

       a         b
1    one  0.444295
2    two -1.069635
3  three -0.635911
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248

Let’s see how boolean vectors work with the indexers .loc[] and .iloc[].

1.  using .loc[]

cond = df['b'] > 0
df.loc[cond]

Output-

       a         b
1    one  0.444295
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248
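
Conditions on several columns can be combined into a single mask before it is passed to .loc[]. The sketch below assumes a smaller DataFrame with fixed values (frame and its contents are made up for illustration) and uses Series.isin for the membership test.

```python
import pandas as pd

# Hypothetical data with fixed values so the result is reproducible.
names = ['one', 'two', 'three', 'four', 'five']
frame = pd.DataFrame({'a': names, 'b': [0.4, -1.1, -0.6, 1.6, 0.7]},
                     index=[1, 2, 3, 4, 5])

# Combine a numeric condition on 'b' with a membership test on 'a'.
mask = (frame['b'] > 0) & frame['a'].isin(['one', 'two', 'five'])

# The mask holds one boolean per row, aligned on the DataFrame's index.
assert len(mask) == len(frame)

selected = frame.loc[mask]
print(selected)

# .loc[] also accepts a column label after the mask.
selected_b = frame.loc[mask, 'b']
print(selected_b.tolist())  # [0.4, 0.7]
```

Row 4 ('four') has a positive value in b but is excluded because it fails the membership test, while row 2 ('two') passes the membership test but has a negative value.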

2.  using .iloc[]

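With .iloc[], we cannot pass a boolean Series such as cond, because .iloc[] selects by integer position rather than by label; it expects integer positions (or a plain boolean array). Here we select the row at integer position 6: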

df.iloc[6]

Output-

a      seven
b    1.02825
Name: 7, dtype: object
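
If a mask built from a column is still needed with .iloc[], one option is to convert it to a plain NumPy array first. This is a minimal sketch on a made-up three-row DataFrame (frame is a hypothetical name), not the DataFrame used above.

```python
import pandas as pd

# Hypothetical data with fixed values so the result is reproducible.
frame = pd.DataFrame({'a': ['one', 'two', 'three'], 'b': [0.4, -1.1, 0.7]},
                     index=[1, 2, 3])
mask = frame['b'] > 0

# Passing the boolean Series itself to .iloc[] raises an error.
try:
    frame.iloc[mask]
    series_mask_ok = True
except (ValueError, NotImplementedError):
    series_mask_ok = False
print(series_mask_ok)                # False

# The same mask as a plain NumPy array is accepted.
by_position = frame.iloc[mask.to_numpy()]
print(by_position['a'].tolist())     # ['one', 'three']
```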

