# Understanding Boolean Indexing in Pandas

In this post, we discuss Boolean indexing in pandas. Pandas is a Python package commonly used for data manipulation and analysis, and it provides fast, flexible tools for that work. Its two main data structures are Series and DataFrame: a Series is a 1-dimensional labeled array, and a DataFrame is a 2-dimensional labeled structure whose columns can hold different types.

## Boolean Indexing in Pandas – Python

Boolean indexing filters data with a boolean vector: a sequence of `True`/`False` values, one per row, in which only the rows marked `True` are kept. It works on both Series and DataFrame objects.
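To make the idea of a boolean vector concrete, here is a minimal sketch. The Series `temps` and its labels are hypothetical sample data, chosen so the result is easy to check by eye.

```python
import pandas as pd

# Hypothetical sample data (not from the examples in this post).
temps = pd.Series([21.5, -3.0, 0.0, 14.2], index=["mon", "tue", "wed", "thu"])

# A comparison yields a boolean Series aligned with the original index.
mask = temps > 0

# Indexing with the mask keeps only the rows where the mask is True.
positive = temps[mask]

print(mask.tolist())      # [True, False, False, True]
print(positive.tolist())  # [21.5, 14.2]
```

Note that the mask is itself a Series, so it can be stored in a variable, inspected, and reused.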

### Boolean Indexing on Series

1.

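```python
import pandas as pd
import numpy as np

# randn draws random values, so your numbers will differ from the output below
s = pd.Series(np.random.randn(7))
s
```

Output:

```
0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64
```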

2.

```python
s[s > 0]
```

Output:

```
0    1.833026
2    0.820638
3    0.596972
6    0.238722
dtype: float64
```

3.

In boolean indexing, we use the operators `|` for or, `&` for and, and `~` for not. Each comparison must be wrapped in parentheses: we write `(s>0) | (s<0)` rather than `s>0 | s<0`, because `|` binds more tightly than the comparison operators, so Python parses `s>0 | s<0` as the chained comparison `s > (0 | s) < 0`.

```python
cond = (s > 0) | (s < 0)
s[cond]
```

Output:

```
0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64
```
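The condition above keeps every nonzero value, so it does not show much filtering. As a complement, here is a minimal sketch of all three operators on a small, hypothetical integer Series (`values` is illustrative data, not the random `s` from above).

```python
import pandas as pd

# Hypothetical fixed data so the results are reproducible.
values = pd.Series([-2, -1, 0, 1, 2, 3])

# and: both comparisons must hold (each one wrapped in parentheses)
in_range = (values > -2) & (values < 2)

# not: invert a mask
not_zero = ~(values == 0)

# or: at least one comparison must hold
extremes = (values < -1) | (values > 2)

print(values[in_range].tolist())  # [-1, 0, 1]
print(values[not_zero].tolist())  # [-2, -1, 1, 2, 3]
print(values[extremes].tolist())  # [-2, 3]
```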

### Boolean Indexing on DataFrame

To perform boolean indexing on a DataFrame, the boolean vector must have the same length as the DataFrame’s index. Most often, the vector is derived from one or more of the DataFrame’s own columns.

```python
# named `labels` rather than `list` to avoid shadowing the Python built-in
labels = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
indexList = [1, 2, 3, 4, 5, 6, 7]
df = pd.DataFrame({'a': labels, 'b': np.random.randn(7)}, index=indexList)
df
```

Output:

```
       a         b
1    one  0.444295
2    two -1.069635
3  three -0.635911
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248
```

Let’s see how boolean vectors work with the indexers `.loc[]` and `.iloc[]`.

#### 1. using .loc[]

```python
cond = df['b'] > 0
df.loc[cond]
```

Output:

```
       a         b
1    one  0.444295
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248
```
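A mask passed to `.loc[]` can also combine several columns and be paired with a column label. The sketch below assumes a small, hypothetical DataFrame with fixed values (a shortened stand-in for `df` above) so the results are reproducible.

```python
import pandas as pd

# Hypothetical fixed data standing in for the random DataFrame above.
df_small = pd.DataFrame(
    {"a": ["one", "two", "three", "four"], "b": [0.4, -1.1, -0.6, 1.6]},
    index=[1, 2, 3, 4],
)

# Combine conditions on two columns; each comparison needs its own parentheses.
cond = (df_small["b"] > 0) & (df_small["a"] != "four")
rows = df_small.loc[cond]

# Select matching rows and a single column in one step.
names = df_small.loc[df_small["b"] > 0, "a"]

print(rows.index.tolist())  # [1]
print(names.tolist())       # ['one', 'four']
```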

#### 2. using .iloc[]

`.iloc[]` selects by integer position. It does not accept a boolean Series as a mask (only a plain boolean array is accepted), so here we pass an integer position instead.

```python
df.iloc[6]
```

Output:

```
a      seven
b    1.02825
Name: 7, dtype: object
```
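`.iloc[]` does accept a mask once it is a plain boolean array rather than a Series. A minimal sketch, assuming hypothetical sample data: `to_numpy()` drops the index from the mask so that only positions remain.

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
df_demo = pd.DataFrame(
    {"a": ["one", "two", "three"], "b": [0.5, -1.0, 2.0]},
    index=[1, 2, 3],
)

cond = df_demo["b"] > 0

# Convert the boolean Series to a NumPy array before handing it to iloc.
by_position = df_demo.iloc[cond.to_numpy()]

# The label-based equivalent takes the Series directly.
by_label = df_demo.loc[cond]

print(by_position.equals(by_label))  # True
```

When a mask is already available as a Series, `.loc[]` is usually the simpler choice; the array form is mainly useful in code that is otherwise position-based.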
