Understanding Boolean Indexing in Pandas
In this post we discuss boolean indexing in Pandas. Pandas is a Python package commonly used for data manipulation and analysis; it provides fast, flexible, and powerful tools for working with labeled data. Its two main data structures are Series, a 1-dimensional labeled array, and DataFrame, a 2-dimensional labeled table whose columns can hold different types.
Boolean Indexing in Pandas – Python
Boolean indexing is a way of filtering data with boolean vectors: only the entries where the vector is True are kept. It works on both Series and DataFrame.
Boolean Indexing on Series
1.
import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(7))
s
Output-
0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64
2.
s[s>0]
Output-
0    1.833026
2    0.820638
3    0.596972
6    0.238722
dtype: float64
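To see what happens inside the brackets, it can help to look at the boolean vector on its own. The sketch below uses a small hand-written Series (the values and the names mask and positive are illustrative assumptions, not part of the example above) so that the result is reproducible.

```python
import pandas as pd

# Illustrative values chosen by hand so the result is reproducible
s = pd.Series([1.5, -0.8, 0.0, 2.25, -3.0])

# The comparison returns a boolean Series aligned with s's index
mask = s > 0
print(mask.tolist())            # [True, False, False, True, False]

# Indexing with the mask keeps only the rows where it is True
positive = s[mask]
print(positive.index.tolist())  # [0, 3]
```

Writing s[s > 0] is the same operation with the mask built inline.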
3.
In boolean indexing, we use the operators | for or, & for and, and ~ for not. Each comparison must be wrapped in parentheses: we write (s>0) | (s<0) rather than s>0 | s<0, because | binds more tightly than the comparison operators, so Python interprets s>0 | s<0 as s > (0 | s) < 0.
cond = (s>0) | (s<0)
s[cond]
Output-
0    1.833026
1   -0.799166
2    0.820638
3    0.596972
4   -0.874675
5   -0.996899
6    0.238722
dtype: float64
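The example above only exercises |. As a minimal sketch of the other two operators, and of what goes wrong without parentheses, the following uses a hand-written Series; the values and variable names are assumptions made for illustration.

```python
import pandas as pd

# Illustrative values chosen by hand so the result is reproducible
s = pd.Series([1.5, -0.8, 0.0, 2.25, -3.0])

# & keeps rows where both conditions hold
between = s[(s > 0) & (s < 2)]
print(between.tolist())        # [1.5]

# ~ inverts a mask: here, everything that is not positive
not_positive = s[~(s > 0)]
print(not_positive.tolist())   # [-0.8, 0.0, -3.0]

# Without parentheses, | binds more tightly than > and <,
# so Python reads this as s > (0 | s) < 0 and the expression fails
try:
    s > 0 | s < 0
    failed = False
except (TypeError, ValueError):
    failed = True
print(failed)                  # True
```

The Python keywords and, or, and not are not a substitute here: they ask for a single truth value for the whole Series, which Pandas refuses to give.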
Boolean Indexing on DataFrame
To perform boolean indexing on a DataFrame, we use a boolean vector with the same length as the DataFrame’s index. In practice the boolean vector is usually derived from one of the DataFrame’s columns, for example df['b'] > 0.
# named 'names' rather than 'list' so the built-in list is not shadowed
names = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
indexList = [1, 2, 3, 4, 5, 6, 7]
df = pd.DataFrame({'a': names, 'b': np.random.randn(7)}, index=indexList)
df
Output-
       a         b
1    one  0.444295
2    two -1.069635
3  three -0.635911
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248
Let’s pass boolean vectors using the indexers loc[] and iloc[].
1. using .loc[]
cond = df['b'] > 0
df.loc[cond]
Output-
       a         b
1    one  0.444295
4   four  1.559587
5   five  0.662912
6    six  0.256670
7  seven  1.028248
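Conditions on several columns can be combined in the same way as on a Series, and .loc[] can take a column label after the mask. The following is a rough sketch on a smaller, hand-written DataFrame; its values stand in for the random column 'b' above and are assumptions made for reproducibility.

```python
import pandas as pd

# Hand-written values standing in for the random column 'b'
df = pd.DataFrame(
    {"a": ["one", "two", "three", "four"], "b": [0.4, -1.1, -0.6, 1.6]},
    index=[1, 2, 3, 4],
)

# Mask built from two columns; each comparison is parenthesized
cond = (df["b"] > 0) & (df["a"] != "one")

# Rows where cond is True, restricted to column 'b'
result = df.loc[cond, "b"]
print(result.to_dict())   # {4: 1.6}
```

The mask has one entry per row of df, which is the same-length requirement described earlier.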
2. using .iloc[]
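With iloc, we cannot pass a boolean Series such as cond directly. iloc is position-based, so it expects integer positions (a plain boolean array, such as cond.to_numpy(), is also accepted). Here, position 6 selects the seventh row.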
df.iloc[6]
Output-
a      seven
b    1.02825
Name: 7, dtype: object
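Boolean filtering by position is still possible if the mask is first converted to a plain NumPy array. The sketch below assumes a small hand-written DataFrame (not the random one above) and uses Series.to_numpy() for the conversion.

```python
import pandas as pd

# Hand-written values so the result is reproducible
df = pd.DataFrame(
    {"a": ["one", "two", "three", "four"], "b": [0.4, -1.1, -0.6, 1.6]},
    index=[1, 2, 3, 4],
)

cond = df["b"] > 0

# .iloc rejects the boolean Series itself, but accepts its NumPy array
by_position = df.iloc[cond.to_numpy()]
print(by_position.index.tolist())         # [1, 4]

# Same rows as the label-based selection
print(by_position.equals(df.loc[cond]))   # True
```

For a mask derived from the DataFrame's own columns, .loc[] is usually the more direct choice, since it aligns the mask with the index labels.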