Group by and count in Pandas Python
In this tutorial, we will learn how to use the groupby() and count() functions provided by the Pandas library for Python. Pandas is a widely used third-party Python library. It provides many useful functions for data analysis and also for data visualization.
The strength of this library lies in the simplicity of its functions and methods. If you have an intermediate knowledge of coding in Python, you can easily play with this library.
count() in Pandas
Pandas provides a count() function which can be used on a data frame to get initial knowledge about the data: it counts the non-missing values. When you use this function directly on a data frame, it can take 3 arguments.
count() can be defined as,
dataframe.count(axis=0, level=None, numeric_only=False)
axis: it can take two predefined values, 0 and 1. With axis=0 (the default), the function returns the number of non-missing values in each column. With axis=1, it returns the number of non-missing values in each row.
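level: if the data frame has a multi-index, this value can be specified to count along a particular index level. By default, it is set to None. Note that pandas 2.0 and later no longer accept this parameter in count().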
numeric_only: by default it is False, and all columns are counted. When set to True, only columns with numeric data (float, int or boolean) are included in the count.
Note: All these parameters are optional; specify them only when you want to count the data in a particular way.
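The effect of axis and numeric_only can be sketched on a small made-up table. The column values below are hypothetical and are not taken from the tutorial's data.csv.

```python
import pandas as pd

# Small made-up table (an assumption for illustration); None marks a missing value.
players = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, None, 31],
    "College": ["Duke", None, None],
})

# axis=0 (default): number of non-missing values in each column.
per_column = players.count()

# axis=1: number of non-missing values in each row.
per_row = players.count(axis=1)

# numeric_only=True: only numeric columns take part in the count.
numeric_counts = players.count(numeric_only=True)

print(per_column)
print(per_row)
print(numeric_counts)
```

Here only Age is numeric, so the last call reports a single column.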
import pandas as pd
df = pd.read_csv("data.csv")
print(df.count())
Here we have imported the pandas library and read a CSV (comma-separated values) file containing our data into a data frame. Pandas provides a built-in function for this purpose, i.e. read_csv("filename"). Calling count() on the data frame then gives the following result.
Output:
Name        457
Team        457
Number      457
Position    457
Age         457
Height      457
Weight      457
College     373
Salary      446
dtype: int64
The output lists each column together with its number of non-missing values. College (373) and Salary (446) are lower than the other columns (457), which indicates that some rows have no value in those columns. Here the default values are used: axis=0, numeric_only=False and level=None. Try changing these values yourself to observe the results and understand the concept better.
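To see why some columns report a lower count, the non-missing count can be compared with the number of missing values. This is a minimal sketch on a made-up table (df_demo and its values are assumptions, not the tutorial's data.csv).

```python
import pandas as pd

# Hypothetical data with gaps in College and Salary.
df_demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "College": ["Duke", None, "Texas", None],
    "Salary": [1.5, 2.0, None, 3.0],
})

non_missing = df_demo.count()      # what count() reports per column
missing = df_demo.isna().sum()     # number of missing values per column
total_rows = len(df_demo)

# For every column, non_missing + missing equals the number of rows.
print(pd.DataFrame({"non_missing": non_missing, "missing": missing}))
print("rows:", total_rows)
```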
groupby() in Pandas
When analysing large data frames, the groupby() functionality of pandas is a great help. We use groupby() when we want to study segments of the data separately. This function splits the data frame into groups according to the criteria specified in the function call.
dataframe.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
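by: a mapping, function, column label, or list of labels that determines the groups. By default, it is None.
axis: an integer, 0 by default. 0 splits along rows and 1 splits along columns.
level: used when the axis has a multi-index; it groups by one or more index levels. By default, it is None.
as_index: a boolean, True by default. For aggregated output, True uses the group labels as the index of the result; False keeps them as a regular column instead.
group_keys: a boolean, True by default. When calling apply(), it adds the group keys to the index so that the pieces can be identified.
squeeze: a boolean, False by default. When set to True, the dimensionality of the returned object is reduced if possible. This parameter was deprecated and is no longer available in pandas 2.0 and later.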
The groupby() function returns a GroupBy object. No result is computed until a function such as count() is applied to it.
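The as_index and sort parameters are easiest to understand by comparing results. The following is a minimal sketch on a made-up table; df_demo and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: salaries of players from two colleges.
df_demo = pd.DataFrame({
    "College": ["Texas", "Duke", "Texas", "Duke", "Texas"],
    "Salary": [10, 20, 30, 40, 50],
})

# as_index=True (default): the group labels become the index of the result.
indexed = df_demo.groupby("College")["Salary"].sum()

# as_index=False: the group labels stay as a regular column.
flat = df_demo.groupby("College", as_index=False)["Salary"].sum()

# sort=False: groups keep their order of first appearance.
unsorted = df_demo.groupby("College", sort=False)["Salary"].sum()

print(indexed)
print(flat)
print(unsorted)
```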
import pandas as pd
df = pd.read_csv("data.csv")
df_use = df.groupby('College')
Here we have used the groupby() function on the data read from the CSV file. We have grouped by 'College', which forms one segment of the data frame per college.
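A grouped object can be inspected before any aggregation. The sketch below uses a small made-up table (df_demo is hypothetical) to show how many groups were formed and which rows belong to one of them.

```python
import pandas as pd

# Hypothetical data; the last player has no college recorded.
df_demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "College": ["Duke", "Texas", "Duke", None],
})

grouped = df_demo.groupby("College")

# Rows with a missing group key are left out by default.
print(grouped.ngroups)

# All rows that belong to a single group.
duke_rows = grouped.get_group("Duke")
print(duke_rows)

# Iterating yields (group label, sub-frame) pairs.
group_sizes = {college: len(rows) for college, rows in grouped}
print(group_sizes)
```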
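Now, let's say we want to know how many teams a College has. We can select the Team column from the grouped object and call count() on it, for example df_use['Team'].count().
This shows the number of Team entries for each College. Note that count() counts every row with a non-missing Team value, so a team that appears twice for the same College is counted twice.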
Output:
College
Alabama           3
Arizona          13
Arizona State     2
Arkansas          3
Baylor            1
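The difference between counting entries and counting distinct teams can be sketched as follows. The table is made up for illustration; only the column names Team and College come from the tutorial's data.

```python
import pandas as pd

# Hypothetical data: two Duke players share a team, one Texas player has no team.
df_demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Team": ["Celtics", "Celtics", "Lakers", "Nets", None],
    "College": ["Duke", "Duke", "Duke", "Texas", "Texas"],
})

grouped = df_demo.groupby("College")

team_entries = grouped["Team"].count()      # non-missing Team values per college
distinct_teams = grouped["Team"].nunique()  # different teams per college
rows_per_college = grouped.size()           # all rows, missing values included

print(team_entries)
print(distinct_teams)
print(rows_per_college)
```

If the goal is the number of different teams per college, nunique() is probably the closer match; count() answers how many rows have a team recorded.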
So this is how we can easily segment a data frame with groupby() and summarise each segment with count() according to our needs.