Group by and count in Pandas Python

In this tutorial, we will learn how to use the groupby() and count() functions provided by the pandas library for Python. Pandas is a very useful third-party library (it is not part of Python itself and has to be installed separately). It provides a wide range of functions for data analysis and basic data visualization.

The strength of this library lies in the simplicity of its functions and methods. If you have an intermediate knowledge of coding in Python, you can easily play with this library.

count() in Pandas

Pandas provides a count() function that can be called on a data frame to get a first impression of the data. When you call it on a data frame by itself, it can take three arguments.
count() can be written as:
dataframe.count(axis=0, level=None, numeric_only=False)

axis: it can take one of two predefined values, 0 or 1. With axis=0 (the default) it returns the number of non-missing values in each column. With axis=1 it returns the number of non-missing values in each row.

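level: if the data frame has a multi-index, this names the index level to count along. By default, it is set to None. Note that this argument was deprecated in pandas 1.3 and removed in pandas 2.0, so recent versions accept only axis and numeric_only.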

numeric_only: False by default. When set to True, the function counts only the columns that hold numeric data (float, int or boolean); otherwise it returns the count for every column.

Note: All these arguments are optional. Specify them only when you want to study the data in a specific manner.
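To see what the arguments described above do, here is a minimal sketch on a small made-up data frame. The column names and values are assumptions chosen for illustration and are not taken from data.csv. The level argument is left out so that the snippet also runs on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the tutorial's data.csv
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara"],
    "Age": [25.0, np.nan, 31.0],
    "College": ["Duke", None, None],
})

# axis=0 (default): non-missing values in each column
per_column = df.count()

# axis=1: non-missing values in each row
per_row = df.count(axis=1)

# numeric_only=True: only numeric columns are counted
numeric = df.count(numeric_only=True)

print(per_column)
print(per_row)
print(numeric)
```

In this toy frame only Age is numeric, so the last result contains a single entry.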

import pandas as pd

df = pd.read_csv("data.csv")

Here we have imported the pandas library and read a CSV (comma-separated values) file into a data frame. Pandas provides a built-in function for this purpose: read_csv("filename").

print(df.count())

Output:

Name        457
Team        457
Number      457
Position    457
Age         457
Height      457
Weight      457
College     373
Salary      446
dtype: int64

The output lists each column together with its number of non-missing values. College and Salary show lower counts than the other columns because some of their entries are missing. Here the default values apply: axis=0, numeric_only=False and level=None. You can try changing these arguments yourself to observe the results and understand the concept better.
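Because count() skips missing values, subtracting it from the number of rows gives the number of missing entries per column. The following is a small sketch on made-up data (the column names and values are assumptions, not the contents of data.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the tutorial's data.csv
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dev"],
    "College": ["Duke", None, "Texas", None],
    "Salary": [100.0, 200.0, np.nan, 400.0],
})

total_rows = len(df)               # number of rows, missing or not
missing = total_rows - df.count()  # missing entries in each column

print(missing)

# The same numbers can be obtained directly with isna().sum()
print(df.isna().sum())
```

Applied to the output above, and assuming the file has 457 rows as the Name count suggests, College would have 457 - 373 = 84 missing entries.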

groupby() in Pandas

When analysing large data frames, the groupby() functionality of pandas is a great help. It is used when we want to study segments of the data separately. The function splits the data frame into groups according to criteria specified in the function call.
dataframe.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

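by: a mapping, function, column label or list of labels that determines the groups. By default it is set to None.
axis: an int with default value 0, which groups the rows.
level: used when the axis is a multi-index, to group by one or more of its levels.
as_index: a boolean, True by default, which makes the group labels the index of the aggregated output. If set to False, the group labels are returned as ordinary columns instead.
group_keys: used when calling apply(), to add the group keys to the index so that the pieces can be identified.
squeeze: when set to True, the dimension of the returned object is reduced if possible. This argument was removed in pandas 2.0.
The groupby() function returns a GroupBy object.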
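The effect of as_index and sort is easiest to see on a tiny example. This is only a sketch on made-up data (the column names and values are assumptions). It leaves out squeeze and axis so that it also runs on recent pandas versions.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's data.csv
df = pd.DataFrame({
    "College": ["Texas", "Duke", "Texas", "Duke", "Texas"],
    "Team": ["A", "B", "A", "C", "B"],
    "Salary": [10, 20, 30, 40, 50],
})

# groupby() by itself computes nothing; it returns a GroupBy object
grouped = df.groupby("College")
print(type(grouped).__name__)

# Default as_index=True: the group labels become the index of the result
by_index = df.groupby("College")["Salary"].sum()

# as_index=False: the group labels stay as an ordinary column
as_column = df.groupby("College", as_index=False)["Salary"].sum()

# sort=False: groups keep the order in which they first appear
unsorted = df.groupby("College", sort=False)["Salary"].sum()

print(by_index)
print(as_column)
print(unsorted)
```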

import pandas as pd

df = pd.read_csv("data.csv")
# Split the rows into one group per distinct College value
df_use = df.groupby('College')

Here we have used the groupby() function on the data read from a CSV file. Grouping by 'College' forms one group of rows for each distinct College value.
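Now, let's say we want to know how many Team entries each College has:

print(df_use.Team.count())

This shows, for each College, the number of rows with a non-missing Team value. Note that this is a count of rows, not of distinct teams: a team that appears in several rows is counted each time. To count distinct teams, nunique() can be used in place of count().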

Output:
College
Alabama              3
Arizona             13
Arizona State        2
Arkansas             3
Baylor               1

So this is how we can easily segment the data frame and use each segment according to our needs.
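When counting within groups, it helps to know how count() differs from size() and nunique(). Here is a short sketch on made-up data. The team and college values are assumptions chosen so that the three results differ.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's data.csv
df = pd.DataFrame({
    "College": ["Duke", "Duke", "Duke", "Texas", "Texas", None],
    "Team": ["Celtics", "Celtics", "Nets", "Nets", None, "Suns"],
})

# Rows whose College is missing are left out of the groups by default
grouped = df.groupby("College")

counts = grouped["Team"].count()      # non-missing Team values per group
sizes = grouped.size()                # all rows per group, missing or not
distinct = grouped["Team"].nunique()  # distinct Team values per group

print(counts)
print(sizes)
print(distinct)
```

For Texas, count() gives 1 while size() gives 2, because one Texas row has no Team. For Duke, count() gives 3 while nunique() gives 2, because Celtics appears twice.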

 
