Crosstab in Python using Pandas – Full Guide

This tutorial will show you how to crosstab two or more factors in Python using the Pandas library function. Cross-tabulation is used to analyze the relationship between the factors. It is mainly used on categorical data. Categorical data means the data that can be divided into groups- e.g. Gender. This can be easily done using the Pandas library’s crosstab function. Using the crosstab function you can manipulate the data.

The crosstab function is similar to the pandas ‘pivot table’ function. But the crosstab function has some advantages over it.

  • There is no need to create the DataFrame of data before cross-tabulation.
  • You can normalize the result as a mean, sum, percentage, etc.

crosstab function

The syntax and parameters of the function are as follows.

Syntax with the default values:

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margin=False, margins_name: str= 'All', dropna: bool =True, normalize=False)

Note: The parameters with default values are optional while others are mandatory to pass to function.

  1. index: A collection of values to group by in rows.
  2. columns: A collection of values to group by in columns.
  3. values: A collection of values to aggregate.
  4. rownames: A collection of names should be given to rows.
  5. colnames: A collection of names should be given to rows.
  6. aggfunc: A function to aggregate the data. e.g.-mean, sum, etc.
  7. margin: row or column margins(total).
  8. margins_name: name of row or column margin.
  9. dropna: to drop the nan/null values from data.
  10. normalize: normalize the data by dividing each value by the sum of all values.

Python crosstab function example

let’s see the example of cross-tabulation using the pandas crosstab function.
The sample data to use in the code is (abc.xlsx):

Crosstab in Python using Pandas

Here is the code for cross-tabulation:

import pandas as pd
data = pd.read_excel('abc.xlsx')
pd.crosstab(data.Class, [data.Result, data.Gender], rownames=['Class'], colnames=['Result', 'Gender'])

The cross-tabulation took ‘Class’ row-wise and ‘Result’ and ‘Gender’ column-wise. You will get the relation between these factors.

Leave a Reply

Your email address will not be published. Required fields are marked *