# Crosstab in Python using Pandas – Full Guide

This tutorial will show you how to crosstab two or more factors in Python using the Pandas library function. Cross-tabulation is used to analyze the relationship between the factors. It is mainly used on categorical data. Categorical data means the data that can be divided into groups- e.g. Gender. This can be easily done using the Pandas library’s * crosstab *function. Using the

*crosstab*function you can manipulate the data.

The crosstab function is similar to the pandas ‘pivot table’ function. But the crosstab function has some advantages over it.

- There is no need to create the DataFrame of data before cross-tabulation.
- You can normalize the result as a mean, sum, percentage, etc.

## crosstab function

The syntax and parameters of the function are as follows.

**Syntax with the default values:**

`pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margin=False, margins_name: str= 'All', dropna: bool =True, normalize=False)`

**Parameters:**

*Note*: The parameters with default values are optional while others are mandatory to pass to function.

*index*: A collection of values to group by in rows.*columns*: A collection of values to group by in columns.*values*: A collection of values to aggregate.*rownames*: A collection of names should be given to rows.*colnames*: A collection of names should be given to rows.*aggfunc*: A function to aggregate the data. e.g.-mean, sum, etc.*margin*: row or column margins(total).*margins_name*: name of row or column margin.*dropna*: to drop the nan/null values from data.*normalize*: normalize the data by dividing each value by the sum of all values.

### Python crosstab function example

let’s see the example of cross-tabulation using the pandas crosstab function.

The sample data to use in the code is (abc.xlsx):

Here is the code for cross-tabulation:

import pandas as pd data = pd.read_excel('abc.xlsx') pd.crosstab(data.Class, [data.Result, data.Gender], rownames=['Class'], colnames=['Result', 'Gender'])

**Output**:

The cross-tabulation took ‘Class’ row-wise and ‘Result’ and ‘Gender’ column-wise. You will get the relation between these factors.

## Leave a Reply