Contingency Table in Python

In this module, we will discuss the Contingency Table in Python.

When only one variable is involved, a data set is easy to analyze. A contingency table is used to analyze two or more categorical variables at once: it counts how often each combination of their values occurs, which makes data analysis and visualization easier and clearer.
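As a quick illustration of the idea, here is a minimal sketch on a tiny made-up data set. The frame `toy` and its columns `gender` and `smoker` are hypothetical and are not part of the data used later in this tutorial.

```python
import pandas as pd

# Hypothetical toy data: one row per surveyed person.
toy = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "smoker": ["yes", "no", "no", "no", "yes", "yes"],
})

# Rows come from the first argument, columns from the second;
# each cell counts the rows of `toy` with that combination of values.
toy_table = pd.crosstab(toy["gender"], toy["smoker"])
print(toy_table)
```

Each cell holds a count, so the cells together add up to the number of rows in `toy`.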

Let us build a sample contingency table step by step. Jupyter Notebook is a convenient environment for following along.

First, we have to load the required libraries.

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

Next, read the data table to be analyzed into the variable df. Here I use an arbitrary sample data table.

df = pd.read_csv("random_data.csv") 
df.head()

Output:

Country Region Population (millions) HDI GDP per Capita Cropland Footprint Grazing Footprint Forest Footprint Carbon Footprint Fish Footprint Cropland Grazing Land Forest Land Fishing Water Urban Land Total Biocapacity Biocapacity Deficit or Reserve Earths Required Countries Required Data Quality
0 Afghanistan Middle East/Central Asia 29.82 0.46 $614.66 0.30 0.20 0.08 0.18 0.00 0.24 0.20 0.02 0.00 0.04 0.50 -0.30 0.46 1.60 6
1 Albania Northern/Eastern Europe 3.16 0.73 $4,534.37 0.78 0.22 0.25 0.87 0.02 0.55 0.21 0.29 0.07 0.06 1.18 -1.03 1.27 1.87 6
2 Algeria Africa 38.48 0.73 $5,430.57 0.60 0.16 0.17 1.14 0.01 0.24 0.27 0.03 0.01 0.03 0.59 -1.53 1.22 3.61 5
3 Angola Africa 20.82 0.52 $4,665.91 0.33 0.15 0.12 0.20 0.09 0.20 1.42 0.64 0.26 0.04 2.55 1.61 0.54 0.37 6
4 Antigua and Barbuda Latin America 0.09 0.78 $13,205.10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.94 -4.44 3.11 5.70 2

Next, check the column types and non-null counts.

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Country                         188 non-null    object 
 1   Region                          188 non-null    object 
 2   Population (millions)           188 non-null    float64
 3   HDI                             172 non-null    float64
 4   GDP per Capita                  173 non-null    object 
 5   Cropland Footprint              173 non-null    float64
 6   Grazing Footprint               173 non-null    float64
 7   Forest Footprint                173 non-null    float64
 8   Carbon Footprint                173 non-null    float64
 9   Fish Footprint                  173 non-null    float64
 10  Total Ecological Footprint      188 non-null    float64
 11  Cropland                        173 non-null    float64
 12  Grazing Land                    173 non-null    float64
 13  Forest Land                     173 non-null    float64
 14  Fishing Water                   173 non-null    float64
 15  Urban Land                      173 non-null    float64
 16  Total Biocapacity               188 non-null    float64
 17  Biocapacity Deficit or Reserve  188 non-null    float64
 18  Earths Required                 188 non-null    float64
 19  Countries Required              188 non-null    float64
 20  Data Quality                    188 non-null    object 
dtypes: float64(17), object(4)
memory usage: 31.0+ KB
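The info output shows that some columns, such as Cropland, have only 173 non-null values out of 188. With its default settings, pd.crosstab leaves out rows where either variable is missing, so the counts in the table may add up to less than the number of rows. The sketch below shows this on a hypothetical miniature frame (`mini` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with one missing Cropland value.
mini = pd.DataFrame({
    "Cropland": [0.24, np.nan, 0.24, 0.55],
    "Data Quality": ["6", "2", "5", "6"],
})

# Count the missing values per column before building the table.
print(mini.isna().sum())

# The row with the missing Cropland value is not counted in any cell.
table = pd.crosstab(mini["Cropland"], mini["Data Quality"])
print(table)
print("rows counted:", table.to_numpy().sum(), "of", len(mini))
```

Comparing the cell total with len(df) is a simple way to see how many rows were left out.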

Now we use a contingency table to show how the values of two variables occur together. Each cell counts the rows that have a given combination of values.

# Rows: values of Cropland; columns: values of Data Quality
crosstab = pd.crosstab(df['Cropland'],
                       df['Data Quality'],
                       margins=False)
crosstab.head()

Output:

Data Quality  3B  3L  4  5  6
Cropland
0.00           0   1  0  1  0
0.01           0   2  0  0  0
0.02           0   2  0  0  1
0.04           2   0  0  0  0
0.05           0   1  0  0  0
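Because Cropland is a continuous variable, almost every distinct value gets its own row and the table is sparse. One possible refinement is to group the values into ranges with pd.cut before cross-tabulating, and to add totals or row proportions. The sketch below assumes a hypothetical stand-in frame `sample` with made-up values, and the bin edges are chosen arbitrarily for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the data set above (values are made up).
sample = pd.DataFrame({
    "Cropland": [0.02, 0.24, 0.55, 0.20, 1.10, 0.60, 0.05, 0.90],
    "Data Quality": ["6", "6", "5", "6", "5", "3L", "3L", "6"],
})

# Group the continuous column into three ranges (arbitrary bin edges).
# Casting to plain strings keeps the example simple; as a side effect
# the bins are sorted alphabetically in the resulting tables.
cropland_bin = pd.cut(
    sample["Cropland"],
    bins=[0, 0.25, 0.75, 1.5],
    labels=["low", "medium", "high"],
    include_lowest=True,
).astype(str)

# margins=True appends an "All" row and column holding the totals.
binned = pd.crosstab(cropland_bin, sample["Data Quality"], margins=True)
print(binned)

# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(cropland_bin, sample["Data Quality"], normalize="index")
print(shares.round(2))
```

With real data, the bin edges should reflect ranges that are meaningful for the variable rather than the arbitrary ones used here.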
