Contingency Table in Python


In this module, we will discuss the Contingency Table in Python.

When only one variable is involved, the data is easy to analyze. A contingency table (also called a cross-tabulation) is used to analyze two or more categorical variables together: it shows how often each combination of values occurs, which makes the data easier to analyze and visualize.
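To make the idea concrete before moving to a real data set, here is a minimal sketch on made-up data. The DataFrame `toy` and its columns `gender` and `answer` are hypothetical and exist only for illustration; `pd.crosstab` is the same pandas function used later in this post.

```python
import pandas as pd

# Hypothetical toy data: one row per survey respondent.
toy = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "answer": ["yes", "no", "yes", "yes", "no", "no"],
})

# Count how often each (gender, answer) pair occurs.
# Rows come from the first argument, columns from the second.
toy_table = pd.crosstab(toy["gender"], toy["answer"])
print(toy_table)
```

Each cell holds the number of rows with that combination of values, so the cells add up to the number of rows in `toy`.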

Let us look at a sample contingency table. A Jupyter Notebook is a convenient place to run the code.

First, we have to load the required libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Next, read the data table that we want to analyze into the variable df. Here I use an arbitrary sample data table.

df = pd.read_csv("random_data.csv") 
df.head()

Output:

	Country	Region	Population (millions)	HDI	GDP per Capita	Cropland Footprint	Grazing Footprint	Forest Footprint	Carbon Footprint	Fish Footprint	…	Cropland	Grazing Land	Forest Land	Fishing Water	Urban Land	Total Biocapacity	Biocapacity Deficit or Reserve	Earths Required	Countries Required	Data Quality
0	Afghanistan	Middle East/Central Asia	29.82	0.46	$614.66	0.30	0.20	0.08	0.18	0.00	…	0.24	0.20	0.02	0.00	0.04	0.50	-0.30	0.46	1.60	6
1	Albania	Northern/Eastern Europe	3.16	0.73	$4,534.37	0.78	0.22	0.25	0.87	0.02	…	0.55	0.21	0.29	0.07	0.06	1.18	-1.03	1.27	1.87	6
2	Algeria	Africa	38.48	0.73	$5,430.57	0.60	0.16	0.17	1.14	0.01	…	0.24	0.27	0.03	0.01	0.03	0.59	-1.53	1.22	3.61	5
3	Angola	Africa	20.82	0.52	$4,665.91	0.33	0.15	0.12	0.20	0.09	…	0.20	1.42	0.64	0.26	0.04	2.55	1.61	0.54	0.37	6
4	Antigua and Barbuda	Latin America	0.09	0.78	$13,205.10	NaN	NaN	NaN	NaN	NaN	…	NaN	NaN	NaN	NaN	NaN	0.94	-4.44	3.11	5.70	2

Next, check the column types and the number of non-null values in each column.

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Country                         188 non-null    object 
 1   Region                          188 non-null    object 
 2   Population (millions)           188 non-null    float64
 3   HDI                             172 non-null    float64
 4   GDP per Capita                  173 non-null    object 
 5   Cropland Footprint              173 non-null    float64
 6   Grazing Footprint               173 non-null    float64
 7   Forest Footprint                173 non-null    float64
 8   Carbon Footprint                173 non-null    float64
 9   Fish Footprint                  173 non-null    float64
 10  Total Ecological Footprint      188 non-null    float64
 11  Cropland                        173 non-null    float64
 12  Grazing Land                    173 non-null    float64
 13  Forest Land                     173 non-null    float64
 14  Fishing Water                   173 non-null    float64
 15  Urban Land                      173 non-null    float64
 16  Total Biocapacity               188 non-null    float64
 17  Biocapacity Deficit or Reserve  188 non-null    float64
 18  Earths Required                 188 non-null    float64
 19  Countries Required              188 non-null    float64
 20  Data Quality                    188 non-null    object 
dtypes: float64(17), object(4)
memory usage: 31.0+ KB

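Now we build a contingency table that shows how often each combination of values from two variables occurs. Here we cross-tabulate the Cropland column against the Data Quality column and look at the first few rows.

# Rows: Cropland values; columns: Data Quality categories
crosstab = pd.crosstab(df['Cropland'],
                       df['Data Quality'],
                       margins=False)
crosstab.head()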

Output:

Data Quality	3B	3L	4	5	6
Cropland
0.00	0	1	0	1	0
0.01	0	2	0	0	0
0.02	0	2	0	0	1
0.04	2	0	0	0	0
0.05	0	1	0	0	0
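Because Cropland is a continuous column, almost every distinct value gets its own row, and most cells are 0. One way to get a more readable table is to group the values into ranges first. The sketch below is only an illustration: the DataFrame `sample` is a made-up stand-in for the two columns, and the bin edges and the labels "low", "medium" and "high" are assumptions, not values taken from the data set above. It also shows the `margins` and `normalize` options of `pd.crosstab`.

```python
import pandas as pd

# Hypothetical stand-in for the Cropland and Data Quality columns.
sample = pd.DataFrame({
    "Cropland": [0.00, 0.01, 0.02, 0.24, 0.55, 0.20, 1.10, 2.30],
    "Data Quality": ["3L", "3L", "6", "6", "6", "5", "5", "6"],
})

# Group the continuous values into three illustrative ranges.
# The edges and labels are assumptions chosen for this example.
bins = [0.0, 0.1, 1.0, 5.0]
labels = ["low", "medium", "high"]
levels = pd.cut(sample["Cropland"], bins=bins, labels=labels,
                include_lowest=True)

# Store the labels as plain strings to keep the example simple
# (rows are then sorted alphabetically, not from low to high).
sample["Cropland level"] = levels.astype(str)

# margins=True appends an "All" row and column holding the totals.
binned_table = pd.crosstab(sample["Cropland level"],
                           sample["Data Quality"],
                           margins=True)
print(binned_table)

# normalize="index" turns each row into proportions that sum to 1.
row_shares = pd.crosstab(sample["Cropland level"],
                         sample["Data Quality"],
                         normalize="index")
print(row_shares)
```

With the real data, the same calls could be applied to df['Cropland'] and df['Data Quality'] once suitable bin edges have been chosen.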
