Contingency Table in Python

In this module, we will discuss the Contingency Table in Python.

When only one variable is involved, a problem is easy to analyze. A contingency table is used to analyze two or more variables at once: it counts how often each combination of their values occurs. This makes data analysis and visualization easier and clearer.
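As a minimal sketch of the idea, the following builds a contingency table from a small made-up data set. The column names gender and preference and their values are hypothetical and are not part of the data used later. Each cell counts how many rows share that combination of values.

```python
import pandas as pd

# Hypothetical toy data: two categorical variables, six observations
toy = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "preference": ["tea", "coffee", "coffee", "coffee", "tea", "tea"],
})

# Rows: gender, columns: preference, cells: number of observations
toy_table = pd.crosstab(toy["gender"], toy["preference"])
print(toy_table)
```

The cell counts always add up to the number of rows that went into the table, which is a quick sanity check.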

Let us look at a sample contingency table. A Jupyter Notebook is a convenient place to follow along, because it displays each table directly below the code that produces it.

First, we have to load the required libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Next, we read the data table we want to analyze into the variable df. Here I use a random data table.

df = pd.read_csv("random_data.csv") 
df.head()

Output:

   Country              Region                    Population (millions)  HDI   GDP per Capita
0  Afghanistan          Middle East/Central Asia  29.82                  0.46  $614.66
1  Albania              Northern/Eastern Europe   3.16                   0.73  $4,534.37
2  Algeria              Africa                    38.48                  0.73  $5,430.57
3  Angola               Africa                    20.82                  0.52  $4,665.91
4  Antigua and Barbuda  Latin America             0.09                   0.78  $13,205.10

   Cropland Footprint  Grazing Footprint  Forest Footprint  Carbon Footprint  Fish Footprint
0  0.30                0.20               0.08              0.18              0.00
1  0.78                0.22               0.25              0.87              0.02
2  0.60                0.16               0.17              1.14              0.01
3  0.33                0.15               0.12              0.20              0.09
4  NaN                 NaN                NaN               NaN               NaN

   Cropland  Grazing Land  Forest Land  Fishing Water  Urban Land
0  0.24      0.20          0.02         0.00           0.04
1  0.55      0.21          0.29         0.07           0.06
2  0.24      0.27          0.03         0.01           0.03
3  0.20      1.42          0.64         0.26           0.04
4  NaN       NaN           NaN          NaN            NaN

   Total Biocapacity  Biocapacity Deficit or Reserve  Earths Required  Countries Required  Data Quality
0  0.50               -0.30                           0.46             1.60                6
1  1.18               -1.03                           1.27             1.87                6
2  0.59               -1.53                           1.22             3.61                5
3  2.55               1.61                            0.54             0.37                6
4  0.94               -4.44                           3.11             5.70                2

Checking the data info.

df.info()

 

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Country                         188 non-null    object 
 1   Region                          188 non-null    object 
 2   Population (millions)           188 non-null    float64
 3   HDI                             172 non-null    float64
 4   GDP per Capita                  173 non-null    object 
 5   Cropland Footprint              173 non-null    float64
 6   Grazing Footprint               173 non-null    float64
 7   Forest Footprint                173 non-null    float64
 8   Carbon Footprint                173 non-null    float64
 9   Fish Footprint                  173 non-null    float64
 10  Total Ecological Footprint      188 non-null    float64
 11  Cropland                        173 non-null    float64
 12  Grazing Land                    173 non-null    float64
 13  Forest Land                     173 non-null    float64
 14  Fishing Water                   173 non-null    float64
 15  Urban Land                      173 non-null    float64
 16  Total Biocapacity               188 non-null    float64
 17  Biocapacity Deficit or Reserve  188 non-null    float64
 18  Earths Required                 188 non-null    float64
 19  Countries Required              188 non-null    float64
 20  Data Quality                    188 non-null    object 
dtypes: float64(17), object(4)
memory usage: 31.0+ KB
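
The info output above shows that some columns, such as Cropland, have fewer non-null entries than the 188 rows. As a rough sketch on made-up data (the small frame below is hypothetical and only borrows two column names from the table), one way to count the missing values and drop the affected rows explicitly before cross-tabulating is:

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mimics two columns of the data set
sample = pd.DataFrame({
    "Cropland": [0.24, 0.55, np.nan, 0.20, np.nan],
    "Data Quality": ["6", "6", "2", "6", "5"],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only the rows where both variables are present
complete = sample.dropna(subset=["Cropland", "Data Quality"])
print(len(complete))
```

Dropping the rows up front makes it obvious how many observations actually end up in the table.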

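Now we use a contingency table to show how the values of two variables occur together. Here we cross-tabulate Cropland against Data Quality; each cell counts the countries with that combination of values.

# Rows: Cropland values, columns: Data Quality labels
crosstab = pd.crosstab(df['Cropland'],
                       df['Data Quality'],
                       margins=False)
crosstab.head()

Output:

Data Quality  3B  3L  4  5  6
Cropland
0.00           0   1  0  1  0
0.01           0   2  0  0  0
0.02           0   2  0  0  1
0.04           2   0  0  0  0
0.05           0   1  0  0  0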
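
Because Cropland is a continuous measurement, cross-tabulating its raw values gives one row per distinct number, and most cells are zero. One possible refinement, sketched here on made-up values, is to group the values into ranges with pd.cut and add row and column totals with margins=True. The bin edges, the labels low, medium and high, and the small frame are all assumptions for illustration and do not come from the original data.

```python
import pandas as pd

# Hypothetical sample that mimics two columns of the data set
sample = pd.DataFrame({
    "Cropland": [0.00, 0.01, 0.02, 0.24, 0.55, 1.40, 0.20, 0.05],
    "Data Quality": ["3L", "3L", "6", "6", "6", "5", "6", "3L"],
})

# Assumed bin edges and labels, chosen only for this example
bins = [0.0, 0.1, 0.5, 2.0]
labels = ["low", "medium", "high"]

# Assign each value to a range, then store the labels as plain strings
sample["Cropland level"] = pd.cut(
    sample["Cropland"], bins=bins, labels=labels, include_lowest=True
).astype(str)

# margins=True adds an "All" row and column holding the totals
binned_table = pd.crosstab(
    sample["Cropland level"], sample["Data Quality"], margins=True
)
print(binned_table)
```

With only a few rows the table is much easier to read, at the cost of losing the exact Cropland values.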
