Contingency Table in Python
In this module, we will discuss the contingency table in Python.
Problems that involve only one variable are easy to analyze. A contingency table is used to analyze two or more variables together: it counts how often each combination of their values occurs. This makes data analysis and visualization easier and clearer.
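As a quick illustration of the idea, here is a minimal sketch on a tiny made-up data set. The column names `region` and `quality` and all the values are hypothetical and are not taken from the data used later in this module.

```python
import pandas as pd

# Hypothetical toy data: six observations with two categorical variables.
toy = pd.DataFrame({
    "region": ["Africa", "Africa", "Europe", "Europe", "Europe", "Asia"],
    "quality": ["high", "low", "high", "high", "low", "low"],
})

# Count how often each (region, quality) pair occurs.
toy_table = pd.crosstab(toy["region"], toy["quality"])
print(toy_table)
```

Each cell holds the number of rows that share that pair of values, so the cells of the whole table add up to the number of observations.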
Let us look at a sample contingency table. A Jupyter Notebook is a convenient place to run this implementation.
First, we have to load the required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Next, we read the data table that we want to analyze into the variable df. Here I use a sample data table.
df = pd.read_csv("random_data.csv")
df.head()
Output:
| Country | Region | Population (millions) | HDI | GDP per Capita | Cropland Footprint | Grazing Footprint | Forest Footprint | Carbon Footprint | Fish Footprint | … | Cropland | Grazing Land | Forest Land | Fishing Water | Urban Land | Total Biocapacity | Biocapacity Deficit or Reserve | Earths Required | Countries Required | Data Quality |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | Middle East/Central Asia | 29.82 | 0.46 | $614.66 | 0.30 | 0.20 | 0.08 | 0.18 | 0.00 | … | 0.24 | 0.20 | 0.02 | 0.00 | 0.04 | 0.50 | -0.30 | 0.46 | 1.60 | 6 |
1 | Albania | Northern/Eastern Europe | 3.16 | 0.73 | $4,534.37 | 0.78 | 0.22 | 0.25 | 0.87 | 0.02 | … | 0.55 | 0.21 | 0.29 | 0.07 | 0.06 | 1.18 | -1.03 | 1.27 | 1.87 | 6 |
2 | Algeria | Africa | 38.48 | 0.73 | $5,430.57 | 0.60 | 0.16 | 0.17 | 1.14 | 0.01 | … | 0.24 | 0.27 | 0.03 | 0.01 | 0.03 | 0.59 | -1.53 | 1.22 | 3.61 | 5 |
3 | Angola | Africa | 20.82 | 0.52 | $4,665.91 | 0.33 | 0.15 | 0.12 | 0.20 | 0.09 | … | 0.20 | 1.42 | 0.64 | 0.26 | 0.04 | 2.55 | 1.61 | 0.54 | 0.37 | 6 |
4 | Antigua and Barbuda | Latin America | 0.09 | 0.78 | $13,205.10 | NaN | NaN | NaN | NaN | NaN | … | NaN | NaN | NaN | NaN | NaN | 0.94 | -4.44 | 3.11 | 5.70 | 2 |
Checking the data info.
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 21 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   Country                         188 non-null    object
 1   Region                          188 non-null    object
 2   Population (millions)           188 non-null    float64
 3   HDI                             172 non-null    float64
 4   GDP per Capita                  173 non-null    object
 5   Cropland Footprint              173 non-null    float64
 6   Grazing Footprint               173 non-null    float64
 7   Forest Footprint                173 non-null    float64
 8   Carbon Footprint                173 non-null    float64
 9   Fish Footprint                  173 non-null    float64
 10  Total Ecological Footprint      188 non-null    float64
 11  Cropland                        173 non-null    float64
 12  Grazing Land                    173 non-null    float64
 13  Forest Land                     173 non-null    float64
 14  Fishing Water                   173 non-null    float64
 15  Urban Land                      173 non-null    float64
 16  Total Biocapacity               188 non-null    float64
 17  Biocapacity Deficit or Reserve  188 non-null    float64
 18  Earths Required                 188 non-null    float64
 19  Countries Required              188 non-null    float64
 20  Data Quality                    188 non-null    object
dtypes: float64(17), object(4)
memory usage: 31.0+ KB
Now we use a contingency table to show how two variables relate to each other by counting how often each pair of their values occurs together.
crosstab = pd.crosstab(df['Cropland'], df['Data Quality'], margins=False)
crosstab.head()
Output:
Data Quality | 3B | 3L | 4 | 5 | 6 |
---|---|---|---|---|---|
Cropland | | | | | |
0.00 | 0 | 1 | 0 | 1 | 0 |
0.01 | 0 | 2 | 0 | 0 | 0 |
0.02 | 0 | 2 | 0 | 0 | 1 |
0.04 | 2 | 0 | 0 | 0 | 0 |
0.05 | 0 | 1 | 0 | 0 | 0 |
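Because Cropland is a continuous variable, almost every distinct value gets its own row, so the table is sparse. One possible way to make it easier to read is to group the values into bins first and then add totals or row shares. The sketch below uses a small made-up sample in place of the real file; the bin edges, the bin labels and the values in `sample` are all assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the two columns used above; the values are made up.
sample = pd.DataFrame({
    "Cropland": [0.00, 0.01, 0.02, 0.24, 0.55, 1.42, 0.20, 0.04],
    "Data Quality": ["3L", "3L", "6", "5", "6", "6", "5", "3B"],
})

# Group the continuous values into three labelled bins (edges are assumptions).
labels = ["low", "medium", "high"]
sample["Cropland bin"] = pd.cut(
    sample["Cropland"],
    bins=[0.0, 0.1, 0.5, 2.0],
    labels=labels,
    include_lowest=True,
).astype(str)

# margins=True adds an "All" row and column holding the totals.
counts = pd.crosstab(sample["Cropland bin"], sample["Data Quality"], margins=True)
counts = counts.reindex(labels + ["All"])  # restore the low/medium/high order

# normalize="index" turns each row into shares that sum to 1.
shares = pd.crosstab(sample["Cropland bin"], sample["Data Quality"], normalize="index")
shares = shares.reindex(labels)

print(counts)
print(shares)
```

With `margins=False`, as in the call above, the totals are simply left out. The `normalize` argument also accepts `"columns"` and `"all"` if column shares or overall shares are more useful.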