Create a Seaborn Correlation Heatmap in Python
In this tutorial, I’ll guide you through creating a correlation heatmap with Seaborn and then customizing it to make it more informative. First, a quick introduction: Seaborn is a Python library built on top of matplotlib that is widely used to create attractive, high-quality plots with little code. Whether you are a beginner or an experienced Python user, this walkthrough covers what you need to build a correlation heatmap.
Understanding the Correlation Heatmap
Correlation heatmaps are widely used in data analysis and statistics to visualize the pairwise correlation coefficients between the variables in a dataset. Mathematically, the correlation coefficient between two variables, often denoted as r, quantifies the strength and direction of the linear relationship between them. It ranges from -1 to 1, where:
r = 1 indicates a perfect positive linear relationship
r = -1 indicates a perfect negative linear relationship
r = 0 indicates no linear relationship
To calculate the correlation coefficient, take the covariance between variable1 and variable2 (a measure of how much the two vary together) and normalize it by the product of their standard deviations. That is, r = cov(variable1, variable2) / (std(variable1) × std(variable2)).
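The definition above can be checked with a few lines of NumPy. This is a minimal sketch, not part of the original tutorial: the helper name pearson_r and the small arrays a, b, and c are made up for illustration, and the result is compared against np.corrcoef.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation as covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample covariance between x and y (divides by n - 1)
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    # Sample standard deviations with the same n - 1 divisor
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Illustrative data (hypothetical, chosen to be exactly linear)
a = np.array([1, 2, 3, 4, 5])
b = 2 * a + 1      # perfect positive linear relationship, r close to 1
c = -3 * a + 10    # perfect negative linear relationship, r close to -1

print(pearson_r(a, b))
print(pearson_r(a, c))
# The hand-written version should agree with NumPy's built-in result
print(np.isclose(pearson_r(a, b), np.corrcoef(a, b)[0, 1]))
```

Because of floating-point rounding, the printed values may be very close to 1 and -1 rather than exactly equal to them.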
Now that we understand the term, let’s dive into creating the heatmap.
Prerequisites
Before we begin, make sure you have the necessary libraries installed:
pip install matplotlib seaborn
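If they are not installed, copy the line above into your terminal or command prompt and press Enter. Depending on your system, you may need to use pip3 instead of pip. pandas and NumPy, which this tutorial also uses, are installed automatically as Seaborn dependencies.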
Step 1: Importing libraries
First, import all the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
Step 2: Loading Dataset
Next, load the dataset containing the columns whose correlation you want to compute. For demonstration purposes, I create two arrays of 20 unique random integers between 1 and 100 using NumPy’s random module (np.random.choice); Python’s built-in random module is not actually needed for this.
X = np.random.choice(range(1, 101), size=20, replace=False)
Y = np.random.choice(range(1, 101), size=20, replace=False)
print(X)
print(Y)
Because the values are random, your output will differ from run to run. One run produced:
[ 80 15 58 100 33 84 99 23 45 77 21 53 44 30 60 20 74 89 55 92]
[95 82 56 36 67 12 54 91 93 25 85 50 40 76 9 35 57 22 29 41]
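If you want the same arrays on every run, one option is a seeded random generator. The sketch below is an addition to the tutorial: it uses np.random.default_rng, and the seed value 42 is an arbitrary choice, so its arrays will not match the sample output above.

```python
import numpy as np

# A seeded generator returns the same "random" values on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary, illustrative seed

# 20 unique integers from 1 to 100, mirroring the arrays in this step
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

print(X)
print(Y)
```

Rerunning the script prints identical arrays each time, which makes the later heatmaps easier to compare.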
Step 3: Creating Seaborn Correlation Heatmap
Let’s create the correlation heatmap to visualize how variable X correlates with variable Y.
correlation_mat = np.corrcoef(X, Y)
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat)
plt.title('Correlation Heatmap')
plt.show()
np.corrcoef(X, Y) calculates the correlation coefficients directly from the arrays, without converting them into a DataFrame. For two input arrays it returns a 2×2 matrix.
This draws a 2×2 heatmap. Because the input is a plain array, the axes show only the integer indices 0 and 1.
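To see the numbers behind the colors, it can help to print the matrix itself. The sketch below is an illustrative addition: it regenerates X and Y with a seeded generator (an assumption, so the values differ from the sample run above) and inspects the structure of the result.

```python
import numpy as np

# Illustrative data: seeded so the example is repeatable (seed 0 is arbitrary)
rng = np.random.default_rng(0)
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

correlation_mat = np.corrcoef(X, Y)

print(correlation_mat.shape)   # two variables give a 2x2 matrix
print(correlation_mat)

# Diagonal entries are each variable correlated with itself
print(np.allclose(np.diag(correlation_mat), 1.0))
# The matrix is symmetric: corr(X, Y) equals corr(Y, X)
print(np.isclose(correlation_mat[0, 1], correlation_mat[1, 0]))
```

The off-diagonal entry correlation_mat[0, 1] is the single value of interest here: the correlation between X and Y.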
To create the correlation heatmap from a DataFrame instead, use the code below:
df = pd.DataFrame({'Variable 1': X, 'Variable 2': Y})
corr_mat = df.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr_mat)
plt.title('Correlation Heatmap')
plt.show()
pd.DataFrame converts the arrays into a DataFrame. You can then call df.corr() to generate the correlation matrix.
This draws the same heatmap, now with the column names as axis labels.
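As a quick sanity check, the DataFrame route and the NumPy route should give the same numbers. The following sketch is an addition with seeded, illustrative data (the seed 1 is arbitrary); it shows that df.corr() keeps the column names and matches np.corrcoef.

```python
import numpy as np
import pandas as pd

# Illustrative data: seeded so the example is repeatable
rng = np.random.default_rng(1)
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

df = pd.DataFrame({'Variable 1': X, 'Variable 2': Y})
corr_mat = df.corr()  # Pearson correlation is the default method

# The result is itself a DataFrame labeled by the original column names
print(corr_mat)
print(list(corr_mat.columns))

# Same values as the NumPy version, just with labels attached
print(np.allclose(corr_mat.to_numpy(), np.corrcoef(X, Y)))
```

Those labels are what Seaborn picks up automatically when it draws the heatmap from a DataFrame.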
Step 4: Customizing the Heatmap
By default, the heatmap generated from the DataFrame has labels along its axes, while the heatmap generated directly from the NumPy array does not. To add the desired labels, pass the xticklabels and yticklabels parameters:
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat,
            xticklabels=['Variable1', 'Variable2'],
            yticklabels=['Variable1', 'Variable2'])
plt.title('Correlation Heatmap')
plt.show()
The output will be:
You can also display the correlation values on the heatmap to make it more informative by setting annot=True:
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat, annot=True)
plt.title('Correlation Heatmap')
plt.show()
Below is the generated output:
Apart from these parameters, you can also pass:
fmt: A format string for the annotation labels. It is handy when you want to show values with a fixed number of decimals or as percentages.
linewidths: Controls the width of the lines that separate the cells of the matrix.
cmap: Changes the color theme of the heatmap.
Combining all of the above parameters gives an annotated heatmap with formatted values, visible cell borders, and a custom color scheme.
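The tutorial does not show the code behind this final plot, so here is one possible version. Treat it as a sketch under assumptions: the data is regenerated with an arbitrary seed, and the specific values '.2f', 0.5, and 'coolwarm' are illustrative choices rather than the settings used for the original figure.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative data: seeded so the example is repeatable
rng = np.random.default_rng(0)
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)
correlation_mat = np.corrcoef(X, Y)

labels = ['Variable1', 'Variable2']

plt.figure(figsize=(8, 6))
ax = sns.heatmap(
    correlation_mat,
    annot=True,          # write the value inside each cell
    fmt='.2f',           # two decimal places for the annotations
    linewidths=0.5,      # thin lines between cells
    cmap='coolwarm',     # a diverging matplotlib colormap
    xticklabels=labels,
    yticklabels=labels,
)
plt.title('Correlation Heatmap')
plt.show()
```

Swapping fmt for '.0%' shows the values as percentages, and any other matplotlib colormap name can be passed to cmap.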
Conclusion
Congratulations! You have successfully created a Seaborn correlation heatmap. You now know how to generate one and, more importantly, what it tells you about your dataset. In this tutorial, we first examined the mathematics behind the correlation matrix. We then looked at two ways to generate a heatmap: from a NumPy array and from a DataFrame. Finally, we customized the heatmap’s appearance to make it more appealing and informative.
Please feel free to experiment with other parameters.