Create a Seaborn Correlation Heatmap in Python

In this tutorial, I’ll guide you through creating a correlation heatmap using Seaborn, and we will explore how to customize the heatmap to make it more informative. First, a quick introduction: Seaborn is a Python library built on top of matplotlib and is widely used to create attractive, high-quality statistical plots quickly. Whether you are a beginner or an experienced user, you have come to the right place to explore the correlation heatmap.

Understanding the Correlation Heatmap

A correlation heatmap is widely used in data analysis and statistics to visualize the pairwise correlation coefficients between the variables in a dataset. Mathematically, the correlation coefficient between two variables, often denoted r, quantifies the strength and direction of the linear relationship between them. It ranges from -1 to 1, where:
  • r = 1 indicates a perfect positive linear relationship
  • r = -1 indicates a perfect negative linear relationship
  • r = 0 indicates no linear relationship
To calculate the correlation coefficient (Pearson's r), take the covariance between variable1 and variable2 (a measure of how much they vary together) and normalize it by the product of their standard deviations: r = cov(X, Y) / (σ_X · σ_Y).
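To make the formula concrete, here is a minimal sketch that computes r by hand and checks it against NumPy's np.corrcoef. The helper name pearson_r and the small x/y lists are hypothetical, chosen only for illustration; they are not part of the tutorial's dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: covariance of x and y divided by the
    product of their standard deviations (Pearson's r)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample covariance (ddof=1): how much x and y vary together
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    # Normalize by the product of the sample standard deviations.
    # Using ddof=1 in both places means the (n - 1) factors cancel.
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Small made-up example
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))                             # 0.7746
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True

# Perfectly linear data gives the extreme values
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 4))  # 1.0
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 4))  # -1.0
```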

Now that we understand the term, let’s dive into creating one.

Prerequisites

Before we begin, make sure you have the necessary libraries installed:

pip install numpy pandas matplotlib seaborn

If they are not installed, copy the above line into your terminal or command prompt and press Enter. Depending on your system, you may need to replace pip with pip3.

Step 1: Importing libraries

First things first, import all the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Loading the Dataset

Next, load the dataset containing the columns whose correlation you want to compute. For demonstration purposes, I am creating two arrays of 20 unique random integers between 1 and 100 using NumPy.

X = np.random.choice(range(1, 101), size=20, replace=False)
Y = np.random.choice(range(1, 101), size=20, replace=False)

print(X)
print(Y)

It will generate output similar to the following (your numbers will differ, since the arrays are random and no seed is set):

[ 80 15 58 100 33 84 99 23 45 77 21 53 44 30 60 20 74 89  55 92] 
[95 82 56 36 67 12 54 91 93 25 85 50 40 76 9 35 57 22 29 41]
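If you want the same arrays on every run (for example, so your heatmap matches a colleague's), you can seed the random number generator. The sketch below is an assumption on my part rather than the tutorial's code: it uses NumPy's Generator API (np.random.default_rng) with an arbitrary seed of 42. Calling np.random.seed(42) before the original np.random.choice lines has the same reproducibility effect with the legacy API, though the numbers produced differ between the two APIs.

```python
import numpy as np

# Arbitrary seed chosen for illustration; the tutorial itself sets none
rng = np.random.default_rng(42)

# 20 unique integers from 1 to 100, as in Step 2
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

print(X)
print(Y)

# Re-creating a generator with the same seed reproduces the same X
X_again = np.random.default_rng(42).choice(np.arange(1, 101), size=20, replace=False)
print(np.array_equal(X, X_again))  # True
```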

Step 3: Creating Seaborn Correlation Heatmap

Let’s create the correlation heatmap to visualize the correlation between variable X and variable Y.

correlation_mat = np.corrcoef(X,Y)

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat)
plt.title('Correlation Heatmap')
plt.show()

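np.corrcoef(X, Y) calculates the correlation coefficients directly from the arrays, without converting them into a DataFrame. It returns a 2×2 matrix: the diagonal entries are 1 (each variable is perfectly correlated with itself), and the two off-diagonal entries both hold the correlation between X and Y.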

It will generate the following output:

[Figure: seaborn heatmap of the correlation matrix computed with np.corrcoef]
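To see the structure of the matrix that gets plotted, the sketch below inspects what np.corrcoef returns. The seed and the third variable Z are hypothetical, added only to illustrate how the matrix grows when you correlate more than two variables (pass them as rows, one row per variable).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

correlation_mat = np.corrcoef(X, Y)
print(correlation_mat.shape)                       # (2, 2)
print(np.allclose(np.diag(correlation_mat), 1.0))  # True: each variable vs. itself
print(np.isclose(correlation_mat[0, 1], correlation_mat[1, 0]))  # True: symmetric

# Hypothetical third variable Z, built from X plus noise so it correlates with X
Z = X + rng.normal(0, 5, size=20)
# With several variables, pass them as rows: one row per variable
corr3 = np.corrcoef([X, Y, Z])
print(corr3.shape)                                 # (3, 3)
```

Any of these matrices can be passed straight to sns.heatmap; a 3×3 matrix simply produces a 3×3 grid of cells.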

If you want to create the correlation heatmap from a DataFrame instead, use the code below:

df = pd.DataFrame({'Variable 1':X, 'Variable 2': Y})

corr_mat = df.corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr_mat)
plt.title('Correlation Heatmap')
plt.show()

pd.DataFrame converts the arrays into a DataFrame with one column per variable. Then, df.corr() computes the pairwise correlation matrix of those columns, using Pearson's correlation coefficient by default.

It will generate the following output:

[Figure: Seaborn correlation heatmap generated from the DataFrame]
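The two approaches produce the same numbers. The sketch below, using an arbitrary seed of my own choosing, checks that df.corr() matches np.corrcoef and also shows the method parameter of DataFrame.corr, which accepts 'spearman' for a rank-based correlation ('kendall' is also accepted but requires SciPy, so it is left out here). Spearman's coefficient is Pearson's r applied to the ranks of the data, which is handy when the relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)

df = pd.DataFrame({'Variable 1': X, 'Variable 2': Y})

# Pearson is the default method, so this matches np.corrcoef
pearson = df.corr()
print(np.allclose(pearson.to_numpy(), np.corrcoef(X, Y)))  # True

# Rank-based alternative supported by DataFrame.corr
spearman = df.corr(method='spearman')
print(spearman)
```

Either matrix can be passed to sns.heatmap unchanged, and the column names become the axis labels.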

Step 4: Customizing the Heatmap

The heatmap generated from the DataFrame labels its axes with the column names by default, while the heatmap generated directly from the NumPy array only shows the index numbers 0 and 1. To add your own labels, pass the xticklabels and yticklabels parameters:

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat, xticklabels=['Variable1', 'Variable2'], yticklabels=['Variable1', 'Variable2'])
plt.title('Correlation Heatmap')
plt.show()

The output will be:

[Figure: correlation heatmap with custom axis labels]
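Another option, sketched here as an alternative rather than the tutorial's own method, is to wrap the NumPy matrix in a labeled DataFrame; sns.heatmap then reads the tick labels from the index and columns, so you don't have to repeat them in every call. The seed, the labels list and the labeled variable are illustrative names I've introduced.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)  # arbitrary seed, for illustration only
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)
correlation_mat = np.corrcoef(X, Y)

labels = ['Variable1', 'Variable2']
# Wrap the NumPy matrix in a DataFrame; seaborn takes the y-axis labels
# from the index (rows) and the x-axis labels from the columns
labeled = pd.DataFrame(correlation_mat, index=labels, columns=labels)

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(labeled, ax=ax)
ax.set_title('Correlation Heatmap')
plt.show()
```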

You can also display the correlation values on each cell of the heatmap to make it more informative. Just set annot=True:

plt.figure(figsize=(8, 6))
sns.heatmap(correlation_mat, annot=True)
plt.title('Correlation Heatmap')
plt.show()

Below is the generated output:

[Figure: correlation heatmap with the coefficient values annotated on each cell]

Apart from these parameters, you can also pass:

  • fmt: The string formatting code for the annotations when annot=True. It comes in handy when you want to control the number of decimals or show percentages, for example '.2f' or '.0%'.
  • linewidths: The width of the lines dividing the matrix into individual cells.
  • cmap: The colormap, which changes the color theme of the heatmap.

Sample output using all the above parameters:

[Figure: customized Seaborn correlation heatmap using fmt, linewidths and cmap]
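The code behind the sample output isn't shown, so here is a minimal sketch that combines the parameters above. The specific values (fmt='.2f', linewidths=0.5, cmap='coolwarm') and the seed are my own choices, not necessarily the ones used for the figure. It also sets vmin=-1 and vmax=1, two further sns.heatmap parameters, so the color scale always spans the full range of r instead of stretching to whatever values happen to appear in the data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)  # arbitrary seed, for illustration only
X = rng.choice(np.arange(1, 101), size=20, replace=False)
Y = rng.choice(np.arange(1, 101), size=20, replace=False)
df = pd.DataFrame({'Variable 1': X, 'Variable 2': Y})
corr_mat = df.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr_mat,
    annot=True,       # write the coefficient in each cell
    fmt='.2f',        # two decimal places for the annotations
    linewidths=0.5,   # thin lines between cells
    cmap='coolwarm',  # diverging colormap: blue = negative, red = positive
    vmin=-1, vmax=1,  # pin the color scale to the full range of r
    ax=ax,
)
ax.set_title('Correlation Heatmap')
plt.show()
```

A diverging colormap such as 'coolwarm' suits correlation matrices because its neutral midpoint lines up with r = 0 once vmin and vmax are symmetric.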

Conclusion

Congratulations! You have successfully created a Seaborn correlation heatmap. You now know how to generate one and, more importantly, what it tells you about the relationships in your dataset. In this tutorial, we first examined the mathematics behind the correlation coefficient. Then we looked at two ways to generate a heatmap: one from a NumPy array and the other from a DataFrame. Finally, we customized the heatmap's appearance to make it more appealing and informative.

Please feel free to experiment with other parameters.
