How to Plot Correlation Matrix in Python
A dataset contains many variables. Where some variables depend on one another, and some may be independent. For creating a better model we must understand how variables of the dataset related to one another. A correlation matrix help to learn about the relationship between the variables of the dataset. In this article, we will learn how to calculate and plot a correlation matrix using Python.
A correlation can be positive or negative and sometimes it can be neutral also.
- Positive correlation: Both variables depend on one another
- Negative correlation: Both variables are not dependent on each other.
- Neutral correlation: Both variables are independent.
The dataset used for the demo can download from here.
Correlation Matrix in Python
We will Seaborn module to plot the correlation matrix. Python has an inbuilt corr() method to calculate the correlation of a dataset
Step1: Import the required modules
import numpy as np # pandas used to read CSV files import pandas as pd import matplotlib.pyplot as plt import seaborn as sns sns.set() %matplotlib inline
Step2: Import the data
- Use the read_csv() method to read the CSV file.
- Use the head() method to print the first n rows of the dataset.
train_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Dataset/mobile_price.csv') train_data.head()
Output
Step3: Select the columns
The dataset contains many columns, but we are going to select only a few columns.
Note: You can also try on all the columns of the dataset.
columns_show = ['battery_power', 'dual_sim', 'four_g', 'touch_screen', 'price_range', 'ram']
Step4: Generate a correlation matrix
We directly use corr() method to calculate the correlation of the dataset
# train_data[columns_show] used to select the columns of the train_data that are only in coloumns_show corr_matrix = train_data[columns_show].corr() corr_matrix
Step5: Plot the Correlation matrix
The heatmap is used to plot the correlation matrix. annot = True helps to show correlation value in the plot.
sns.heatmap(corr_matrix, annot= True) plt.show()
Output
Also, refer
Leave a Reply