# Correlation calculation between variables in Python

Hi guys, in this article we will look at the steps to calculate the correlation between variables in Python. In simple terms, correlation is a statistical measure of how strongly two random variables are related to each other.

Refer to the following article for more details on correlation: Correlation in Python

Below are some common correlation measures defined in statistics.

- Pearson’s correlation
- Spearman’s correlation
- Kendall’s correlation
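To make the difference between the first two measures concrete, here is a minimal pure-Python sketch. The helper names (`pearson`, `ranks`, `spearman`) are made up for illustration, the ranking helper assumes there are no tied values, and Kendall's correlation is left out to keep it short.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance scaled by the spread of both variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks from 1 to n; assumes no tied values (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # always increasing, but not along a straight line

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because `y` always increases with `x`, the rank-based Spearman value is 1, while Pearson stays below 1 since the relationship is not linear.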

## Calculating Correlation in Python

We can measure the correlation between two or more variables using the Pingouin module. The very first step is to install the package with pip:

    pip install --upgrade pingouin

Once you have installed the package, import it in your program:

    import pingouin as pi

Now let’s take a random dataset that contains the outcome of personality tests of 200 individuals, along with their age, height, weight and IQ. (If you want, I can give you the code to generate the random dataset.)
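If you do not have such a file at hand, a stand-in could be generated roughly like this. This is only a sketch: the distributions, the seed and the helper names (`make_dataset`, `write_csv`) are assumptions made for illustration, and the personality test columns are left out because their names are not given here.

```python
import csv
import random

def make_dataset(n=200, seed=42):
    """Return n rows of made-up measurements (hypothetical distributions)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        height = rng.gauss(170, 10)  # centimetres
        # Weight depends on height plus noise, so the two columns correlate.
        weight = 0.9 * height - 85 + rng.gauss(0, 8)  # kilograms
        rows.append({
            "Age": rng.randint(18, 65),
            "Height": round(height, 1),
            "Weight": round(weight, 1),
            "IQ": round(rng.gauss(100, 15)),
        })
    return rows

def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

rows = make_dataset()
print(len(rows), "rows with columns", list(rows[0].keys()))
# write_csv(rows, "myDataset.csv")  # uncomment to create the file on disk
```

Fixing the seed keeps the generated numbers the same from run to run, which makes the correlation values reproducible.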

We calculate the correlation between the height and weight of the individuals using the `pingouin.corr` function:

`pi.corr(x=df['Height'], y=df['Weight'])`

Full code:

    import pandas
    import pingouin as pi

    # Load the dataset and report its size
    df = pandas.read_csv('myDataset.csv')
    print('%i people and %i columns' % df.shape)
    print(df.head())

    # Correlation between height and weight
    print(pi.corr(x=df['Height'], y=df['Weight']))

The output of the above code is a one-row table of statistics. In it, the `r` column is the correlation coefficient.

This method is a little confusing, and there is an easier one (Pingouin itself builds on pandas). We simply create the DataFrame (`df`) and call **df.corr(method=...)**, where `method` takes one of three values: 'pearson', 'kendall' or 'spearman'. For instance, look below for the implementation.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sb

    df = pd.read_csv('myDataset.csv')
    print(df.head())

    # Pearson correlation matrix, drawn as an annotated heatmap
    pearson_correlation = df.corr(method='pearson')
    print(pearson_correlation)
    sb.heatmap(pearson_correlation,
               xticklabels=pearson_correlation.columns,
               yticklabels=pearson_correlation.columns,
               cmap="YlGnBu", annot=True, linewidths=0.5)
    plt.show()

    # Rank-based alternatives
    spearman_correlation = df.corr(method='spearman')
    print(spearman_correlation)
    kendall_correlation = df.corr(method='kendall')
    print(kendall_correlation)
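To try the `method` argument without a CSV file, here is a small sketch on a hand-made DataFrame. The numbers are invented for illustration and are not taken from the dataset used in this article.

```python
import pandas as pd

# Invented values: weight rises with height, but not along a straight line.
df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [52, 58, 69, 80, 95],
    "IQ": [110, 95, 120, 100, 105],
})

pearson_correlation = df.corr(method="pearson")
spearman_correlation = df.corr(method="spearman")

print(pearson_correlation.round(2))
print(spearman_correlation.round(2))
```

Height and weight increase together in every row here, so their Spearman value is 1, while the Pearson value is slightly lower because the increase is not perfectly linear.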

In the output, the diagonal values are 1 because every variable is perfectly correlated with itself. To find the correlation between two different variables, look up the cell where the row of one variable meets the column of the other.
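The same lookup can be done in code with `.loc`, which selects a cell by row name and column name. The sketch below uses a few invented values rather than the article's dataset.

```python
import math

import pandas as pd

# Invented values for illustration only.
df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [52, 58, 69, 80, 95],
    "IQ": [110, 95, 120, 100, 105],
})
corr = df.corr(method="pearson")

# Diagonal cell: a variable compared with itself.
height_vs_itself = corr.loc["Height", "Height"]

# Off-diagonal cell: row name of one variable, column name of the other.
height_vs_weight = corr.loc["Height", "Weight"]
weight_vs_height = corr.loc["Weight", "Height"]

print(round(height_vs_itself, 2))
print(round(height_vs_weight, 2))
# The matrix is symmetric, so the order of the two names does not matter.
print(math.isclose(height_vs_weight, weight_vs_height))
```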
