# Correlation calculation between variables in Python

Hi guys, in this article we will look at the steps to calculate the correlation between variables in Python. In simple terms, a correlation is a statistical relationship between two random variables.

Refer to the following article for more details on correlation:  Correlation in Python

Below are some common correlation measures defined in statistics.

• Pearson’s correlation
• Spearman’s correlation
• Kendall’s correlation
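
To make the difference between these three measures concrete, here is a minimal sketch that computes each one by hand with NumPy and pandas. The helper names (`pearson_r`, `spearman_rho`, `kendall_tau`) and the toy data are made up for illustration, and the Kendall helper is the simple tau-a variant with no tie correction, so treat it as a sketch rather than a replacement for a library routine.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson: covariance of x and y scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def spearman_rho(x, y):
    """Spearman: Pearson correlation applied to the ranks of the data."""
    return pearson_r(pd.Series(x).rank(), pd.Series(y).rank())


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs, no tie correction."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    score = 0
    for i in range(n - 1):
        # +1 for each later point that moves in the same direction, -1 otherwise
        score += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return float(score / (n * (n - 1) / 2))


# Hypothetical toy data: y grows with x, but not in a straight line
x = np.arange(1, 11)
y = x ** 3

print('Pearson :', pearson_r(x, y))     # below 1, the relation is not linear
print('Spearman:', spearman_rho(x, y))  # 1, the ranks agree perfectly
print('Kendall :', kendall_tau(x, y))   # 1, every pair is concordant
```

Pearson measures how close the points are to a straight line, while Spearman and Kendall only care about the ordering, which is why they reach 1 for any strictly increasing relationship.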

## Calculating Correlation in Python

We can measure the correlation between two or more variables using the Pingouin module. The very first step is to install the package with the following command:

`pip install --upgrade pingouin`

Once you have installed the package, import it in your program:

`import pingouin as pi`

Now let’s take a random data set that contains the outcomes of personality tests of 200 individuals, along with their age, height, weight, and IQ. (If you want, I can give you the code to generate the random dataset.)

We calculate the correlation between the height and weight of the individuals using the `pingouin.corr` function:

`pi.corr(x=df['Height'], y=df['Weight'])`
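
The snippets in this article assume that a DataFrame named `df` already exists. The original data set is not reproduced here, so the following is only a stand-in sketch: it builds 200 random rows with the columns Age, IQ, Height, and Weight, with Weight made to depend loosely on Height. The seed, means, and spreads are arbitrary assumptions, so the numbers it yields will not match the output shown in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, NOT the data set used in the article
rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
n = 200

height = rng.normal(170, 10, n)  # centimetres
# Weight rises with height, plus independent noise
weight = 70 + 0.5 * (height - 170) + rng.normal(0, 9, n)

df = pd.DataFrame({
    'Age': rng.integers(18, 65, n),  # whole years, 18 to 64
    'IQ': rng.normal(100, 15, n),    # independent of the other columns
    'Height': height,
    'Weight': weight,
})

print('%i subjects and %i columns' % df.shape)
print(df.head())
```

Because Weight is constructed from Height, the two columns end up positively correlated, while Age and IQ are generated independently and should show correlations close to zero.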

Full code

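```
import pingouin as pi
import pandas as pd

# df is assumed to be a pandas DataFrame that already holds the data set,
# with the numeric columns Age, IQ, Height, and Weight.
print('%i subjects and %i columns' % df.shape)

# Pearson correlation between height and weight
print(pi.corr(x=df['Height'], y=df['Weight']))
```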

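The output of the above code will be:

`200 subjects and 4 columns`

| | n | r | CI95% | r2 | adj_r2 | p-val | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| pearson | 200 | 0.485 | [0.37, 0.58] | 0.235 | 0.227 | 3.595866e-13 | 2.179e+10 | 1.0 |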

Here r is the correlation coefficient.
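
Some of the other numbers in that output row can be related back to `r` and the sample size. As a rough check, assuming the row holds the sample size (200), `r`, a 95% confidence interval, `r` squared, and an adjusted `r` squared, the sketch below recomputes them using the Fisher z-transform for the interval. These formulas are common textbook choices and an assumption on my part about how the values were derived, but they do reproduce the figures shown.

```python
import math

# Values read from the output row (r is already rounded to 3 decimals)
r, n = 0.485, 200

# Share of variance explained, and a version adjusted for sample size
r2 = r ** 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)

# 95% confidence interval via the Fisher z-transform
z = math.atanh(r)            # map r onto an approximately normal scale
se = 1 / math.sqrt(n - 3)    # standard error on that scale
z_crit = 1.959964            # 97.5th percentile of the standard normal
ci_low = math.tanh(z - z_crit * se)
ci_high = math.tanh(z + z_crit * se)

print(round(r2, 3), round(adj_r2, 3))        # 0.235 0.227
print(round(ci_low, 2), round(ci_high, 2))   # 0.37 0.58
```
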
This method is a little confusing. There is an easier one (Pingouin itself is built on top of pandas). We simply create the DataFrame (`df`) and call `df.corr(method=...)`, where `method` accepts one of three values: `'pearson'`, `'kendall'`, or `'spearman'`. For instance, look below for the implementation.

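```
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# df is assumed to be the same DataFrame as above (Age, IQ, Height, Weight)

# Pearson correlation matrix, printed and drawn as a heatmap
pearson_correlation = df.corr(method='pearson')
print(pearson_correlation)
sb.heatmap(pearson_correlation,
           xticklabels=pearson_correlation.columns,
           yticklabels=pearson_correlation.columns,
           cmap="YlGnBu",
           annot=True,
           linewidths=0.5)

# Spearman and Kendall correlation matrices
spearman_correlation = df.corr(method='spearman')
print(spearman_correlation)
kendall_correlation = df.corr(method='kendall')
print(kendall_correlation)

# Display the heatmap window (needed when running as a script)
plt.show()
```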

Output (the Pearson, Spearman, and Kendall matrices, in that order):

```
             Age        IQ    Height    Weight
Age     1.000000 -0.091642 -0.037185  0.062123
IQ     -0.091642  1.000000 -0.027006 -0.008442
Height -0.037185 -0.027006  1.000000  0.484540
Weight  0.062123 -0.008442  0.484540  1.000000
             Age        IQ    Height    Weight
Age     1.000000 -0.061948 -0.018034  0.038593
IQ     -0.061948  1.000000 -0.029939  0.015395
Height -0.018034 -0.029939  1.000000  0.457071
Weight  0.038593  0.015395  0.457071  1.000000
             Age        IQ    Height    Weight
Age     1.000000 -0.041663 -0.009941  0.029109
IQ     -0.041663  1.000000 -0.017685  0.011402
Height -0.009941 -0.017685  1.000000  0.315211
Weight  0.029109  0.011402  0.315211  1.000000
```

Here I have used the seaborn and matplotlib modules to draw the correlation matrix as a heatmap, as the printed output gets a little messy to study directly. I have drawn the heatmap only for the Pearson correlation.

As you can see, the diagonal values are all 1, because every variable is perfectly correlated with itself. To find the correlation between two different variables, look up the cell where the row of one variable meets the column of the other. For example, the Pearson correlation between Height and Weight is 0.484540.
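
Instead of reading the matrix by eye, the same lookup can be done in code. The following is a small sketch on made-up numbers (the six rows below are hypothetical, not taken from the data set above): it reads one cell with `.loc` and then finds the strongest off-diagonal pair.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    'Height': [150, 160, 165, 172, 180, 185],
    'Weight': [52, 58, 63, 70, 79, 84],
    'IQ': [110, 95, 120, 100, 105, 98],
})

corr = df.corr(method='pearson')

# Row label first, column label second; the matrix is symmetric,
# so swapping the two labels gives the same value
height_weight = corr.loc['Height', 'Weight']
print('Height vs Weight:', round(height_weight, 3))

# Blank out the diagonal (always 1) before searching for the strongest pair
off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
strongest_pair = off_diagonal.stack().abs().idxmax()
print('Strongest pair:', strongest_pair)
```

Masking the diagonal first matters, because otherwise the search would always return a variable paired with itself.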