Calculate Non Parametric Correlation in Python

Learn how to calculate Non Parametric Correlation in Python.

A correlation coefficient is a measure of the relationship between two variables. It summarizes both the strength and the direction of their association.

When two variables follow a Gaussian distribution and are linearly related, the standard measure is Pearson's correlation coefficient, denoted by r. It is calculated by dividing the covariance of the two variables by the product of their standard deviations. This normalization bounds r between -1 and +1: a value of -1 indicates a perfect negative linear correlation, +1 indicates a perfect positive linear correlation, and 0 indicates no linear correlation.
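As a quick check of this definition, the sketch below computes r directly as the covariance divided by the product of the standard deviations and compares it with SciPy's pearsonr(). The five data points are made up purely for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Small illustrative dataset (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

# Sample covariance and sample standard deviations, all with ddof=1,
# so the (n - 1) factors cancel in the ratio
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# SciPy's implementation for comparison (also returns a p-value)
r_scipy, p_value = pearsonr(x, y)
print('manual r: %.4f, scipy r: %.4f' % (r_manual, r_scipy))
```

Both values agree, and because of the normalization the result always lies between -1 and +1.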

When the variables do not follow a Gaussian distribution, or the relationship between them is monotonic but not linear, Pearson's coefficient can be misleading. In that case we use non-parametric (rank-based) correlation measures instead. In this tutorial, we will learn to calculate two of them:

  1. Spearman’s correlation coefficient
  2. Kendall’s correlation coefficient
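To see why a rank-based measure can be preferable, the following sketch uses made-up data where y = exp(x): the relationship is perfectly monotonic but far from linear. Spearman's coefficient reports a perfect association, while Pearson's r falls well below 1.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# A perfectly monotonic but strongly non-linear relationship
x = np.arange(1, 21, dtype=float)
y = np.exp(x)

r_pearson, _ = pearsonr(x, y)
rho_spearman, _ = spearmanr(x, y)

# Spearman sees identical rankings and returns 1.0; Pearson, which
# measures linear association, is pulled well below 1 by the curvature
print('Pearson r:    %.3f' % r_pearson)
print('Spearman rho: %.3f' % rho_spearman)
```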


So, let’s begin…

Spearman's Correlation

Charles Spearman introduced a method called Spearman's rank correlation, usually denoted by the Greek letter rho (ρ). As the name suggests, it works on ranks:
– It first converts the values of each variable to ranks.
– It then calculates Pearson's correlation between the two sets of ranks.
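The two steps above can be sketched with scipy.stats.rankdata() and NumPy's corrcoef(). The data mimics the tutorial's setup on a smaller scale and uses NumPy's default_rng generator, an illustrative choice rather than part of the original example. For data without ties, the classic shortcut formula ρ = 1 - 6Σd²/(n(n² - 1)), where d is the difference between paired ranks, gives the same value.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Illustrative data: b depends on a plus uniform noise
rng = np.random.default_rng(0)
a = rng.random(50) * 20
b = a + rng.random(50) * 10

# Step 1: replace each value by its rank (ties would get the average rank)
rank_a = rankdata(a)
rank_b = rankdata(b)

# Step 2: Pearson's correlation of the ranks
rho_manual = np.corrcoef(rank_a, rank_b)[0, 1]

# Shortcut formula, valid only when there are no ties
n = len(a)
d = rank_a - rank_b
rho_formula = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

rho_scipy, _ = spearmanr(a, b)
print('ranks + Pearson: %.6f' % rho_manual)
print('shortcut:        %.6f' % rho_formula)
print('spearmanr():     %.6f' % rho_scipy)
```

All three values match; spearmanr() simply packages the rank-then-correlate procedure and adds a p-value.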

We will use the spearmanr() function from the SciPy library (scipy.stats) to calculate the coefficient. We pass the two samples as arguments, and the function returns the correlation coefficient together with a p-value. The p-value comes from a test whose null hypothesis (H0) is that the samples are uncorrelated; a p-value below the chosen significance level (here 0.05) lets us reject H0.

# calculate Spearman's correlation between two variables
import numpy as np
from numpy.random import seed
from scipy.stats import spearmanr
# seed the random number generator for reproducibility
seed(1)
# prepare data: record2 depends on record1 plus uniform noise
record1 = np.random.rand(500) * 20
record2 = record1 + (np.random.rand(500) * 10)
# calculate Spearman's correlation between the two samples
coeff, pvalue = spearmanr(record1, record2)
print("Spearman's correlation coefficient: %.3f" % coeff)
# interpret the significance at the 5% level
significance = 0.05
if pvalue > significance:
    print('Samples are uncorrelated (fail to reject H0) p=%.3f' % pvalue)
else:
    print('Samples are correlated (reject H0) p=%.3f' % pvalue)
Output:
Spearman's correlation coefficient: 0.900
Samples are correlated (reject H0) p=0.000

Kendall’s Correlation

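Maurice Kendall introduced another rank-based measure, Kendall's correlation coefficient, usually written as tau (τ). It compares every pair of observations: a pair is concordant if it is ordered the same way in both samples and discordant if it is ordered in opposite ways. With c concordant pairs and d discordant pairs, and no ties, the coefficient is the normalized score (c - d)/(c + d). Because it is based on counting concordant pairs, the test is also referred to as Kendall's concordance test.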
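The pair counting described above can be written directly. kendall_tau_a() below is a hypothetical helper for illustration; it assumes no ties, in which case (c - d)/(c + d) equals the tau-b variant that SciPy's kendalltau() computes by default. It loops over all O(n²) pairs, so it is meant for understanding rather than for large samples.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau


def kendall_tau_a(x, y):
    """Hypothetical helper: Kendall's tau as (c - d) / (c + d), assuming no ties."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether pair (i, j) is ordered
        # the same way in both samples (positive) or oppositely (negative)
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1  # concordant pair
        elif s < 0:
            d += 1  # discordant pair
    return (c - d) / (c + d)


# Tiny made-up example: 7 concordant and 3 discordant pairs
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 1, 2, 5, 4])
tau_manual = kendall_tau_a(x, y)
tau_scipy, _ = kendalltau(x, y)
print('manual: %.3f, scipy: %.3f' % (tau_manual, tau_scipy))  # manual: 0.400, scipy: 0.400
```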

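SciPy's kendalltau() function (in scipy.stats) calculates Kendall's correlation. Like spearmanr(), it returns the correlation coefficient and a p-value for a test whose null hypothesis is that there is no association between the samples.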

# continue with record1 and record2 from the Spearman example
from scipy.stats import kendalltau
k_coef, k_p = kendalltau(record1, record2)
print('Kendall correlation coefficient: %.3f' % k_coef)
# interpret the significance at the 5% level
significance = 0.05
if k_p > significance:
    print('Samples are uncorrelated (fail to reject H0) p=%.3f' % k_p)
else:
    print('Samples are correlated (reject H0) p=%.3f' % k_p)
Output:
Kendall correlation coefficient: 0.709
Samples are correlated (reject H0) p=0.000

We have now calculated two non-parametric correlation coefficients in Python, Spearman's rho and Kendall's tau, and tested their significance.
