Calculate Non Parametric Correlation in Python
Learn how to calculate Non Parametric Correlation in Python.
The Correlation coefficient is a measure of the quantitative relationship between two variables. It measures the strength and direction of the association between two variables.
When two variables follow a Gaussian distribution, the correlation coefficient is easy to calculate: the covariance between the two variables is normalized by the product of their standard deviations. The coefficient therefore ranges from -1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a perfect positive correlation, and 0 means no linear correlation. It is denoted by r and is also called Pearson’s correlation coefficient.
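As a quick illustration of the definition above, here is a minimal sketch that computes r by hand and compares it with SciPy's pearsonr(). The data, the variable names and the choice of a seeded NumPy generator are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative data (an assumption for this sketch): y is a noisy linear function of x
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Sample covariance divided by the product of the sample standard deviations
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# SciPy's Pearson correlation, for comparison
r_scipy, p_value = pearsonr(x, y)

print('Manual r: %.3f' % r_manual)
print('SciPy r:  %.3f' % r_scipy)
```

Both values should agree up to floating-point error, since they implement the same formula.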
But when we want to measure the relationship between two non-Gaussian variables, that is, a Non Parametric Correlation, it is not as straightforward. In this tutorial, we will therefore learn to calculate the correlation coefficient for non-Gaussian variables. This tutorial has two parts:
- Spearman’s correlation coefficient
- Kendall’s correlation coefficient
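To see why a rank-based measure can be useful, consider a relationship that is perfectly monotonic but far from linear. The sketch below uses made-up data (an exponential curve, chosen only for illustration) and compares Pearson's coefficient with Spearman's.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data (an assumption for this sketch): y always increases with x,
# but the relationship is exponential rather than linear
x = np.linspace(1, 10, 50)
y = np.exp(x)

r_pearson, _ = pearsonr(x, y)
r_spearman, _ = spearmanr(x, y)

print('Pearson:  %.3f' % r_pearson)
print('Spearman: %.3f' % r_spearman)
```

Spearman's coefficient comes out at 1 because the ordering of the two variables is identical, while Pearson's coefficient is noticeably lower because it only measures linear association.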
So, let’s begin…
Charles Spearman introduced a method called Spearman’s rank correlation. As the name suggests, it works on the ranks of the values rather than on the values themselves:
- It first calculates the ranks of the values in each variable.
- It then computes Pearson’s correlation on those ranks.
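The two steps above can be sketched directly with SciPy's rankdata() and pearsonr(), and the result checked against spearmanr(). The sample data and variable names here are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Illustrative data (an assumption for this sketch)
rng = np.random.default_rng(1)
a = rng.random(100) * 20
b = a + rng.random(100) * 10

# Step 1: replace each value by its rank within its own sample
rank_a = rankdata(a)
rank_b = rankdata(b)

# Step 2: Pearson's correlation computed on the ranks
rho_manual, _ = pearsonr(rank_a, rank_b)

# SciPy's Spearman correlation, for comparison
rho_scipy, _ = spearmanr(a, b)

print('Pearson on ranks: %.3f' % rho_manual)
print('spearmanr():      %.3f' % rho_scipy)
```

The two numbers should match up to floating-point error.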
We will use the spearmanr() function from the SciPy library in Python to calculate the correlation coefficient.
We simply pass the two samples as arguments to the function, which returns the correlation coefficient and a p-value that we can use to check the significance of the correlation.
# calculate the Spearman's correlation between two variables
import numpy as np
from numpy.random import seed
from scipy.stats import spearmanr

# seed the random number generator so the results are reproducible
seed(1)

# prepare data: record2 is record1 plus uniform noise
record1 = np.random.rand(500) * 20
record2 = record1 + (np.random.rand(500) * 10)

# calculate Spearman's correlation between the two samples
coeff, pvalue = spearmanr(record1, record2)
print('Spearmans correlation coefficient: %.3f' % coeff)

# interpret the significance
significance = 0.05
if pvalue > significance:
    print('Samples are uncorrelated with p=%.3f' % pvalue)
else:
    print('Samples are correlated with p=%.3f' % pvalue)
Output:
Spearmans correlation coefficient: 0.900
Samples are correlated with p=0.000
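Maurice Kendall proposed another simple measure, known as Kendall’s correlation coefficient or Kendall’s tau. It calculates a normalized score from the concordant and discordant pairs of rankings between the two samples, i.e. (c - d)/(c + d), where c is the number of concordant pairs and d is the number of discordant pairs. A pair of observations is concordant when both variables order it the same way, and discordant when they order it in opposite ways. For this reason it is also referred to as Kendall’s concordance test.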
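The (c - d)/(c + d) score can be sketched by counting pairs directly. The data and names below are assumptions for illustration, and the check against SciPy's kendalltau() relies on the samples having no tied values: kendalltau() computes the tau-b variant by default, which reduces to this formula only when there are no ties.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau

# Illustrative data (an assumption for this sketch); continuous values, so no ties
rng = np.random.default_rng(1)
a = rng.random(50) * 20
b = a + rng.random(50) * 10

concordant = 0
discordant = 0
for i, j in combinations(range(len(a)), 2):
    # Positive product: both variables order the pair the same way
    direction = (a[i] - a[j]) * (b[i] - b[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

tau_manual = (concordant - discordant) / (concordant + discordant)

# SciPy's Kendall correlation, for comparison
tau_scipy, _ = kendalltau(a, b)

print('Manual tau:   %.3f' % tau_manual)
print('kendalltau(): %.3f' % tau_scipy)
```

The pairwise loop is quadratic in the sample size, so it is only meant to show the idea; kendalltau() is the practical choice for real data.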
In Python, the kendalltau() function from SciPy calculates Kendall’s correlation. It returns the correlation value and the p-value for the significance test.
# calculate Kendall's correlation (reuses record1 and record2 from the Spearman example)
from scipy.stats import kendalltau

k_coef, k_p = kendalltau(record1, record2)
print('Kendall correlation coefficient: %.3f' % k_coef)

# interpret the significance
significance = 0.05
if k_p > significance:
    print('Samples are uncorrelated with p=%.3f' % k_p)
else:
    print('Samples are correlated with p=%.3f' % k_p)
Output:
Kendall correlation coefficient: 0.709
Samples are correlated with p=0.000
Hence we have now successfully learned to calculate Non Parametric Correlation in Python.