NumPy Correlation in Python
Hi, guys! Today we are going to learn about correlation in Python using the NumPy library. So, what is correlation?
In simple terms, in statistics, correlation is a relationship between two random variables.
One important term when learning about correlation is the correlation coefficient. A correlation coefficient is a statistical measure of how strongly a change in one variable is associated with a change in another variable. Put simply, the coefficient of correlation expresses the strength and direction of the relationship between two variables; the common coefficients (Pearson, Spearman, Kendall) take values between -1 and 1.
Positive correlation: a relationship between two variables in which both variables increase together or decrease together.
Negative correlation: a relationship between two variables in which one variable changes inversely to the other, i.e., if variable1 increases, then variable2 decreases.
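To see the sign of the coefficient in practice, here is a small sketch with made-up data. The arrays x, y_up and y_down are hypothetical, chosen only for illustration: y_up tends to rise with x, y_down tends to fall, so np.corrcoef() should give a positive and a negative coefficient respectively.

```python
import numpy as np

# Hypothetical example data for illustration only.
x = np.arange(1, 11)
y_up = 2 * x + np.array([1, -1, 0, 2, -2, 1, 0, -1, 2, 0])     # rises with x
y_down = -3 * x + np.array([0, 2, -1, 1, 0, -2, 1, 0, -1, 2])  # falls as x rises

# np.corrcoef returns a 2x2 matrix; [0, 1] is the x-vs-y coefficient.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(f"positive correlation: {r_up:.3f}")
print(f"negative correlation: {r_down:.3f}")
```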
Several correlation coefficients are defined in statistics. Some of them are:
- Pearson’s correlation
- Spearman’s correlation
- Kendall’s correlation
The first one, Pearson’s coefficient, measures linear correlation, while the other two compare the ranks of the data. In NumPy, np.corrcoef() calculates Pearson’s coefficient; Spearman’s and Kendall’s coefficients are available in SciPy as scipy.stats.spearmanr() and scipy.stats.kendalltau(). Matplotlib can be used to visualize the results.
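NumPy itself does not ship Spearman or Kendall functions (SciPy offers scipy.stats.spearmanr() and scipy.stats.kendalltau()). As a NumPy-only sketch of what the rank-based coefficients compute, the hypothetical helpers below compute Spearman’s rho as Pearson’s r on the ranks, and Kendall’s tau-a as (concordant pairs minus discordant pairs) divided by the number of pairs. Both assume the data contain no tied values; with ties, SciPy’s versions use average ranks and tau-b instead.

```python
import numpy as np

def ranks(a):
    # Hypothetical helper: rank positions 1..n. Assumes no tied values.
    order = np.argsort(a)
    r = np.empty(len(a), dtype=float)
    r[order] = np.arange(1, len(a) + 1)
    return r

def spearman(x, y):
    # Spearman's rho = Pearson's r computed on the ranks of x and y.
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

def kendall_tau_a(x, y):
    # Kendall's tau-a: (concordant - discordant) / total number of pairs.
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    i, j = np.triu_indices(n, k=1)              # every pair (i, j) with i < j
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])  # +1 concordant, -1 discordant
    return s.sum() / (n * (n - 1) / 2)

x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])

print("Pearson: ", np.corrcoef(x, y)[0, 1])
print("Spearman:", spearman(x, y))
print("Kendall: ", kendall_tau_a(x, y))
```

With this data, the rank-based coefficients come out higher than Pearson’s r, because y increases almost monotonically with x even though the relationship is far from linear.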
NumPy Correlation Calculation in Python
NumPy provides np.corrcoef(), which returns a matrix of Pearson correlation coefficients. To use it, let’s first import the NumPy library and define two arrays.
```python
import numpy as np

x = np.arange(30, 40)  # integers 30..39
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])
```
We use np.arange() to create an array x of the integers from 30 (inclusive) to 40 (exclusive). The array y is created from a Python list with np.array().
Now that we have two arrays, let’s call np.corrcoef() and pass both of them as arguments.
```python
r = np.corrcoef(x, y)  # 2x2 correlation matrix
print(r)
print(r[0, 1])
print(r[1, 0])
```
```
[[1.         0.80323888]
 [0.80323888 1.        ]]
0.8032388831482586
0.8032388831482586
```
np.corrcoef() returns the correlation matrix, a two-dimensional array of correlation coefficients.
The values on the main diagonal of the matrix are equal to 1. The upper-left value is the correlation coefficient of x with itself, and the lower-right value is the correlation coefficient of y with itself. A variable is always perfectly correlated with itself, so these values are always 1.
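The same structure holds for more than two variables. In this sketch, z is a hypothetical third variable (x reversed) added only for illustration. np.corrcoef() treats each row of a 2D array as one variable by default (rowvar=True), so stacking x, y and z gives a 3×3 matrix with ones on the diagonal and a symmetric layout.

```python
import numpy as np

x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])
z = x[::-1]  # hypothetical third variable: x in reverse order

data = np.vstack([x, y, z])  # shape (3, 10): one row per variable
r = np.corrcoef(data)

print(r.shape)              # (3, 3)
print(np.diag(r))           # each variable correlated with itself
print(np.allclose(r, r.T))  # the matrix is symmetric
print(r[0, 2])              # x vs z: perfectly negative linear relationship
```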
The lower-left and upper-right values, r[1, 0] and r[0, 1], are equal because the matrix is symmetric, and they represent the Pearson correlation coefficient for x and y. In this case, it’s approximately 0.80.
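To check where 0.80 comes from, Pearson’s r can be computed directly from its definition: the sum of the products of the deviations from the mean, divided by the square root of the product of the sums of squared deviations. The helper name pearson is hypothetical; the sketch also shows that shifting x by a constant (for example, using 10..19 instead of 30..39) does not change the coefficient.

```python
import numpy as np

def pearson(x, y):
    # Hypothetical helper: Pearson's r from its definition.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])

print(pearson(x, y))       # should match r[0, 1] from np.corrcoef
print(pearson(x - 20, y))  # x shifted to 10..19: same coefficient
```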
In conclusion, the corrcoef() function of the NumPy library is used to calculate the Pearson correlation coefficient in Python.