NumPy Correlation in Python

Hi guys, today we are going to learn about correlation in Python using the NumPy library. So, what is a correlation?
In simple terms, a correlation is a statistical relationship between two random variables.

One important term to know when learning about correlation is the correlation coefficient. A correlation coefficient is a statistical measure of how strongly a change in one variable is associated with a change in another. In other words, the coefficient of correlation describes the strength and direction of the relationship between two variables, and it takes values between -1 and 1.

Positive correlation: The relationship between two variables in which both variables move in the same direction, i.e. they either increase together or decrease together.

Negative correlation: The relationship between two variables in which the variables move in opposite directions, i.e. if variable1 increases then variable2 decreases.
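
To make the two cases concrete, here is a minimal sketch with made-up data. The arrays x, y_up and y_down are hypothetical and chosen only for illustration; the coefficient is computed with NumPy's np.corrcoef() function.

```python
import numpy as np

# Hypothetical data, chosen only to illustrate the two cases.
x = np.array([1, 2, 3, 4, 5])
y_up = 2 * x + 1      # increases whenever x increases
y_down = 10 - 3 * x   # decreases whenever x increases

# np.corrcoef() returns a 2x2 matrix; element [0, 1] is the coefficient for the pair.
r_pos = np.corrcoef(x, y_up)[0, 1]
r_neg = np.corrcoef(x, y_down)[0, 1]

print(r_pos)  # close to +1: perfect positive correlation
print(r_neg)  # close to -1: perfect negative correlation
```

Real data is rarely this clean, so the coefficient usually lands somewhere between these two extremes.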

Several correlation coefficients are defined in statistics. Some of them are:

  • Pearson’s correlation
  • Spearman’s correlation
  • Kendall’s correlation

The first one measures linear correlation, while the other two compare the ranks of the data. NumPy's np.corrcoef() function calculates Pearson's coefficient; Spearman's and Kendall's coefficients are available in SciPy as scipy.stats.spearmanr() and scipy.stats.kendalltau(). Matplotlib can be used to display the results.
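
Since Spearman's coefficient is simply Pearson's coefficient applied to the ranks of the data, it can be sketched with NumPy alone. The helper name spearman_no_ties is hypothetical, the arrays are sample data, and the sketch assumes the data contains no repeated values (ties need averaged ranks, which this does not handle).

```python
import numpy as np

def spearman_no_ties(a, b):
    """Spearman's coefficient for 1-D data without repeated values (hypothetical helper)."""
    # Applying argsort twice turns values into 0-based ranks (valid only without ties).
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    # Pearson's coefficient of the ranks is Spearman's coefficient.
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Sample data, chosen for illustration.
x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])

rho = spearman_no_ties(x, y)
print(rho)
```

Because only the ordering of the values matters here, a few very large values in y affect this result less than they affect Pearson's coefficient.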

NumPy Correlation Calculation in Python

NumPy has np.corrcoef(), which returns a matrix of Pearson correlation coefficients. To try it out, let’s first import the NumPy library and define two arrays.

import numpy as np

x = np.arange(30, 40)                                # integers 30, 31, ..., 39
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])   # ten sample values

We use np.arange() to create an array x of integers between 30 (inclusive) and 40 (exclusive). The array y is created from a Python list with the np.array() function.

Now let’s call the np.corrcoef() function, passing the two arrays we have created as its arguments.

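r = np.corrcoef(x, y)   # 2x2 correlation matrix
print(r)
print(r[0, 1])          # coefficient for x and y
print(r[1, 0])          # same value, since the matrix is symmetric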

Output:

[[1.         0.80323888]
 [0.80323888 1.        ]]
0.8032388831482586
0.8032388831482586

Explanation

The corrcoef() function returns the correlation matrix, a two-dimensional array of correlation coefficients.

The values on the main diagonal of the matrix are equal to 1. The upper left value is the correlation coefficient for x and x. Similarly, the lower right value is the correlation coefficient for y and y. They are always equal to 1.
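
The same structure holds with more than two variables. As a rough sketch, the example below stacks three arrays into one 2-D input, where the third array z is hypothetical data added only for illustration, and checks the diagonal and the symmetry of the result.

```python
import numpy as np

x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])
z = np.array([9, 8, 8, 6, 7, 5, 3, 4, 1, 2])  # hypothetical third variable

# Each row of the 2-D input is treated as one variable (rowvar=True is the default).
m = np.corrcoef(np.vstack([x, y, z]))

print(m.shape)                      # (3, 3)
print(np.allclose(np.diag(m), 1))   # every variable correlates perfectly with itself
print(np.allclose(m, m.T))          # the matrix is symmetric
```

Element m[i, j] is the coefficient for the variables in rows i and j of the input.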

The lower left and upper right values of the correlation matrix are equal and represent the Pearson correlation coefficient for x and y. In this case, it’s approximately 0.80.
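
As a sanity check, the same value can be reproduced from the textbook definition of Pearson's coefficient. This is a sketch of the formula only, not a description of how NumPy computes it internally; the names dx, dy and r_manual are introduced here for illustration.

```python
import numpy as np

x = np.arange(30, 40)
y = np.array([5, 3, 7, 6, 10, 14, 19, 35, 94, 58])

# Deviations of each value from its mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r: the sum of the products of the deviations, divided by the
# square root of the product of the summed squared deviations.
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(r_manual)
print(np.isclose(r_manual, np.corrcoef(x, y)[0, 1]))  # True
```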

In conclusion, the corrcoef() function of the NumPy library is used to calculate the Pearson correlation coefficient in Python.

One response to “NumPy Correlation in Python”

  1. Daniel Bachrach says:

    Within a given .txt file there are thousands of words. I would like to be able to calculate the correlation between sets of words. For example, I would like to be able to calculate how frequently the set of words (x1, x2, x3, and x4) correlates with the set of words (y1, y2, y3, and y4.) I would like to be able to define the sets iteratively, so I can evaluate the correlation between different sets of words. I would also like to limit constraints on the number of words in the sets, so that they can be different lengths. Do you happen to have syntax you could send that I could work with? Thank you!
