How to compute the covariance of a given data frame using DataFrame.cov() in Pandas

In this tutorial, we will learn how to compute the covariance of a given data frame using the cov() method. It computes the pairwise covariance of the columns and returns the result as a covariance matrix, which is commonly used in data analysis. If the data frame contains NaN values, they are excluded from the calculation, so each covariance is computed only from the rows in which both columns have a value. So, let’s begin the tutorial.

Parameters of DataFrame.cov()

The examples in this tutorial use the following parameter:

pandas.DataFrame.cov(min_periods=None)

min_periods: An optional integer. It is the minimum number of observations required per pair of columns to produce a valid result. A pair of columns with fewer observations gets NaN in the matrix.

If no argument is passed, the covariance matrix is computed from all the observations available for each pair of columns.
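To make the effect of min_periods concrete, here is a minimal sketch. The column names "a" and "b" and their values are hypothetical and chosen only for illustration. Column "b" has just two non-missing values, so any pair that involves "b" cannot reach a threshold of three observations.

```python
import pandas as pd

# Hypothetical data: column "b" has only two non-missing values
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, None, None, 8.0]})

# Without min_periods, every pair uses whatever observations it has
default_cov = df.cov()

# With min_periods=3, a pair with fewer than 3 shared observations becomes NaN
strict_cov = df.cov(min_periods=3)

print(default_cov)
print(strict_cov)
```

In the second matrix, only the entry for ("a", "a") should keep a numeric value, since that is the only pair with at least three observations.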

Example 1

Let us consider a data frame consisting of the following two columns.

import pandas as pd

# Two numeric columns, 'f' and 's', with five observations each
data = {'f': [30, 190, 583, 200, 1], 's': [9, 35, 678, 265, 909]}
df = pd.DataFrame(data)
print(df)

OUTPUT:

     f    s
0   30    9
1  190   35
2  583  678
3  200  265
4    1  909

Using cov() without any parameters

We will now use the cov() method on the above data frame.

import pandas as pd

data = {'f': [30, 190, 583, 200, 1], 's': [9, 35, 678, 265, 909]}
df = pd.DataFrame(data)
# No arguments: covariance of every pair of columns
print(df.cov())

OUTPUT:

          f          s
f  53821.70   18846.55
s  18846.55  159633.20

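This is the covariance matrix. The diagonal entries are the variances of f and s, and the off-diagonal entries are the covariance between f and s.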
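As a sanity check, the off-diagonal entry can be reproduced by hand. The sketch below assumes the default sample covariance, which divides the sum of the products of deviations by n - 1. The variable names dev_f, dev_s and manual_cov are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'f': [30, 190, 583, 200, 1], 's': [9, 35, 678, 265, 909]})

# Deviation of each value from its column mean
dev_f = df['f'] - df['f'].mean()
dev_s = df['s'] - df['s'].mean()

# Sample covariance: sum of products of deviations divided by (n - 1)
manual_cov = (dev_f * dev_s).sum() / (len(df) - 1)

# Same entry taken from the matrix returned by cov()
pandas_cov = df.cov().loc['f', 's']

print(round(manual_cov, 2))
print(round(pandas_cov, 2))
```

Both printed values should agree with the 18846.55 shown in the matrix for Example 1.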

Example 2

Let us now consider a data frame whose two columns contain missing values.

import pandas as pd

# None entries are stored as NaN, so both columns become floats
data = {'f': [30, None, 583, None, 1], 's': [9, None, 678, 265, 909]}
df = pd.DataFrame(data)
print(df)

OUTPUT:

       f      s
0   30.0    9.0
1    NaN    NaN
2  583.0  678.0
3    NaN  265.0
4    1.0  909.0

Using cov() with min_periods parameter

We will now use the cov() method on the above data frame.

import pandas as pd

data = {'f': [30, None, 583, None, 1], 's': [9, None, 678, 265, 909]}
df = pd.DataFrame(data)
# Require at least 3 observations for each pair of columns
print(df.cov(min_periods=3))

OUTPUT:

               f          s
f  107562.333333   34902.50
s   34902.500000  163480.25

Here, the final matrix contains no NaN values, because every pair of columns has at least three rows in which both values are present, which satisfies min_periods=3.
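One way to see why a threshold of 3 is satisfied is to count, for each pair of columns, the rows in which both values are present. The sketch below is an illustration and not part of cov() itself. The names present, pair_counts and strict_cov are introduced here for the example.

```python
import pandas as pd

df = pd.DataFrame({'f': [30, None, 583, None, 1], 's': [9, None, 678, 265, 909]})

# 1 where a value is present, 0 where it is NaN
present = df.notna().astype(int)

# Entry (i, j) counts the rows where columns i and j are both present
pair_counts = present.T.dot(present)
print(pair_counts)

# A threshold above a pair's count turns that entry into NaN
strict_cov = df.cov(min_periods=4)
print(strict_cov)
```

The counts should be 3 for every pair that involves f and 4 for the pair (s, s), so with min_periods=4 only the (s, s) entry is expected to remain numeric.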

So, we have seen how to compute the covariance matrix of a data frame, both with and without the min_periods parameter.
