Find variance of a list in Python

This article is going to help you understand how to find variance of numbers ordered in a list. We will look at 3 methods to find the variance of a list in Python. You can implement any of the three discussed methods you like. Each of the method is simple and straightforward.

Let’s consider a common, simple list for all the 3 examples.
arr = [4,5,6,7]

It is important to know the formula of variance when implementing it in a program. Variance refers to the average of squared differences from the mean.

variance = Σ (Xi – Xm)2 / N  ; where,
Xi = ith observation ;
Xm = mean of all observations ;
N = total number of observation

Let’s calculate variance for over list arr in Python.

Method 1: Mean -> List Comprehension -> Variance


This method can be enlisted in simple steps:

  1. Find mean of all the elements in the list
  2. Using List comprehension find the squared differences of each element with mean
  3. Calculate variance as the sum of all the squared differences divided by mean
def variance_1(arr):
  mean = sum(arr)/len(arr)  #step 1
  temp = [(i-mean)**2 for i in arr]  #step 2
  variance = sum(temp)/len(arr)  #step 3
  return variance

Method 2: Using statistics module of Python


The function statistics.pvariance(array) returns the variance of the inputted “array” as a parameter.

import statistics
def variance_2(arr):
  return statistics.pvariance(arr)

Method 3: Using NumPy library


The NumPy library can be used to calculate variance for 1-D as well as higher dimensional array (2-D, 3-D, etc.). It uses the function NumPy.var(array) and returns the variance of the inputted “array” as a parameter.

import numpy as np
def variance_3(arr):
  return np.var(arr)

Now that we have defined 3 functions to calculate variance, let’s see their results for our list arr.

arr = [4,5,6,7]
print("original array: ", arr)
print("Variance of the data using method 1: ", variance_1(arr))
print("Variance of the data using method 2: ", variance_3(arr))
print("Variance of the data using method 3: ", variance_3(arr))

Output:

original array: [4, 5, 6, 7]
Variance of the data using method 1: 1.25
Variance of the data using method 2: 1.25
Variance of the data using method 3: 1.25

Extra Tip: When using arrays in dimensions higher than 1D, use NumPy library and set parameter “axis=0(default)”. Change the axis parameter along which you need to calculate variance.

Also, go ahead and modify the code above to use it for your own data. I hope you learned something new. Let me know in the comments if you have any doubts. Cheers!

Further Reading:

Leave a Reply

Your email address will not be published. Required fields are marked *