Find variance of a list in Python
This article explains how to find the variance of the numbers in a list. We will look at 3 methods to find the variance of a list in Python, and you can use whichever of the three you prefer. Each method is simple and straightforward.
Let’s consider a common, simple list for all the 3 examples.
arr = [4,5,6,7]
It is important to know the formula of variance when implementing it in a program. Variance refers to the average of squared differences from the mean.
$\text{variance} = \frac{1}{N} \sum_{i=1}^{N} (X_i - X_m)^2$ ; where,
$X_i$ = ith observation ;
$X_m$ = mean of all observations ;
$N$ = total number of observations
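As a quick check of the formula, here is the calculation worked by hand for the list arr = [4, 5, 6, 7] used in this article, so N = 4:

```latex
\begin{align*}
X_m &= \frac{4 + 5 + 6 + 7}{4} = 5.5 \\
\sum_{i=1}^{4} (X_i - X_m)^2 &= (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 \\
  &= 2.25 + 0.25 + 0.25 + 2.25 = 5 \\
\text{variance} &= \frac{5}{4} = 1.25
\end{align*}
```

Each of the three methods should therefore return 1.25 for this list.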
Let’s calculate the variance of our list arr in Python.
Method 1: Mean -> List Comprehension -> Variance
This method can be broken down into simple steps:
- Find the mean of all the elements in the list
- Using a list comprehension, find the squared difference of each element from the mean
- Calculate the variance as the sum of all the squared differences divided by the number of elements
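def variance_1(arr):
    mean = sum(arr) / len(arr)  # step 1: mean of the elements
    temp = [(i - mean) ** 2 for i in arr]  # step 2: squared differences from the mean
    variance = sum(temp) / len(arr)  # step 3: average of the squared differences
    return variance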
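One thing to keep in mind is that variance_1 divides by len(arr), so an empty list raises a ZeroDivisionError. Below is a minimal sketch of a guarded variant. The name variance_1_safe and the choice to raise ValueError are assumptions made for illustration, not part of the original method.

```python
def variance_1_safe(arr):
    """Population variance of arr (hypothetical guarded variant of variance_1)."""
    n = len(arr)
    if n == 0:
        # Assumption: treat empty input as an error rather than returning a value
        raise ValueError("variance requires at least one data point")
    mean = sum(arr) / n
    # Same three steps as variance_1, with the squared differences summed directly
    return sum((x - mean) ** 2 for x in arr) / n


print(variance_1_safe([4, 5, 6, 7]))  # 1.25
```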
Method 2: Using the statistics module of Python
The function statistics.pvariance(array) takes the data “array” as a parameter and returns its population variance, that is, the sum of squared differences from the mean divided by N.
import statistics

def variance_2(arr):
    return statistics.pvariance(arr)
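The statistics module also provides statistics.variance, which computes the sample variance (dividing by N - 1 instead of N). The short sketch below compares the two on the same list. The variable names pop_var and sample_var are chosen only for this illustration.

```python
import statistics

arr = [4, 5, 6, 7]

# Population variance: sum of squared differences divided by N
pop_var = statistics.pvariance(arr)

# Sample variance: the same sum divided by N - 1
sample_var = statistics.variance(arr)

print(pop_var)     # 1.25
print(sample_var)  # 5/3, larger than the population variance
```

Use pvariance when the list holds the entire population, and variance when it is a sample drawn from a larger population.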
Method 3: Using the NumPy library
The NumPy library can be used to calculate the variance of 1-D as well as higher-dimensional arrays (2-D, 3-D, etc.). The function numpy.var(array) takes the data “array” as a parameter and returns its variance.
import numpy as np

def variance_3(arr):
    return np.var(arr)
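By default, np.var divides by N (population variance). Its ddof parameter ("delta degrees of freedom") changes the divisor to N - ddof, so ddof=1 gives the sample variance. Here is a small sketch, assuming NumPy is installed. The variable names are illustrative only.

```python
import numpy as np

arr = [4, 5, 6, 7]

# ddof=0 is the default: divide the sum of squared differences by N
population_var = np.var(arr)

# ddof=1: divide by N - 1 to get the sample variance
sample_var = np.var(arr, ddof=1)

print(population_var)  # 1.25
print(sample_var)      # 5/3
```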
Now that we have defined 3 functions to calculate variance, let’s see their results for our list arr.
arr = [4, 5, 6, 7]
print("original array:", arr)
print("Variance of the data using method 1:", variance_1(arr))
print("Variance of the data using method 2:", variance_2(arr))
print("Variance of the data using method 3:", variance_3(arr))
Output:
original array: [4, 5, 6, 7]
Variance of the data using method 1: 1.25
Variance of the data using method 2: 1.25
Variance of the data using method 3: 1.25
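Extra Tip: When working with arrays of more than one dimension, use the NumPy library and set the “axis” parameter to the axis along which you need the variance. By default (axis=None), numpy.var computes the variance of the flattened array. For a 2-D array, axis=0 computes the variance down each column and axis=1 computes it across each row.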
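To illustrate the axis parameter, here is a short sketch on a small 2-D array. The array data is made up for this example, and NumPy is assumed to be installed.

```python
import numpy as np

# Hypothetical 2-D data: 2 rows, 4 columns
data = np.array([[4, 5, 6, 7],
                 [1, 3, 5, 7]])

overall = np.var(data)             # axis=None: variance of all 8 values together
per_column = np.var(data, axis=0)  # one variance per column: 2.25, 1.0, 0.25, 0.0
per_row = np.var(data, axis=1)     # one variance per row: 1.25 and 5.0

print(overall)
print(per_column)
print(per_row)
```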
Also, go ahead and modify the code above to use it for your own data. I hope you learned something new. Let me know in the comments if you have any doubts. Cheers!