Machine Learning | Bias Variance Tradeoff

In this article, we are going to learn about the Bias Variance Tradeoff in machine learning. Mastering this concept will greatly help you understand your machine learning model’s performance.

Bias and Variance

Bias – In simple terms, bias is the difference between your model’s average prediction and the true values. One of two situations can arise when you construct your machine learning model.

  1. High bias- High bias means your model has not understood your training set well enough to make useful predictions. This happens because the model is too simple to capture the relevant relations between your input and output data (under-fitting). As a result, your model’s predictions will be way off from the true values.
  2. Low bias- Low bias means your model follows the training data very closely. It captures the relationship between the input and output data so well that you get very high accuracy on your predictions. However, this accuracy may be limited (biased) to the particular training data you selected.
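These two regimes can be illustrated with a small, self-contained sketch. All data and model choices below are hypothetical: a degree-1 polynomial stands in for a rigid (high-bias) model and a degree-9 polynomial for a flexible (low-bias) one, fit with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic ground truth with a little noise.
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(0, 0.5, size=x.size)

def train_error(degree):
    """Mean squared error of a polynomial fit on its own training data."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return float(np.mean((y - pred) ** 2))

# A straight line (degree 1) cannot capture the curvature: high bias.
# A degree-9 polynomial also bends toward the noise: low bias on this set.
err_line = train_error(1)
err_flex = train_error(9)
print(err_line > err_flex)  # the rigid model's training error is larger
```

The flexible model’s low training error is exactly the “accuracy limited to the training data” mentioned above; it says nothing yet about how it behaves on new data.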

Variance – Variance estimates by how much your predictions would change if you trained on a different data set. It refers to the change in accuracy when you shift from one data set to another. In the dartboard analogy, variance describes how scattered your shots are around the target. The two cases of variance are explained below.

  1. High variance- High variance means your predictions for different datasets are not in harmony. You will observe that your predictions are stretched apart for different input data, as your aim at the target is not consistent.
  2. Low variance- Low variance means your predictions stay consistent even if you change your input data. Your aim is tightly grouped in one region. However, a consistent aim does not necessarily mean an accurate one.
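A quick way to see variance empirically is to refit the same model class on many fresh samples and watch how much a single prediction moves. The sketch below is a hypothetical NumPy-only experiment comparing a rigid and a flexible polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)

def prediction_variance(degree, n_trials=200):
    """Fit the same model class to many fresh data sets and return the
    variance of its prediction at x = 0 across those fits."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(-3, 3, 40)
        y = np.sin(x) + rng.normal(0, 0.3, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, 0.0))
    return float(np.var(preds))

var_rigid = prediction_variance(1)  # straight line: predictions agree
var_flex = prediction_variance(9)   # flexible curve: predictions scatter
print(var_flex > var_rigid)         # the flexible model moves around more
```

The rigid model’s predictions barely move between data sets (low variance), while the flexible model’s predictions scatter because each fit chases that sample’s noise (high variance).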

Bias Variance Trade-off

Low bias and low variance are both desirable features, but they work against each other. One can achieve low bias by making the model flexible enough to fit the training data closely, but this causes variance to rise: results won’t be consistent for inputs other than the training data. If we simplify the model to rein in variance, the bias will start rising in turn. We call this conflict the Bias Variance Tradeoff. We can understand these concepts better with the pictorial representation below.

[Figure: bias variance trade off]

From the above diagram, it is clear that we should strive towards attaining both low bias and low variance. This is an idealized goal, though, because a model cannot be maximally flexible and maximally simple at the same time. To build a good model we should tackle the Bias Variance Tradeoff, which means finding the equilibrium between bias and variance. A favorable balance of bias and variance avoids both over-fitting and under-fitting of the data.
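To make that equilibrium concrete, the hypothetical NumPy-only sketch below sweeps model flexibility and measures average error on held-out data. A model whose flexibility matches the (quadratic) ground truth beats both an under-fit and an over-fit alternative:

```python
import numpy as np

rng = np.random.default_rng(2)

def avg_test_error(degree, n_trials=100):
    """Average held-out MSE of a polynomial model of the given degree,
    fit repeatedly to fresh noisy samples of a quadratic ground truth."""
    errs = []
    for _ in range(n_trials):
        x_tr = rng.uniform(-2, 2, 30)
        y_tr = x_tr**2 + rng.normal(0, 0.4, size=30)
        x_te = rng.uniform(-2, 2, 30)
        y_te = x_te**2 + rng.normal(0, 0.4, size=30)
        coeffs = np.polyfit(x_tr, y_tr, degree)
        errs.append(np.mean((y_te - np.polyval(coeffs, x_te)) ** 2))
    return float(np.mean(errs))

too_simple = avg_test_error(0)     # under-fit: high bias dominates
balanced = avg_test_error(2)       # matches the true quadratic
too_flexible = avg_test_error(12)  # over-fit: high variance dominates
print(balanced < too_simple and balanced < too_flexible)
```

Held-out error falls and then rises as flexibility grows, which is the U-shaped total-error curve the trade-off diagram depicts.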

To understand the mathematical formulation behind bias and variance, you can refer here.
