Calculate Residuals in Regression Analysis in Python

In this tutorial, we will learn how to calculate residuals in Python, an important step in any regression analysis.

Residuals

A residual is the difference between the actual (observed) value and the value predicted by the model. Residuals play a crucial role in assessing the validity of a regression model and in judging how well the best-fit line captures the relationship between the independent and dependent variables.
Residuals help in analyzing:

  • Model fit: Residuals that are small relative to the scale of the target indicate a good fit, whereas large residuals indicate that the model is not capturing the relationship between the variables.
  • Outliers: An observation whose residual is much larger in magnitude than the other residuals is a candidate outlier in the dataset.
  • Model assumptions: In ordinary linear regression, residuals are assumed to be roughly normally distributed around zero and independent of each other. Correlation between residuals suggests that the model is missing some structure in the data. Many models also assume homoscedasticity (the variance of the residuals is constant).
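
The checks listed above can be sketched in a few lines of NumPy. This is only a rough illustration, not a complete diagnostic suite: the helper name summarize_residuals, the sample values, and the cutoff of 2 standard deviations are all assumptions made for this example. The Durbin-Watson statistic used here is a standard measure of correlation between consecutive residuals; values near 2 suggest little correlation.

```python
import numpy as np

def summarize_residuals(residuals, z_threshold=2.0):
    """Return simple diagnostics for a 1-D array of residuals.

    z_threshold is an arbitrary cutoff chosen for illustration.
    """
    residuals = np.asarray(residuals, dtype=float)

    # Residuals of a well-specified model should be centred near zero.
    mean = residuals.mean()
    std = residuals.std(ddof=1)

    # Standardize the residuals and flag the ones far from the rest.
    z_scores = (residuals - mean) / std
    outlier_indices = np.flatnonzero(np.abs(z_scores) > z_threshold)

    # Durbin-Watson statistic: sum of squared successive differences
    # divided by the sum of squared residuals (always between 0 and 4).
    durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

    return {
        "mean": mean,
        "std": std,
        "outlier_indices": outlier_indices.tolist(),
        "durbin_watson": durbin_watson,
    }

# Hypothetical residuals, made up for this example; one value is unusually large.
residuals = np.array([0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 4.0, -0.2, 0.1, -0.1])
summary = summarize_residuals(residuals)
print(summary)
```

In this made-up sample, only the residual at index 6 is flagged as a candidate outlier. For a check of homoscedasticity, a common next step is to plot the residuals against the predicted values and look for a changing spread.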

If you are a beginner in Regression analysis you might find these useful:
OLS Regression in Python
Weighted Least Squares Regression in Python

Python Code

To calculate the residuals, take the difference between y_train and y_pred. Following the usual naming convention, y_train contains the actual values and y_pred contains the values predicted by the regression model for the same training inputs.

residuals = y_train - y_pred
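
For a complete, runnable picture, here is a minimal sketch that fits a straight line and then computes the residuals. It assumes NumPy is available and uses np.polyfit for the fit; the toy data and the name X_train are invented for this example, and any regression library that produces predictions would work the same way.

```python
import numpy as np

# Hypothetical toy data, made up for illustration only.
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(X_train, y_train, deg=1)

# Predicted values from the fitted line for the training inputs.
y_pred = slope * X_train + intercept

# Residuals: actual value minus predicted value, one per observation.
residuals = y_train - y_pred
print(residuals)
```

Because the line is fitted by least squares with an intercept, these residuals sum to zero up to floating-point error, which makes a handy sanity check.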

 
