How to Plot Line of Best Fit in Python

Hello fellow Python coder! In this tutorial, we’ll learn how to plot a line of best fit in Python. If you are not sure what a line of best fit is, don’t worry: it is explained in the upcoming sections.

So without any delay, let’s get right to it!

Introduction to Line of Best Fit

To understand the concept of a line of best fit, let’s take the example of the night sky. There are a lot of stars spread across the vast sky above our heads. Think of these stars as data points in the space of our graph, each having a pair of x and y coordinates.

You can compare the line of best fit with a constellation. How? A constellation is formed by finding a pattern among nearby stars, and it helps you visualize the hidden patterns in how the stars are positioned.

The line of best fit does the same for the data points in a graph. It is constructed from the positions of the data points and helps reveal the overall trend hidden in them.

Implementing Line of Best Fit using Python

For this tutorial, let’s take the example of the number of hours a student has studied versus the exam score the student achieved. Before we get to the line of best fit, we will start by creating a dataset.

Data Creation

To create the data, we will use the numpy library. To draw random values within a given range, we will use the np.random.uniform function. For simplicity, we will also round the values with the round function and sort them with the sort function. Have a look at the code below:

import numpy as np

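# 100 random study times between 12 and 30 hours, rounded to 1 decimal and sorted
noHoursStudied = np.sort(np.round(np.random.uniform(12, 30, 100), 1))
# 100 random exam scores between 30 and 100, rounded to 2 decimals and sorted
examScoresGained = np.sort(np.round(np.random.uniform(30, 100, 100), 2))

print("No. of Hours : ", noHoursStudied)
print("Exams Scores : ", examScoresGained)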

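The code creates two arrays of 100 values each. Because both arrays are sorted independently, the smallest hour values are paired with the lowest scores, which gives the data its upward trend. The output will look similar to the data shown in the image below: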

Line of Best Fit - DATA
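Since the values are random, every run produces different numbers. If you want the same data each time, one option is to use a seeded random generator. The snippet below is a minimal sketch of that idea; the seed value 42 and the shorter names hours and scores are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

# Assumption: 42 is an arbitrary seed; any fixed integer gives repeatable data
rng = np.random.default_rng(42)

# Same recipe as in the tutorial, but drawn from the seeded generator
hours = np.sort(np.round(rng.uniform(12, 30, 100), 1))
scores = np.sort(np.round(rng.uniform(30, 100, 100), 2))

# Show the first few values of each array
print("No. of Hours : ", hours[:5])
print("Exams Scores : ", scores[:5])
```

Running this twice prints the same values both times, which makes it easier to compare your results while following along.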

Data Visualization

Now that we have the data, let’s draw it as a scatter plot with the matplotlib library using the code snippet below. If you are unfamiliar with scatter plots, you can go through the tutorial mentioned below as well.

Also Read: Matplotlib scatter plot in Python

import matplotlib.pyplot as plt

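# Matplotlib 3.6+ names this style 'seaborn-v0_8'; older versions call it 'seaborn'
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')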
plt.figure(figsize=(5,5))

plt.scatter(noHoursStudied, examScoresGained, color='green')
plt.xlabel('No. of Hours Studied by the Student')
plt.ylabel('Scores gained in Exam by the Student')
plt.title('Scatter Plot of No. of Hours Studied vs Scores gained in Exam by the Student')

plt.show()

We style the plot with the seaborn theme to make it look pretty and set the figsize to 5 by 5 inches. You can change both to your liking. The resulting plot looks similar to the one shown below:

Line of Best Fit - SCATTER PLOT

Your plot will look a little different because the data is generated randomly on every run.

Plotting Line of Best Fit

The next step is to get the line of best fit: the straight line that comes as close as possible to all the points taken together. It does not need to pass through any particular point. Instead, it minimizes the sum of the squared vertical distances between the points and the line, which lets it capture the overall trend in the data.

To achieve this, we will make use of the polyfit and polyval functions. Let’s look at them one by one. The polyfit function takes two variables (in our case noHoursStudied and examScoresGained) along with the degree of the polynomial, and gives us the coefficients of the line to be plotted.

When it comes to plotting a line the general equation is: y = a + bx where a and b are the intercept and slope respectively (that’s just basic linear algebra). So in this case a and b are known as coefficients. For our case, our x and y are noHoursStudied and examScoresGained respectively.

Plot Line of Best Fit in Python
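To see where a and b come from, the standard least-squares formulas for a straight line can be computed by hand and compared with what polyfit returns. The snippet below is a small sketch on made-up numbers; the arrays x and y are hypothetical and only serve to illustrate the formulas b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and a = mean(y) - b * mean(x).

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_mean, y_mean = x.mean(), y.mean()

# Slope: co-variation of x and y divided by the variation of x
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line always passes through the point of means
a = y_mean - b * x_mean

# np.polyfit with degree 1 solves the same least-squares problem
slope, intercept = np.polyfit(x, y, 1)

print("Manual  : a =", a, ", b =", b)
print("polyfit : a =", intercept, ", b =", slope)
```

Both lines of output should agree up to floating-point rounding, which is a handy sanity check that polyfit with degree 1 is doing an ordinary least-squares fit.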

polyfit Function

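So let’s make use of the polyfit function to get the coefficients of the straight line using the code below. The last argument, 1, is the degree of the polynomial, and a degree of 1 gives a straight line: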

coefficients = np.polyfit(noHoursStudied, examScoresGained, 1)
slope, intercept = coefficients

print("The resulting equation is : y = ",intercept," + ",slope,"x")

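The polyfit function returns the coefficients starting from the highest power, so for a straight line the coefficients variable holds the slope first and the intercept second. We unpack them in that order in the next line and then print the equation of the line. The result of the code execution is shown below (your numbers will differ because the data is random):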

The resulting equation is : y =  -25.874420483006524  +  4.239954967053389 x
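If the order of the coefficients is easy to mix up, it may help to try polyfit on a line whose slope and intercept you already know. This is a minimal sketch with made-up points that lie exactly on y = 1 + 2x; the arrays are assumptions for illustration only.

```python
import numpy as np

# Hypothetical points that lie exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Degree 1 returns [slope, intercept], highest power first
coefficients = np.polyfit(x, y, 1)
slope, intercept = coefficients

print(f"y = {intercept:.2f} + {slope:.2f}x")  # prints: y = 1.00 + 2.00x
```

The slope comes back as the first element and the intercept as the second, which matches the unpacking order used in the tutorial code.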

polyval Function

Let’s move on to the polyval function, which evaluates the fitted line at the given x values so that we can plot it later. It takes the coefficients and the number of hours studied, and returns the exam score that the line predicts for each value.

line_of_best_fit = np.polyval(coefficients, noHoursStudied)
print("Line of Best Fit is : ", line_of_best_fit)

The screen displays the resulting data points for the best fit line as shown below.

Line of Best Fit - DATA POINTS
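Under the hood, polyval for a straight line is just slope * x + intercept. The snippet below is a short sketch that checks this and also predicts a score for a single new value; the coefficients and the input values are hypothetical and chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical fit: slope 2 and intercept 1 (highest power first, as polyfit returns)
coefficients = np.array([2.0, 1.0])
x = np.array([0.0, 1.5, 3.0])

# Evaluate the line with polyval and by hand
fitted = np.polyval(coefficients, x)
manual = coefficients[0] * x + coefficients[1]
print("Same result :", np.allclose(fitted, manual))

# polyval also accepts a single number, e.g. to predict for a new x value
predicted = np.polyval(coefficients, 10.0)
print("Prediction for x = 10 :", predicted)
```

The same call works with the coefficients from the tutorial, so you could pass a new number of hours to estimate the score the fitted line predicts for it.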

Final Visualization of Line of Best Fit

Finally, we will plot all the data points along with the line of best fit in a single plot using the code snippet below:

plt.scatter(noHoursStudied, examScoresGained, color='green')
plt.plot(noHoursStudied, line_of_best_fit, color='blue')
plt.xlabel('No. of Hours Studied by the Student')
plt.ylabel('Scores gained in Exam by the Student')
plt.title('LINE OF BEST FIT for No. of Hours Studied vs Scores gained in Exam by the Student')
plt.show()

The screen displays the resulting plot as shown below.

Line of Best Fit - FINAL PLOT

I hope you learned something new through this tutorial.

Also Read:

How to Add an Average Line to Plot in Matplotlib

Happy Learning!
