Univariate Linear Regression in Python
Hi! Today, we’ll be learning Univariate Linear Regression with Python. This is one of the simplest machine learning algorithms. Univariate Linear Regression is a statistical model with a single dependent variable and a single independent variable.
Linear Regression is used in many prediction tasks. For example, we can predict the yield of a crop, which depends on the amount of rainfall, climatic conditions, etc.; the price of a house based on its total area, number of bedrooms, number of bathrooms, number of floors, etc.; or the resale value of a car based on the number of kilometers driven, its age since the purchase date, the number of previous owners, and the manufacturer.
All of the above are examples of Multivariate Linear Regression, because the dependent variable depends on more than one independent variable.
Univariate Linear Regression
Let’s consider a dataset containing the areas of houses and their corresponding prices. For every feature value x (the area of a house) we have a response value y (the price of that house). The dataset can be found at: data.csv
Suppose we have m such training examples. For the training set we define the hypothesis:
y = hθ(x) = θ0 + θ1 * x
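The hypothesis above can be sketched in a few lines of NumPy. The function name hypothesis and the sample parameter values and areas below are made up for illustration; they do not come from data.csv.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Return h_theta(x) = theta0 + theta1 * x for a scalar or an array x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# Hypothetical parameters and house areas, chosen only to show the formula.
areas = np.array([1000.0, 1500.0, 2000.0])
predictions = hypothesis(50.0, 0.5, areas)
print(predictions)
```

Because NumPy applies the arithmetic element-wise, the same function returns one prediction per house when it is given an array of areas.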
The line that best fits the data points is known as the line of regression or regression line. The values predicted by the model may differ from the actual values; this error is measured by the cost function, which here is half the mean squared error. The cost function is given by:
J(θ0, θ1) = 1 / (2m) * ∑(hθ(x) - y)²
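As a rough sketch, the cost function can be computed like this. The helper name cost and the tiny dataset are assumptions for illustration only; the sum runs over all m training examples.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Half the mean squared error between predictions and targets."""
    m = len(y)
    errors = (theta0 + theta1 * x) - y   # h_theta(x) - y for every example
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical data lying exactly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

perfect_fit = cost(0.0, 2.0, x, y)   # parameters that match the data
poor_fit = cost(0.0, 0.0, x, y)      # predicts 0 for every example
print(perfect_fit, poor_fit)
```

A line that passes through every point gives a cost of zero, and the cost grows as the predictions move away from the actual values.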
After defining the cost function, we need to find the values of θ0 and θ1 that minimize it. This is done by repeatedly updating the parameters using the partial derivatives of the cost function, a method known as gradient descent. Gradient descent requires a learning rate (α), which controls the size of each update step. It is generally given a small value, because a learning rate that is too large can make the updates overshoot the minimum and diverge. The gradient descent update can be written as:
θ0 = θ0 - α / m * ∑(hθ(x) - y)
θ1 = θ1 - α / m * ∑((hθ(x) - y) * x)
The update is repeated until the cost function stops decreasing. The values of θ0 and θ1 at that minimum give the best-fit line, which the model then uses to predict y for new values of x.
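To see the update rule and the role of the learning rate in action, here is a minimal sketch on synthetic data. The function name gradient_descent, the data, and the two learning rates are illustrative assumptions, not values taken from data.csv.

```python
import numpy as np

def gradient_descent(x, y, alpha, num_iters):
    """Run batch gradient descent and return theta0, theta1 and the cost history."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    costs = []
    for _ in range(num_iters):
        errors = (theta0 + theta1 * x) - y
        # Compute both gradients from the same errors, then update together.
        grad0 = np.sum(errors) / m
        grad1 = np.sum(errors * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        new_errors = (theta0 + theta1 * x) - y
        costs.append(np.sum(new_errors ** 2) / (2 * m))
    return theta0, theta1, costs

# Synthetic data generated from y = 1 + 2x, so the ideal parameters are known.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

theta0, theta1, costs = gradient_descent(x, y, alpha=0.05, num_iters=5000)
print(theta0, theta1, costs[-1])

# On this data a much larger learning rate overshoots and the cost grows.
_, _, diverging_costs = gradient_descent(x, y, alpha=0.5, num_iters=20)
print(diverging_costs[0], diverging_costs[-1])
```

With the small learning rate the parameters settle close to 1 and 2, while with the large one the cost increases from one iteration to the next. Which values count as "small" depends on the scale of x, so the numbers here should not be read as general recommendations.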
Below is our Python program for Univariate Linear Regression:
    import csv

    import numpy as np
    import matplotlib.pyplot as plt


    def read_data(filename):
        # Assumes each row has two numeric columns, area then price, and no header row.
        x, y = list(), list()
        with open(filename, 'r') as csv_file:
            csv_reader = csv.reader(csv_file)
            for row in csv_reader:
                x.append(float(row[0]))
                y.append(float(row[1]))
        x, y = np.array(x), np.array(y)
        return x, y


    class LinearRegression:
        def __init__(self, x, y):
            self.x = self.add_ones(x)  # prepend a column of ones for theta0
            self.y = y
            self.theta = self.initialize_theta()
            self.m = len(y)

        def initialize_theta(self):
            return np.zeros(2)

        def add_ones(self, x):
            return np.array([(1, ele) for ele in x])

        def cost_function(self):
            J = np.sum(np.power((np.dot(self.x, self.theta) - self.y), 2)) / (2 * self.m)
            return J

        def fit(self, alpha, num_iters):
            self.alpha = alpha
            self.num_iters = num_iters
            self.gradient_descent()

        def gradient_descent(self):
            self.J_history = list()
            for i in range(self.num_iters):
                # Update theta0 and theta1 together from the current errors.
                self.theta = self.theta - (self.alpha / self.m * np.dot((np.dot(self.x, self.theta) - self.y), self.x))
                J = self.cost_function()
                if (i % 100 == 0):
                    self.J_history.append(J)  # record the cost every 100 iterations

        def predict(self, x):
            x = self.add_ones(x)
            return (np.dot(x, self.theta))

        def compare(self):
            plt.plot(self.x[:, 1], self.y, 'ro')  # training data
            plt.plot(self.x[:, 1], np.dot(self.x, self.theta))  # fitted line
            plt.show()


    if __name__ == "__main__":
        x, y = read_data('data.csv')
        lr = LinearRegression(x, y)
        lr.fit(alpha=0.01, num_iters=15000)
        lr.compare()
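One way to sanity-check a gradient descent result is to compare it with a direct least-squares fit. The sketch below does this with np.polyfit, which is a different technique from the one used in the program above. It runs on synthetic data standing in for data.csv, and the noise level, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for data.csv: points scattered around y = 3 + 1.5x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 1.5 * x + rng.normal(0.0, 0.5, size=x.shape)

# Direct least-squares fit of a degree-1 polynomial: returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# The same gradient descent update as in the program, in matrix form.
X = np.column_stack((np.ones_like(x), x))   # column of ones, then x
theta = np.zeros(2)
alpha, num_iters = 0.01, 15000
for _ in range(num_iters):
    theta = theta - alpha / len(y) * (X.T @ (X @ theta - y))

print("polyfit:          ", intercept, slope)
print("gradient descent: ", theta[0], theta[1])
```

If gradient descent has converged, the two pairs of numbers should agree to several decimal places. A large gap usually suggests that the learning rate or the number of iterations needs adjusting.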
That’s it. I hope you will like this tutorial…