Locally Weighted Linear Regression in Python
In this tutorial, we will discuss a special form of linear regression – locally weighted linear regression in Python. We will first go through the simple Linear Regression concepts, then advance to locally weighted linear regression. Finally, we will see how to code this particular algorithm in Python.
Simple Linear Regression
Linear Regression is one of the most popular and basic algorithms of Machine Learning. It is used to predict numerical data. It models the relationship of a dependent variable (generally called ‘y’) on an independent variable (generally called ‘x’). The general equation for Linear Regression is,
y = β0 + β1*x + ε
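As a quick illustration of the equation above, a minimal sketch of fitting β0 and β1 with NumPy's least-squares solver (the data points here are made up for illustration):

```python
import numpy as np

# made-up data that is roughly y = 2x with a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# design matrix [1, x] so the solver returns [beta0, beta1]
A = np.column_stack([np.ones_like(x), x])
beta0, beta1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(beta0, beta1)  # slope is close to 2, intercept close to 0
```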
Why Do We Need Locally Weighted Linear Regression?
Linear Regression works accurately only when the variables in the data have a linear relationship. In cases where the independent variable is not linearly related to the dependent variable, we cannot use simple Linear Regression, so we resort to Locally Weighted Linear Regression (LWLR).
Locally Weighted Linear Regression Principle
It is a very simple algorithm with only a few modifications from Linear Regression. The algorithm is as follows :
- assign different weights to the training data
- assign bigger weights to the data points that are closer to the data we are trying to predict
In LWLR, we do not split the dataset into training and test data. We use the entire dataset for every prediction, so the algorithm takes a lot of time, space and computation.
We use kernel smoothing to find the weights to be assigned to the training data. The most common choice is the Gaussian kernel, which produces a bell-shaped weighting. It uses the following formula:
w = e^(−||x − x0||² / (2k²))
Here x0 is the point we are predicting for, x is a training point, and k is the bandwidth that controls how quickly the weight falls off with distance.
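The kernel step above can be sketched as follows; for one query point we build a diagonal weight matrix over all training points (the names `kernel_weights`, `x_train` and `x0` are illustrative, not from the article's code):

```python
import numpy as np

def kernel_weights(x0, x_train, k):
    """Diagonal weight matrix with weight_ii = exp(-||x_i - x0||^2 / (2 k^2))."""
    diff = x_train - x0                    # offsets from the query point
    sq_dist = np.sum(diff**2, axis=1)      # squared Euclidean distances
    return np.diag(np.exp(-sq_dist / (2.0 * k**2)))

x_train = np.array([[1.0], [2.0], [3.0]])
W = kernel_weights(np.array([2.0]), x_train, k=1.0)
print(np.round(np.diag(W), 3))  # the closest point gets weight 1.0
```

Note the weight matrix is diagonal: each diagonal entry weights one training example, and points farther from x0 receive exponentially smaller weights.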
- We find a weight matrix for each training input X. The weight matrix is always a diagonal matrix.
- The weight decreases as the distance between the point being predicted and the training point increases.
Predicting the Results
We use the following formulas, where x’ denotes the transpose of x and w is the weight matrix, to find the value of the dependent variable:
β = ((x’*w*x)^-1 ) * x’ * w * y
y = x0 * β
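The two formulas above can be sketched in NumPy as follows (the function and variable names are illustrative, not from the article's code):

```python
import numpy as np

def lwlr_predict(x0, X, y, W):
    """Solve beta = (X' W X)^-1 X' W y, then predict y0 = x0 . beta."""
    XtW = X.T @ W
    beta = np.linalg.solve(XtW @ X, XtW @ y)  # solve() avoids an explicit inverse
    return x0 @ beta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # rows of [1, x]
y = np.array([2.0, 4.0, 6.0])                       # exactly y = 2x
W = np.eye(3)                                       # uniform weights for simplicity
print(lwlr_predict(np.array([1.0, 2.5]), X, y, W))  # ~5.0
```

With uniform weights this reduces to ordinary least squares; in LWLR, W would instead come from the kernel, recomputed for every query point x0.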
LWLR in Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# kernel smoothing function
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

# function to return the local weight of each training example
def localWeight(point, xmat, ymat, k):
    wt = kernel(point, xmat, k)
    W = (xmat.T * (wt * xmat)).I * (xmat.T * wt * ymat.T)
    return W

# root function that drives the algorithm
def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# import data
data = pd.read_csv('tips.csv')

# place them in suitable data types
colA = np.array(data.total_bill)
colB = np.array(data.tip)

mcolA = np.mat(colA)
mcolB = np.mat(colB)

m = np.shape(mcolB)[1]
one = np.ones((1, m), dtype=int)

# horizontal stacking
X = np.hstack((one.T, mcolA.T))
print(X.shape)

# predicting values using LWLR
ypred = localWeightRegression(X, mcolB, 0.8)

# plotting the predicted graph
xsort = X.copy()
xsort.sort(axis=0)
plt.scatter(colA, colB, color='blue')
plt.plot(xsort[:, 1], ypred[X[:, 1].argsort(0)], color='yellow', linewidth=5)
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()
The result for the tips.csv dataset is a scatter plot of total bill against tip, with the fitted LWLR curve overlaid.
This is a very simple method of using LWLR in Python.
Note: This algorithm is most useful when a non-linear relationship exists between the dependent and independent variables.