Locally Weighted Linear Regression in Python

In this tutorial, we will discuss a special form of linear regression – locally weighted linear regression in Python. We will first go through the concepts of simple Linear Regression, then move on to locally weighted linear regression, and finally see how to code this algorithm in Python.

Simple Linear Regression

Linear Regression is one of the most popular and basic algorithms of Machine Learning. It is used to predict numerical data. It models the relationship between a dependent variable (generally called ‘y’) and an independent variable (generally called ‘x’). The general equation for Linear Regression is,

y = β0 + β1*x + ε
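
Here β0 is the intercept, β1 is the slope, and ε is the error term. As a quick refresher, below is a minimal NumPy sketch of fitting such a line; the numbers are made up purely for illustration.

import numpy as np

# toy data (made up for illustration): y is roughly 2 + 3*x plus noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 8.0, 10.9, 14.2, 16.8])

# np.polyfit with degree 1 returns [beta1, beta0]
beta1, beta0 = np.polyfit(x, y, 1)
print(beta0, beta1)          # intercept and slope
print(beta0 + beta1 * 6.0)   # prediction for a new x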

Why Do We Need Locally Weighted Linear Regression?

Linear Regression works accurately only on data that has a linear relationship between the variables. In cases where the independent variable is not linearly related to the dependent variable, we cannot use simple Linear Regression, so we resort to Locally Weighted Linear Regression (LWLR).

Locally Weighted Linear Regression Principle

It is a very simple algorithm that requires only a few modifications to Linear Regression. The algorithm is as follows:

  • assign a weight to every training data point
  • assign bigger weights to the training points that are closer to the point we are trying to predict

In LWLR, we do not split the dataset into training and test data. We use the entire dataset for every prediction, so the algorithm takes a lot of time, memory and computation.

Kernel Smoothing

We use Kernel Smoothing to find out the weights to be assigned to the training data. Here we use a Gaussian kernel, which gives a bell-shaped weighting curve. It uses the following formula:

w = e ^ ( – ||X – X0||^2 / (2k^2) )

where X0 is the point we are predicting for and k is the kernel bandwidth (the same k passed to the code below).

  • We compute a weight matrix for each query point X0. The weight matrix is always a diagonal matrix.
  • The weight decreases as the distance between the query point and a training point increases.
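
To make the formula concrete, here is a minimal NumPy sketch that builds the diagonal weight matrix for a single query point; the helper name and the toy values are made up for illustration.

import numpy as np

def gaussian_weights(x_train, x0, k):
    """Diagonal weight matrix for one query point x0 using a Gaussian kernel."""
    # squared distance of every training point from the query point
    sq_dist = np.sum((x_train - x0) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * k ** 2))
    return np.diag(w)

x_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy 1-D inputs
W = gaussian_weights(x_train, x0=np.array([2.5]), k=0.8)
print(np.round(np.diag(W), 3))  # weights shrink as points move away from 2.5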

Predicting the Results

We use the following formulas to compute the local coefficients and the predicted value:

β = ((X' * W * X)^-1) * X' * W * y

ŷ = X0 * β
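
As a rough sketch of these two formulas with plain NumPy arrays (a hypothetical helper, separate from the np.mat-based code below):

import numpy as np

def lwlr_point(x0, X, y, W):
    """Weighted least squares fit and prediction at one query point x0."""
    # beta = (X' W X)^-1 X' W y  -- the weighted normal equations
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    # prediction at the query point
    return x0 @ beta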

LWLR in Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# kernel smoothing function: builds the diagonal weight matrix for one query point
def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))

    for j in range(m):
        diff = point - xmat[j]
        # Gaussian weight falls off with the squared distance from the query point
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))

    return weights

# solve the weighted normal equations for the local coefficients at one query point
def localWeight(point, xmat, ymat, k):
    wt = kernel(point, xmat, k)
    # beta = (X' * W * X)^-1 * X' * W * y
    W = (xmat.T * (wt * xmat)).I * (xmat.T * wt * ymat.T)
    return W

# root function that drives the algorithm: fits a local model for every input point
def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)

    for i in range(m):
        # prediction at point i is x_i * beta_i
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)

    return ypred

#import data
data = pd.read_csv('tips.csv')

# place them in suitable data types
colA = np.array(data.total_bill)
colB = np.array(data.tip)

mcolA = np.mat(colA)
mcolB = np.mat(colB)

m = np.shape(mcolB)[1]
one = np.ones((1, m), dtype = int)

# horizontal stacking: prepend a column of ones for the intercept term
X = np.hstack((one.T, mcolA.T))
print(X.shape)

# predicting values using LWLR
ypred = localWeightRegression(X, mcolB, 0.8)

# plotting the predicted graph
xsort = X.copy()
xsort.sort(axis=0)
plt.scatter(colA, colB, color='blue')
plt.plot(xsort[:, 1], ypred[X[:, 1].argsort(0)], color='yellow', linewidth=5)
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()
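
The bandwidth k (0.8 in the call above) controls how local the fit is; a smaller k tracks the data more closely, while a larger k approaches plain Linear Regression. As an optional experiment, you could rerun the regression for a few values of k (chosen arbitrarily here) and compare the curves:

# compare a few bandwidths (values chosen arbitrarily for illustration)
sort_idx = np.asarray(X[:, 1]).ravel().argsort()
xs = np.asarray(X[:, 1]).ravel()[sort_idx]

for k in [0.1, 0.5, 2.0]:
    ypred_k = localWeightRegression(X, mcolB, k)
    plt.plot(xs, ypred_k[sort_idx], label='k = %.1f' % k)

plt.scatter(colA, colB, color='blue')
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.legend()
plt.show()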


Follow this link to see the entire code:

LWLR.zip

The result for the tips.csv dataset is:

(Output plot: scatter of Total Bill vs. Tip with the fitted LWLR curve overlaid)

This is a very simple method of using LWLR in Python.

Note: LWLR is most useful when the relationship between the dependent and independent variables is non-linear; for data that is already linear, ordinary Linear Regression is simpler and cheaper.

