Multiple Regression in Machine Learning with example in Python


In this tutorial, we are going to understand Multiple Regression, which is used as a predictive analysis tool in Machine Learning, and walk through an example in Python.

We know that Simple Linear Regression has one dependent variable and only one independent variable. Multiple Regression, by contrast, has more than one independent variable.

This technique is used when we need to consider more than one feature to predict the final outcome.

We have the following equation for Simple Linear Regression:

$Y = \alpha_0 + \alpha_1 X_1$

where $\alpha_0$ is the intercept and $\alpha_1$ is the coefficient of the feature $X_1$.

In Multiple Linear Regression, we have more than one independent feature, so each feature gets its own coefficient $\alpha_1, \alpha_2, \ldots, \alpha_n$:

$Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \dots + \alpha_n X_n$
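To make the equation concrete, here is a minimal sketch that evaluates it for a single observation. The function name predict_linear and all the numbers are hypothetical, chosen only for illustration. The point is that the sum of the terms α_i·X_i is simply a dot product between the coefficient vector and the feature vector.

```python
import numpy as np

def predict_linear(alpha0, alphas, x):
    """Evaluate Y = alpha0 + alpha1*X1 + ... + alphan*Xn for one observation.

    predict_linear is a hypothetical helper, not part of any library.
    """
    alphas = np.asarray(alphas, dtype=float)
    x = np.asarray(x, dtype=float)
    # The sum of alpha_i * X_i is the dot product of the two vectors.
    return alpha0 + np.dot(alphas, x)

# Hypothetical intercept, coefficients, and feature values.
alpha0 = 2.0
alphas = [0.5, -1.0, 3.0]
x = [4.0, 1.0, 2.0]

y = predict_linear(alpha0, alphas, x)
print(y)  # 2.0 + 0.5*4.0 - 1.0*1.0 + 3.0*2.0 = 9.0
```

With one coefficient the same function reduces to the Simple Linear Regression equation, e.g. predict_linear(1.0, [2.0], [3.0]) gives 7.0.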

Now let’s see how this method works.

Prerequisites: Python, pandas, scikit-learn (sklearn)

First, import the Python libraries required for the analysis.

import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

The pandas library is used to load the data into a pandas DataFrame object.

Now we will load the data file for further processing.

data = pd.read_csv("boston.csv")
data.head()

Output:

	CRIM	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	LSTAT	PRICE
0	0.00632	2.31	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	4.98	24.0
1	0.02731	7.07	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	9.14	21.6
2	0.02729	7.07	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	4.03	34.7
3	0.03237	2.18	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	2.94	33.4
4	0.06905	2.18	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	5.33	36.2

We imported “boston.csv”, a copy of the Boston housing data taken from the sklearn datasets package. The columns above are the features we can choose from for prediction.

We want to predict the price of a house (PRICE, in $1000s) from features such as the crime rate, the proportion of industrial (non-retail business) acres, the nitric oxide concentration, the number of rooms, the age of the housing, and the pupil-teacher ratio.

Now we will predict the price using more than one independent feature of the Boston dataset.

The features used for our prediction are:

‘TAX’: Full-value property-tax rate per $10,000
‘CRIM’: Per capita crime rate by town
‘RM’: Average number of rooms per dwelling
‘INDUS’: Proportion of non-retail business acres per town
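If boston.csv is not at hand (the load_boston loader in scikit-learn was removed in version 1.2), a minimal sketch is to type in the five rows printed by data.head() above and try the column selection on them. The name sample is hypothetical; five rows are far too few to fit a meaningful model, so this only shows how the target and feature columns are picked out.

```python
import pandas as pd

# The five rows printed by data.head() above, typed in by hand.
columns = ["CRIM", "INDUS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "LSTAT", "PRICE"]
rows = [
    [0.00632, 2.31, 0.538, 6.575, 65.2, 4.0900, 1.0, 296.0, 15.3, 4.98, 24.0],
    [0.02731, 7.07, 0.469, 6.421, 78.9, 4.9671, 2.0, 242.0, 17.8, 9.14, 21.6],
    [0.02729, 7.07, 0.469, 7.185, 61.1, 4.9671, 2.0, 242.0, 17.8, 4.03, 34.7],
    [0.03237, 2.18, 0.458, 6.998, 45.8, 6.0622, 3.0, 222.0, 18.7, 2.94, 33.4],
    [0.06905, 2.18, 0.458, 7.147, 54.2, 6.0622, 3.0, 222.0, 18.7, 5.33, 36.2],
]
sample = pd.DataFrame(rows, columns=columns)

# Dependent variable (target) and the four independent features listed above.
prices = sample["PRICE"]
features = sample[["TAX", "CRIM", "RM", "INDUS"]]

print(features.shape)   # (5, 4)
print(prices.tolist())  # [24.0, 21.6, 34.7, 33.4, 36.2]
```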

So, let’s split the data into 80% training data and 20% testing data.
Below is the code implementation:

prices = data['PRICE']
features = data[['TAX', 'CRIM', 'RM', 'INDUS']]

# train/test split: 80% training, 20% testing
X_train, X_test, Y_train, Y_test = train_test_split(features, prices, test_size=0.2, random_state=10)
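To see what train_test_split does with test_size=0.2, here is a small sketch on a hypothetical 10-row table (the column values are made up). It holds out 2 of the 10 rows for testing, and the feature and target splits stay aligned row by row. Passing random_state fixes the shuffle so the split is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row table, used only to illustrate the split.
toy = pd.DataFrame({
    "TAX": np.arange(10) * 10.0 + 200.0,
    "RM": np.linspace(5.0, 8.0, 10),
    "PRICE": np.linspace(15.0, 40.0, 10),
})
X = toy[["TAX", "RM"]]
y = toy["PRICE"]

# test_size=0.2 keeps 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

print(len(X_train), len(X_test))  # 8 2
# Rows stay aligned: the same index labels appear in X_test and y_test.
print(list(X_test.index) == list(y_test.index))  # True
```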

Now we will use the same LinearRegression() class from the sklearn module that we used in Simple Linear Regression to create a linear regression object. We then call its fit() method, which takes the independent variables (features in our code) and the dependent variable (i.e. price) as parameters.
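Under the hood, LinearRegression fits an ordinary least-squares model: it chooses α0, α1, …, αn to minimize the sum of squared differences between the actual and predicted values. The sketch below checks this on hypothetical random data generated from known coefficients, comparing scikit-learn's result with NumPy's least-squares solver after a column of ones is added for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: Y = 3.0 + 1.5*X1 - 2.0*X2 + 0.5*X3 + small noise
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = 3.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

# scikit-learn's ordinary least-squares fit.
regr = LinearRegression().fit(X, y)

# Same problem with NumPy: prepend a column of ones for alpha0 and
# solve min ||A @ beta - y||^2.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(regr.intercept_, regr.coef_)
print(beta)  # beta[0] is the intercept, beta[1:] are the coefficients
```

Both approaches should give the same intercept and coefficients, close to the values used to generate the data.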

regr = LinearRegression()
regr.fit(X_train, Y_train)

print('\nIntercept: ',regr.intercept_)
pd.DataFrame(data = regr.coef_, index = X_train.columns, columns=['coef'])

Output:

Intercept: -16.75997008917904

	coef
TAX	-0.007028
CRIM	-0.160369
RM	7.059748
INDUS	-0.163317

This shows the intercept and, in tabular form, the coefficient of each independent variable.
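Each coefficient can be read as the change in predicted price for a one-unit increase in that feature while the other features are held fixed. For example, the RM coefficient says how much the prediction changes per extra room. The sketch below checks this on hypothetical random data that reuses the tutorial's column names (the generating coefficients are made up): raising RM by 1 for one row changes the prediction by exactly the fitted RM coefficient.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical training data with the same column names as the tutorial.
X = pd.DataFrame(rng.uniform(0, 10, size=(100, 4)), columns=["TAX", "CRIM", "RM", "INDUS"])
y = 5.0 + X.to_numpy() @ np.array([-0.01, -0.2, 7.0, -0.15]) + rng.normal(scale=0.5, size=100)

regr = LinearRegression().fit(X, y)

# Pair each coefficient with its feature name, as in the table above.
coef_table = pd.DataFrame(data=regr.coef_, index=X.columns, columns=["coef"])

# Take one row, then raise RM by one unit while holding the other features fixed.
base = X.iloc[[0]].copy()
bumped = base.copy()
bumped["RM"] += 1

# In a linear model the change in prediction equals the RM coefficient.
delta = regr.predict(bumped)[0] - regr.predict(base)[0]
print(coef_table)
print(delta, coef_table.loc["RM", "coef"])
```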

Let’s look at the R-squared score on the training data and the testing data:

print('the r-squared Training Data: ', regr.score(X_train, Y_train))
print('the r-squared Testing Data: ', regr.score(X_test, Y_test))

Output:
the r-squared Training Data: 0.5933759752536086
the r-squared Testing Data: 0.5192685222588359

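These are the R-squared values for our four independent features. An R-squared of about 0.59 on the training data and 0.52 on the testing data means the model explains roughly half of the variation in house prices; it does not mean that half of the predictions are accurate. Adding more informative independent features may improve the fit.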
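R-squared is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals (actual minus predicted) and SS_tot is the total sum of squares around the mean of the target. The sketch below computes it by hand on hypothetical random data, deliberately noisy so that only about half of the variance is explained, and compares it with regr.score and sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)

# Hypothetical data: the linear signal explains only part of the variation in y.
X = rng.normal(size=(300, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=2.0, size=300)

regr = LinearRegression().fit(X, y)
y_pred = regr.predict(X)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)    # variation the model fails to explain
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, regr.score(X, y), r2_score(y, y_pred))
```

All three numbers should agree, and with this noise level they land near 0.5, similar to the scores in the tutorial.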

Let’s predict the price of one house with the following value for each independent feature:

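# Use the same column order as the training features: TAX, CRIM, RM, INDUS
new_house = pd.DataFrame([[250, 0.05, 8, 3.33]], columns=features.columns)
prediction = regr.predict(new_house)
print(prediction)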

Output:

37.40925199

So we get a predicted price of about 37.409 (in $1000s, i.e. roughly $37,409) for the given parameters TAX=250, CRIM=0.05, RM=8, INDUS=3.33, computed with the multiple regression equation above.
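As a check, the same prediction can be worked out by hand from the intercept and coefficients printed earlier, plugging the feature values into PRICE = α0 + α1·TAX + α2·CRIM + α3·RM + α4·INDUS. Because the printed coefficients are rounded to six decimal places, the result matches the model's output to about three decimal places rather than exactly.

```python
# Intercept and coefficients as printed in the output above (coefficients rounded).
intercept = -16.75997008917904
coef = {"TAX": -0.007028, "CRIM": -0.160369, "RM": 7.059748, "INDUS": -0.163317}
house = {"TAX": 250, "CRIM": 0.05, "RM": 8, "INDUS": 3.33}

# PRICE = alpha0 + alpha1*TAX + alpha2*CRIM + alpha3*RM + alpha4*INDUS
price = intercept + sum(coef[name] * house[name] for name in coef)
print(round(price, 3))  # 37.409
```

The RM term (7.059748 × 8 ≈ 56.48) dominates the sum, which is consistent with RM having by far the largest coefficient in the table.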

I hope you now understand what multiple regression is in machine learning and enjoyed the Python example.

Thanks for reading!!
