Multiple Regression in Machine Learning with an Example in Python
In this tutorial, we are going to understand Multiple Regression, which is used as a predictive analysis tool in Machine Learning, and see an example in Python.
We know that the Simple Linear Regression technique has only one dependent variable and one independent variable. Unlike Simple Linear Regression, Multiple Regression has more than one independent variable.
This technique is used where we have to consider more than one feature to predict our final outcome.
We have the following equation for Simple Linear Regression:
Y = α0 + α1X1
Here α0 is the intercept and α1 is the coefficient of the given feature X1.
In Multiple Linear Regression, we have more than one independent feature, so every feature gets its own coefficient: α1, α2, …, αn.
Y = α0 + α1X1 + α2X2 + α3X3 + … + αnXn
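As a minimal sketch of the equation above, the prediction for one observation is the intercept plus the sum of each coefficient times its feature value. The function name predict_linear and the sample numbers are illustrative assumptions, not part of the Boston example.

```python
def predict_linear(alpha0, alphas, xs):
    """Evaluate Y = alpha0 + alpha1*X1 + ... + alphan*Xn for one observation."""
    if len(alphas) != len(xs):
        raise ValueError("need exactly one coefficient per feature")
    return alpha0 + sum(a * x for a, x in zip(alphas, xs))


# Hypothetical intercept, coefficients, and feature values.
y = predict_linear(1.0, [2.0, 3.0], [4.0, 5.0])
print(y)  # 1 + 2*4 + 3*5 = 24.0
```

A fitted model does the same computation, only with coefficients estimated from the training data.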
Now let’s see how this method works.
Prerequisites: Python, pandas, sklearn
First, import the required Python libraries for the analysis.
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
The pandas library is used to load the data into a pandas DataFrame object.
Now we will import the data file for our further processing.
data = pd.read_csv("boston.csv")
data.head()
We imported “boston.csv”, which is taken from the sklearn datasets package. The output of data.head() shows the features that we can choose for prediction.
We want to predict the price of a house (in $1000s) based on given features like crime rate, industry per acre, nitrogen oxide, number of rooms, age group, and pupil-teacher ratio.
Now we will predict the price by taking into consideration more than one independent feature of the Boston dataset.
Feature data for our prediction are:
‘TAX’: Full-value property-tax rate per $10,000
‘CRIM’: Per capita crime rate by town
‘RM’: Average number of rooms per dwelling
‘INDUS’: Proportion of non-retail business acres per town
So, let’s split the data into 80% training data and 20% testing data.
Below is the code implementation:
prices = data['PRICE']
features = data[['TAX', 'CRIM', 'RM', 'INDUS']]
# train test splitting
X_train, X_test, Y_train, Y_test = train_test_split(features, prices, test_size=0.2, random_state=10)
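To see what test_size and random_state do without needing the CSV file, here is a small sketch on synthetic data. The column names F1, F2 and TARGET and the 100-row frame are made up for illustration; only train_test_split and its parameters come from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 rows, two made-up feature columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "F1": rng.normal(size=100),
    "F2": rng.normal(size=100),
})
demo["TARGET"] = 3.0 * demo["F1"] - 2.0 * demo["F2"]

X_tr, X_te, y_tr, y_te = train_test_split(
    demo[["F1", "F2"]], demo["TARGET"], test_size=0.2, random_state=10
)
print(X_tr.shape, X_te.shape)  # (80, 2) (20, 2)

# The same random_state gives the same split on a second call.
X_tr2, X_te2, _, _ = train_test_split(
    demo[["F1", "F2"]], demo["TARGET"], test_size=0.2, random_state=10
)
print(X_te.index.equals(X_te2.index))  # True
```

Fixing random_state makes the split reproducible, so the r-squared values reported later can be reproduced on the same data.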
Now we will use the same LinearRegression() class from the sklearn module that we used in Simple Linear Regression to create a linear regression object. We then call its fit() method, which takes the independent variables (features in our code) and the dependent values (i.e. prices) as parameters.
regr = LinearRegression()
regr.fit(X_train, Y_train)
print('\nIntercept: ', regr.intercept_)
pd.DataFrame(data=regr.coef_, index=X_train.columns, columns=['coef'])
            coef
TAX    -0.007028
CRIM   -0.160369
RM      7.059748
INDUS  -0.163317
The intercept is printed first, and the coefficients of all independent variables are shown in tabular form.
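To connect intercept_ and coef_ with the multiple regression equation, the following sketch fits LinearRegression on synthetic, noise-free data. The generating intercept and coefficients are assumptions chosen for illustration, so the numbers have nothing to do with the Boston results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from known values (assumed for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
true_intercept = 4.0
true_coef = np.array([1.5, -2.0, 0.5])
y = true_intercept + X @ true_coef

model = LinearRegression().fit(X, y)

# With no noise, the fit recovers the generating values up to floating-point error.
print(np.allclose(model.intercept_, true_intercept))
print(np.allclose(model.coef_, true_coef))

# predict() matches intercept_ + X @ coef_, i.e. the multiple regression equation.
manual = model.intercept_ + X @ model.coef_
print(np.allclose(manual, model.predict(X)))
```

On real data such as the Boston prices there is noise, so the fitted coefficients are estimates rather than exact values.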
Let’s see the r-squared for the training data and the testing data:
print('the r-squared Training Data: ', regr.score(X_train, Y_train))
print('the r-squared Testing Data: ', regr.score(X_test, Y_test))
the r-squared Training Data: 0.5933759752536086
the r-squared Testing Data: 0.5192685222588359
These are the r-squared values for our four independent features. The model explains roughly 59% of the variance in price on the training data and about 52% on the testing data. We may get a better result by including more independent features in training and testing.
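For a regression model, score() returns the coefficient of determination (r-squared), which compares the model's squared errors with those of always predicting the mean. A minimal pure-Python sketch of that formula follows; the function name r_squared and the sample values are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the actual values.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
print(r_squared(actual, predicted))  # approximately 0.9625
```

A value of 1.0 means the predictions match exactly, while a value near 0 means the model does no better than predicting the mean.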
Let’s predict the price for one set of values of the independent features.
prediction = regr.predict([[250, 0.05, 8, 3.33]])
print(prediction)
So we get a predicted price of 37.40925199 (in $1000s), i.e. about $37,409, for the given values TAX=250, CRIM=0.05, RM=8, INDUS=3.33, using the formula mentioned above.
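The prediction can be checked by hand with the coefficients from the table above. This is only a rough sketch: the coefficients are rounded to six decimals, and the intercept value is not shown in the output, so the last step merely infers the intercept implied by the reported prediction.

```python
# Coefficients as printed in the coefficient table (rounded to six decimals).
coef = {"TAX": -0.007028, "CRIM": -0.160369, "RM": 7.059748, "INDUS": -0.163317}
# Feature values used in the prediction.
values = {"TAX": 250, "CRIM": 0.05, "RM": 8, "INDUS": 3.33}

# Sum of coefficient * feature value, i.e. the equation without the intercept.
weighted_sum = sum(coef[name] * values[name] for name in coef)
print(round(weighted_sum, 4))  # about 54.1691

# The intercept is not shown in the output; this is only the value implied
# by the reported prediction and the rounded coefficients.
reported_prediction = 37.40925199
implied_intercept = reported_prediction - weighted_sum
print(round(implied_intercept, 4))  # about -16.7599
```

Most of the predicted price comes from the RM term (7.059748 * 8), which is offset by a negative intercept and small negative contributions from TAX, CRIM and INDUS.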
I hope you have understood what multiple regression in machine learning is and enjoyed the Python example.
Thanks for reading!!