Height-Weight Prediction By Using Linear Regression in Python
Hi everyone! In this tutorial, we are going to discuss “Height-Weight Prediction By Using Linear Regression in Python”.
What is Linear Regression?
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression; with more than one explanatory variable, the process is called multiple linear regression.
A simple linear regression line has an equation of the form $y = mx + c$, where $x$ is the explanatory variable and $y$ is the dependent variable. The slope of the line is $m$, and $c$ is the intercept (the value of $y$ when $x = 0$).
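For a single explanatory variable, the least-squares values of m and c have a closed form: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and c = ȳ - m·x̄. The sketch below computes them with plain NumPy on the 15 Height/Weight pairs used in this tutorial (copied from the step 4 output). The helper name fit_line is our own, not part of any library.

```python
import numpy as np

# Height and Weight values from the dataset used in this tutorial (step 4 output)
x = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
              1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
y = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
              63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])

def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

m, c = fit_line(x, y)
print(f"m = {m:.4f}, c = {c:.4f}")
```

For this data the slope comes out at roughly 61.27 and the intercept at roughly -39.06, which is the same line the scikit-learn model fits later in the tutorial.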
[Figure: the linear regression model]
Our dataset, height-weight.csv, contains two columns: Height and Weight.
Implementation of Linear Regression Model: Height-Weight Prediction
In this problem, we need to predict the weight from the height, specifically the weight when the height is 2.
Step 1:
First, we load the dataset with the pandas data science library (NumPy is imported as well, for later steps). Because the data is stored as a CSV file, we read it with read_csv(); the .head() method then displays the first 5 rows.
import numpy as np
import pandas as pd

df = pd.read_csv("height-weight.csv")  # load the CSV file into a DataFrame
df.head()                              # display the first 5 rows
output:
   Height  Weight
0    1.47   52.21
1    1.50   53.12
2    1.52   54.48
3    1.55   55.84
4    1.57   57.20
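If you don't have height-weight.csv, one way to recreate it is from the 15 Height/Weight pairs printed in step 4. This is a sketch under that assumption: the original file's exact formatting is unknown, but a CSV with these two column headers should load the same way.

```python
import pandas as pd

# Height/Weight values as printed in step 4 of this tutorial
df = pd.DataFrame({
    "Height": [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
               1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83],
    "Weight": [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
               63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46],
})

# index=False keeps the DataFrame's row index out of the file,
# so the CSV contains only the Height and Weight columns
df.to_csv("height-weight.csv", index=False)

# Reading it back should give the same 15 x 2 table used in step 1
df_loaded = pd.read_csv("height-weight.csv")
print(df_loaded.shape)
```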
Step 2:
Next, we check the column names and the dimensions of the dataset, and whether it contains any missing values.
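print(df.columns)       # column names
print(df.shape)         # (rows, columns)
print(df.isna().any())  # True for any column that contains a missing value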
output:
Index(['Height', 'Weight'], dtype='object')
(15, 2)
Height    False
Weight    False
dtype: bool
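Our dataset reports False for both columns, so no cleaning is needed. For comparison, the sketch below uses a small, made-up frame (hypothetical values, for illustration only) with one missing Weight to show what isna() reports in that case, plus one simple way to handle it with dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing Weight, for illustration only
demo = pd.DataFrame({"Height": [1.47, 1.50, 1.52],
                     "Weight": [52.21, np.nan, 54.48]})

print(demo.isna().any())  # True for any column with at least one NaN
print(demo.isna().sum())  # number of NaNs in each column

# One simple option: drop every row that contains a missing value
cleaned = demo.dropna()
print(cleaned.shape)
```

Dropping rows is only one choice; filling missing values (for example with fillna()) is another, and which is appropriate depends on the data.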
Step 3:
Next, we compute the correlation between the two variables.
df.corr()
output:
          Height    Weight
Height  1.000000  0.994584
Weight  0.994584  1.000000
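By default, df.corr() computes the Pearson correlation coefficient. As a sketch, the same number can be computed directly from its definition, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), using the step 4 values. pearson_r is a hypothetical helper name; np.corrcoef is NumPy's built-in equivalent.

```python
import numpy as np

# Height and Weight values from the step 4 output
x = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
              1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
y = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
              63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y over the product of their spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r = pearson_r(x, y)
print(round(r, 6))              # should agree with the df.corr() value above
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in computes the same coefficient
```

A value this close to 1 indicates a strong positive linear relationship, which is why a straight line is a reasonable model for this data.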
Step 4:
Next, we extract the raw values of both variables. For scikit-learn, the independent variable (Height) must be a 2-dimensional array with one row per sample and one column per feature, so we add a second axis with np.newaxis. The dependent variable (Weight) stays a 1-dimensional array.
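height = df.Height.values[:, np.newaxis]  # independent variable as a 2-D column, shape (15, 1)
weight = df.Weight.values                 # dependent variable as a 1-D array, shape (15,)
height
weight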
output:
array([[1.47], [1.5 ], [1.52], [1.55], [1.57], [1.6 ], [1.63], [1.65], [1.68], [1.7 ], [1.73], [1.75], [1.78], [1.8 ], [1.83]])
array([52.21, 53.12, 54.48, 55.84, 57.2 , 58.57, 59.93, 61.29, 63.11, 64.47, 66.28, 68.1 , 69.92, 72.19, 74.46])
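scikit-learn estimators expect the feature matrix to have shape (n_samples, n_features). Indexing with [:, np.newaxis] is one way to turn a 1-D array into a single-feature column; reshape(-1, 1) is a common equivalent. A minimal check on three of the heights:

```python
import numpy as np

values = np.array([1.47, 1.50, 1.52])  # 1-D array, shape (3,)

col_a = values[:, np.newaxis]  # add a second axis -> shape (3, 1)
col_b = values.reshape(-1, 1)  # same result; -1 lets NumPy infer the number of rows

print(values.shape, col_a.shape, col_b.shape)
print(np.array_equal(col_a, col_b))
```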
Step 5:
Next, we normalize the variables using min-max scaling, which maps each variable onto the range [0, 1].
Formula: $X_{\text{norm}} = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}$, where $X$ is the original values, $X_{\max}$ is the maximum value of $X$, and $X_{\min}$ is the minimum value of $X$.
# min-max scale the heights
Heightmin = height.min()
Heightmax = height.max()
Heightnorm = (height - Heightmin) / (Heightmax - Heightmin)

# min-max scale the weights
Weightmin = weight.min()
Weightmax = weight.max()
Weightnorm = (weight - Weightmin) / (Weightmax - Weightmin)

Heightnorm
Weightnorm
output:
array([[0. ], [0.08333333], [0.13888889], [0.22222222], [0.27777778], [0.36111111], [0.44444444], [0.5 ], [0.58333333], [0.63888889], [0.72222222], [0.77777778], [0.86111111], [0.91666667], [1. ]])
array([0. , 0.04089888, 0.10202247, 0.16314607, 0.22426966, 0.2858427 , 0.34696629, 0.40808989, 0.48988764, 0.55101124, 0.63235955, 0.7141573 , 0.79595506, 0.89797753, 1. ])
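A reusable version of the step 5 formula might look like the sketch below (min_max_scale and min_max_unscale are hypothetical helper names). It also returns the minimum and maximum so that the scaling can be undone, and it guards against the division by zero that occurs when every value is identical.

```python
import numpy as np

def min_max_scale(x):
    """Scale x to [0, 1]; also return (x_min, x_max) so the scaling can be undone."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    span = x_max - x_min
    if span == 0:
        # All values identical: the formula would divide by zero, so return zeros
        return np.zeros_like(x), (x_min, x_max)
    return (x - x_min) / span, (x_min, x_max)

def min_max_unscale(x_norm, bounds):
    """Invert min_max_scale: X = X_norm * (X_max - X_min) + X_min."""
    x_min, x_max = bounds
    return np.asarray(x_norm) * (x_max - x_min) + x_min

# Weight values from the step 4 output
weight = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
                   63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])
weight_norm, bounds = min_max_scale(weight)
print(weight_norm.min(), weight_norm.max())
print(np.allclose(min_max_unscale(weight_norm, bounds), weight))
```

If a model is trained on scaled data, new inputs should be scaled with the training minimum and maximum. For example, a height of 2 maps to (2 - 1.47) / 0.36 ≈ 1.47, which lies outside [0, 1] because it is larger than any height in the dataset.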
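Step 6:
Now we can apply the linear regression model. The sklearn (scikit-learn) library provides a built-in LinearRegression class for this. Note that the model is fitted on the original height and weight arrays rather than on the normalized ones from step 5, so its predictions are in the original units.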
import sklearn.linear_model as lm

lr = lm.LinearRegression()  # ordinary least-squares linear regression
lr.fit(height, weight)      # learn the slope and intercept from the data
output: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Step 7:
Now we predict the weight when the height is 2.
knownvalue = float(input("Enter the value of height:"))  # float() so that heights such as 1.75 are accepted
findvalue = lr.predict([[knownvalue]])  # predict() expects 2-D input: one row, one feature
print(f"When the height is {knownvalue}, the predicted weight is {findvalue}")
output:
Enter the value of height:2
When the height is 2.0, the predicted weight is [83.48241717]
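With one feature, the model's prediction is simply the fitted line y = mx + c evaluated at the input. To make that visible, the sketch below refits the line with NumPy's np.polyfit (a swapped-in least-squares routine, not scikit-learn's) and evaluates it at a height of 2; it should reproduce the prediction of about 83.48.

```python
import numpy as np

# Height and Weight values from the step 4 output
x = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
              1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
y = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
              63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])

# np.polyfit with degree 1 fits a least-squares line and returns
# its coefficients highest power first: [slope, intercept]
m, c = np.polyfit(x, y, 1)

height = 2.0
predicted_weight = m * height + c  # y = m*x + c
print(f"predicted weight for height {height}: {predicted_weight:.4f}")
```

Keep in mind that 2 lies outside the range of heights in the dataset (1.47 to 1.83), so this prediction is an extrapolation that assumes the linear trend continues beyond the observed data.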
Step 8:
We can add the model's predicted weights to the dataset as a new column.
df["predicted_value"] = lr.predict(height)  # model prediction for every height in the dataset
df.head()
output:
   Height  Weight  predicted_value
0    1.47   52.21        51.008158
1    1.50   53.12        52.846324
2    1.52   54.48        54.071768
3    1.55   55.84        55.909933
4    1.57   57.20        57.135377
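With the predicted_value column in place, the residual for each row is the observed weight minus the predicted weight; in pandas that would be df["Weight"] - df["predicted_value"]. As a small sketch, here are the residuals for the five rows shown above, computed from the printed (rounded) values:

```python
import numpy as np

# First five rows of the df.head() output above
weight = np.array([52.21, 53.12, 54.48, 55.84, 57.20])
predicted = np.array([51.008158, 52.846324, 54.071768, 55.909933, 57.135377])

residuals = weight - predicted  # observed minus predicted
print(np.round(residuals, 6))
print("sum of squared residuals (first 5 rows):", np.sum(residuals ** 2))
```

Positive residuals mean the model underestimates the weight for that row; negative residuals mean it overestimates it.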
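Step 9:
Finally, we calculate the model's score with r2_score, which returns the coefficient of determination (R²). Despite the "accuracy" label in the code, R² is not a classification accuracy: it is the fraction of the variance in weight that the model explains. Because it is computed on the same data the model was trained on, it measures goodness of fit rather than performance on unseen heights.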
from sklearn.metrics import r2_score

accuracy = r2_score(weight, lr.predict(height))  # R² on the training data
print("the model accuracy is", accuracy * 100, "%")
output: the model accuracy is 98.91969224457968 %
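r2_score implements the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean weight. The sketch below computes it by hand; r2 is a hypothetical helper name, and np.polyfit stands in for the scikit-learn model. For simple linear regression with an intercept, R² also equals the square of the correlation from step 3 (0.994584² ≈ 0.9892).

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Height and Weight values from the step 4 output
x = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
              1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
y = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
              63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])

m, c = np.polyfit(x, y, 1)  # least-squares line, stand-in for LinearRegression
score = r2(y, m * x + c)
print(score)
print(np.corrcoef(x, y)[0, 1] ** 2)  # squared Pearson r, same value for this model
```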
In this tutorial, we applied a linear regression model to predict weight from height and, along the way, covered the basic concepts of linear regression.