# Solve Linear Regression Problem Mathematically in Python

Hello everyone, in this tutorial we will discuss how to solve a linear regression problem mathematically in Python.

## What is the mathematical formula of linear regression?

A linear regression line has an equation of the form y=mx+c, where x is the explanatory variable and y is the dependent variable. The slope of the line is m and c is the intercept (the value of y when x=0).

Formulas for the slope (m) and the intercept (c):

m=sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))^2)

c=mean(y)-m*mean(x)

Using these formulas, we can predict the unknown value of y for any given x.
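To make the two formulas concrete, here is a minimal sketch in plain Python. The function name `fit_line` and the toy data are assumptions made for illustration; they are not part of the dataset used in this tutorial.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator of m: sum of the products of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of m: sum of the squared deviations of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # 2.0 1.0
```

Because the toy points sit exactly on a line, the function recovers its slope and intercept exactly. With real data the line is only the best fit.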

Our Dataset:

Age-Glucose_Level – Age-Glucose_Level.csv

### Implementation of the mathematical formula of the linear regression model

In this problem, we need to predict the glucose level for an age of 22.

Step1:-

First, we load the dataset with pandas. The dataset is a CSV file, which is why we use read_csv. NumPy is imported as well, although the steps below do not call it directly.

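```
import numpy as np
import pandas as pd

# Load the CSV (assumes Age-Glucose_Level.csv is in the working directory)
df = pd.read_csv("Age-Glucose_Level.csv")
df
```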

output:-

```
   Age  Glucose_Level
0   43             99
1   21             65
2   25             79
3   42             75
4   57             87
5   59             81
```
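If the CSV file is not at hand, the same six rows can be typed in directly. This is only a sketch for following along; the values are copied from the output above.

```python
import pandas as pd

# The six rows shown in the output above, entered by hand so the
# remaining steps can be followed without the CSV file
df = pd.DataFrame({
    "Age": [43, 21, 25, 42, 57, 59],
    "Glucose_Level": [99, 65, 79, 75, 87, 81],
})
print(df)
```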

Step2:-

Next, we check the column names, the dimensions of the dataset, and whether it contains any missing values.

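```
# print() each result so that all three are displayed, not only the last one
print(df.columns)
print(df.shape)
print(df.isna().any())
```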

output:-

```
Index(['Age', 'Glucose_Level'], dtype='object')
(6, 2)
Age              False
Glucose_Level    False
dtype: bool
```

Step3:-

Now, we need to find out the correlation between the two variables.

`df.corr()`

output:-

```
                    Age  Glucose_Level
Age            1.000000       0.529809
Glucose_Level  0.529809       1.000000
```
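By default, `df.corr()` computes the Pearson correlation coefficient. As a rough check, the same number can be reproduced by hand in plain Python. The variable names below (`sxy`, `sxx`, `syy`) are chosen for illustration, and the values are copied from the dataset shown in Step 1.

```python
from math import sqrt

# Values copied from the dataset shown in Step 1
age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

mean_age = sum(age) / len(age)
mean_glucose = sum(glucose) / len(glucose)

# Sum of the products of deviations, and the two sums of squared deviations
sxy = sum((a - mean_age) * (g - mean_glucose) for a, g in zip(age, glucose))
sxx = sum((a - mean_age) ** 2 for a in age)
syy = sum((g - mean_glucose) ** 2 for g in glucose)

r = sxy / sqrt(sxx * syy)
print(round(r, 6))  # 0.529809
```

The quantities `sxy` and `sxx` are the same numerator and denominator that the slope formula uses, so this check also previews the later steps.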

Step4:-

Now, we have to find out the mean value of the age.

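```
# df1 is a second name for the same DataFrame (not a copy),
# so columns added through df1 also appear in df
df1 = df
df1["mean(Age)"] = df1["Age"].mean()
df1
```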

output:-

```
   Age  Glucose_Level  mean(Age)
0   43             99  41.166667
1   21             65  41.166667
2   25             79  41.166667
3   42             75  41.166667
4   57             87  41.166667
5   59             81  41.166667
```

Step5:-

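Now, we calculate every intermediate quantity in the slope formula: the deviations from each mean, the products of those deviations, the squared Age deviations, and the two sums.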

```
# Deviation of each Age from the mean Age
df1["Age-mean(Age)"] = df1["Age"] - df1["mean(Age)"]

# Mean Glucose_Level and the deviation of each value from it
df1["mean(Glucose_Level)"] = df1["Glucose_Level"].mean()
df1["Glucose_Level-mean(Glucose_Level)"] = df1["Glucose_Level"] - df1["mean(Glucose_Level)"]

# Numerator of the slope: sum of the products of the deviations
df1["Age-mean(Age)*Glucose_Level-mean(Glucose_Level)"] = df1["Age-mean(Age)"] * df1["Glucose_Level-mean(Glucose_Level)"]
df1["sum_of_(Age-mean(Age)*Glucose_Level-mean(Glucose_Level))"] = df1["Age-mean(Age)*Glucose_Level-mean(Glucose_Level)"].sum()

# Denominator of the slope: sum of the squared Age deviations
df1["square_of(Age-mean(Age))"] = df1["Age-mean(Age)"] ** 2
df1["sum_of_(square_of(Age-mean(Age)))"] = df1["square_of(Age-mean(Age))"].sum()

df1
```

output:-

```
   Age  Glucose_Level  mean(Age)  Age-mean(Age)  mean(Glucose_Level)  Glucose_Level-mean(Glucose_Level)  Age-mean(Age)*Glucose_Level-mean(Glucose_Level)  sum_of_(Age-mean(Age)*Glucose_Level-mean(Glucose_Level))  square_of(Age-mean(Age))  sum_of_(square_of(Age-mean(Age)))
0   43             99  41.166667       1.833333                 81.0                               18.0                                        33.000000                                                     478.0                  3.361111                        1240.833333
1   21             65  41.166667     -20.166667                 81.0                              -16.0                                       322.666667                                                     478.0                406.694444                        1240.833333
2   25             79  41.166667     -16.166667                 81.0                               -2.0                                        32.333333                                                     478.0                261.361111                        1240.833333
3   42             75  41.166667       0.833333                 81.0                               -6.0                                        -5.000000                                                     478.0                  0.694444                        1240.833333
4   57             87  41.166667      15.833333                 81.0                                6.0                                        95.000000                                                     478.0                250.694444                        1240.833333
5   59             81  41.166667      17.833333                 81.0                                0.0                                         0.000000                                                     478.0                318.027778                        1240.833333
```

Step6:-

Now, we can calculate the value of the slope (m).

```
# Both sum columns hold one repeated value, so their ratio is the same
# in every row; mean() collapses it to a single number
m = df1["sum_of_(Age-mean(Age)*Glucose_Level-mean(Glucose_Level))"] / df1["sum_of_(square_of(Age-mean(Age)))"]
m = m.values.mean()
m
```

Output:-

```
0.3852249832102082
```

Step7:-

In this step, we calculate the value of the intercept (c).

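```
# c = mean(y) - m * mean(x); both mean columns hold one repeated value,
# so mean() reduces the result to a single number
c = df["mean(Glucose_Level)"] - m * df["mean(Age)"]
c = c.mean()
c
```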

Output:-

```
65.141572
```

Step8:-

Finally, in this step, we calculate the unknown value (y), the predicted glucose level at age 22.

```
# Predicted glucose level at age 22, using y = m*x + c
y = m * 22 + c
y
```

Output:-

`73.6165` (rounded to four decimal places)
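As a cross-check on the hand calculation, the same line can be fitted with NumPy's `np.polyfit`, which performs a least-squares polynomial fit. This is a sketch for verification only and is not part of the manual method above; the arrays are copied from the dataset shown in Step 1.

```python
import numpy as np

# Values copied from the dataset shown in Step 1
age = np.array([43, 21, 25, 42, 57, 59])
glucose = np.array([99, 65, 79, 75, 87, 81])

# A degree-1 fit returns the coefficients highest power first:
# [slope, intercept]
m, c = np.polyfit(age, glucose, 1)

# Prediction at age 22
y = m * 22 + c
print(f"m={m:.4f}, c={c:.4f}, y={y:.4f}")  # m=0.3852, c=65.1416, y=73.6165
```

The slope and intercept agree with the values from Step 6 and Step 7, which suggests the manual formula was applied correctly.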

We have now calculated the unknown value (y). Along the way, we worked through the mathematical formula of linear regression and applied it without a built-in regression module.