# R Squared in Machine Learning in Python, Explained

In simple words, R squared is a statistical measure of how well a model fits the data. In this article we look at it in more detail.

## What is R squared?

R² is simply the square of R, the correlation coefficient, which ranges between -1 and +1. R measures the relationship between the dependent variable and the independent variable: values close to +1 or -1 indicate a strong relationship, while values near 0 indicate that the variables are unlikely to depend on each other. R², by contrast, ranges only from 0 to 1 and does not show the direction (increase or decrease) of the relationship. We use R² because it is easy to interpret and easy to calculate: the higher the R² value, the more of the variation in the dependent variable the model explains.
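As a quick illustration of the relationship between R and R², the following sketch uses NumPy's `corrcoef` on some hypothetical data (not the house price dataset used later) with a decreasing trend:

```python
import numpy as np

# hypothetical sample data with a clear decreasing trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.5, 6.0, 4.5, 2.0])

# R: correlation coefficient, between -1 and +1 (negative here, since y decreases)
R = np.corrcoef(x, y)[0, 1]

# R squared: between 0 and 1; the direction information is lost when squaring
R2 = R ** 2

print(R)   # negative, close to -1
print(R2)  # close to 1
```

Note how a strongly negative R still produces an R² close to 1, which is why R² alone cannot tell you whether the variables move together or in opposite directions.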

## R square in machine learning in Python

Here, we implement R squared in machine learning using a house price dataset.

First, we import the libraries. We use the Python libraries NumPy, pandas, Matplotlib, scikit-learn, and SciPy.

```python
# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from scipy import stats
```

Now we prepare our data. Here, the house price dataset is used.

```python
# preparing data
data = pd.read_csv('data.csv')
data = data.sort_values(["price"], ascending=True)
data = data[['price', 'sqft_lot']][:10]
data.head(12)
```

Output:-

Plotting our data:

```python
# plotting the data
X = np.array(data['price']).reshape(-1, 1)
Y = np.array(data['sqft_lot'])
plt.scatter(X, Y)
```

Output:-

Now we create a LinearRegression() model, fit the data to it, and predict a Y value for each X. We then plot the data together with the regression line.

```python
# performing linear regression
LR = LinearRegression()
LR.fit(X, Y)
y_prediction = LR.predict(X)
```

```python
# plotting the linear regression
plt.scatter(X, Y)
plt.plot(X, y_prediction, color='green')
```

Output:-

Next, we create a function for calculating variation. The variation about the mean is calculated by subtracting the mean from every value, squaring each difference, and summing the results:

**var(mean) = sum((Y - mean)²)**
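For instance, with the hypothetical values Y = [1, 2, 3] (mean 2), the formula gives (1-2)² + (2-2)² + (3-2)² = 2:

```python
import numpy as np

Y = np.array([1.0, 2.0, 3.0])
mean = Y.mean()  # 2.0
# variation about the mean: sum of squared differences from the mean
var_mean = sum((Y - mean) ** 2)
print(var_mean)  # 2.0
```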

```python
# function for variation
def var(Y1, Y2):
    var = sum((Y1 - Y2) * (Y1 - Y2))
    return var
```

Now we create a function for calculating R². The formula is as follows:

**R² = (var(mean) - var(line)) / var(mean) = 1 - (var(line) / var(mean))**

```python
# function for calculating R squared
def R_squared(y, y_pred):
    y_mean = [y.mean() for i in y]
    R_square = 1 - (var(y, y_pred) / var(y, y_mean))
    return R_square
```

```python
R2 = R_squared(Y, y_prediction)
print("R square: ", R2)
```
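As a sanity check, our manual R² should match scikit-learn's `r2_score`, which the article imports at the top but never calls. The sketch below is self-contained: it uses small hypothetical stand-in data (the real house price dataset is not bundled here) and repeats the helper functions from above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def var(Y1, Y2):
    # sum of squared differences, as in the article's var() helper
    return sum((Y1 - Y2) * (Y1 - Y2))

def R_squared(y, y_pred):
    y_mean = [y.mean() for i in y]
    return 1 - (var(y, y_pred) / var(y, y_mean))

# hypothetical stand-in data, roughly linear
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
Y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

LR = LinearRegression()
LR.fit(X, Y)
y_prediction = LR.predict(X)

manual = R_squared(Y, y_prediction)
builtin = r2_score(Y, y_prediction)
print(manual, builtin)  # the two values should agree
```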

Output:-

To download the dataset: House Dataset

## Conclusion

In conclusion, we use R² because it is easy to interpret and compute. However, R² depends on the type of dataset used and can sometimes give a biased result, so we must consider what kind of data we are working with before relying on it.
