Predicting insurance using Scikit-Learn in Python

Today we’ll predict insurance charges using Scikit-Learn and Pandas in Python. We will use the Linear Regression algorithm, trained on a Medical Cost dataset that has several features to work with. The task combines data analysis with machine learning.

Importing the .csv file using Pandas

First, download the dataset from this link. Then import the Pandas library and load the .csv file into a Pandas dataframe. You can use any dataset of your choice. Preview the dataframe using the head() method.

import pandas as pd

# Load the dataset into a dataframe and preview the first five rows.
df = pd.read_csv("insurance.csv")
df.head()

Output:

      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520

Predict the charge for insurance using sklearn in Python

We will store the features used for prediction, i.e. age and BMI, in the X variable, and the target value to be predicted, i.e. the charges, in the y variable. We are only using two features in this tutorial; you can use as many as you want. The .values attribute converts the resulting dataframe to a NumPy array.

X=df[['age','bmi']].values
y=df['charges'].values
print(X)
print(y)

Output:

[[19.   27.9 ]
 [18.   33.77]
 [28.   33.  ]
 ...
 [18.   36.85]
 [21.   25.8 ]
 [61.   29.07]]
[16884.924   1725.5523  4449.462  ...  1629.8335  2007.945  29141.3603]
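Note that .values is an attribute, so it is written without parentheses. The following is a minimal sketch, assuming a small hand-built dataframe that copies the five preview rows shown earlier (the names sample, X_sample, X_alt and y_sample are ours, not part of the tutorial). It suggests that .values and the newer to_numpy() method give the same array, and shows the shapes that the model will later expect.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the five preview rows (only the columns used here).
sample = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

X_sample = sample[["age", "bmi"]].values     # attribute access, no parentheses
X_alt = sample[["age", "bmi"]].to_numpy()    # equivalent method form
y_sample = sample["charges"].to_numpy()

print(X_sample.shape)   # 2-D: one row per person, one column per feature
print(y_sample.shape)   # 1-D: one target value per person
print(np.array_equal(X_sample, X_alt))
```

Selecting the columns with double brackets keeps X two-dimensional, which is the layout that fit() and predict() work with.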

The next step is to import the LinearRegression class from the sklearn library to fit our regression model. First, we create the regression model ‘regsr’. Then we train the model using the fit() method, passing it our features and target.

from sklearn.linear_model import LinearRegression
regsr=LinearRegression()
regsr.fit(X,y)

Output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
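After fitting, a LinearRegression model exposes what it learned through the coef_ and intercept_ attributes. The sketch below does not use the insurance data; it assumes made-up values generated from a known formula (the names X_toy, y_toy and toy_model are hypothetical), so that the recovered weights can be checked by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: charges = 100 + 10*age + 5*bmi, with no noise.
X_toy = np.array([[20, 25.0], [30, 22.0], [40, 31.0], [50, 28.0], [60, 35.0]])
y_toy = 100 + 10 * X_toy[:, 0] + 5 * X_toy[:, 1]

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

print(toy_model.coef_)       # one weight per feature, in column order (age, bmi)
print(toy_model.intercept_)  # constant term

# A prediction is the intercept plus the weighted sum of the features.
person = np.array([[20, 30.0]])
manual = toy_model.intercept_ + person @ toy_model.coef_
print(manual, toy_model.predict(person))
```

The same two attributes are available on regsr once it has been trained, which is a quick way to see how much each feature contributes to the predicted charge.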

Predicting our insurance

Now that we have trained our model, we can start predicting values. For example, suppose we want to determine the insurance cost for a person aged 20 with a Body-Mass Index of 30, i.e. [20,30]. We convert the list to a NumPy array and reshape it into a single row with two columns, because predict() expects a 2-D array with one row per person. This array is then passed to the predict() method.

import numpy as np
prediction=regsr.predict(np.asarray([20,30]).reshape(-1,2))
print(prediction)

Output:

[8402.76367021]

Thus, the predicted insurance charge for this person is $8402.76.
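The reshape step is easiest to follow by looking at array shapes alone. This is a small sketch with no model involved; the extra people in batch are made-up values used only for illustration.

```python
import numpy as np

one_person = np.asarray([20, 30])
print(one_person.shape)               # (2,)   -> 1-D, rejected by predict()

row = one_person.reshape(-1, 2)
print(row.shape)                      # (1, 2) -> one sample, two features

# Writing the nested list directly gives the same array.
same_row = np.array([[20, 30]])
print(np.array_equal(row, same_row))  # True

# Several people can be passed at once, one row each (made-up values).
batch = np.array([[20, 30], [45, 27.5], [61, 29.07]])
print(batch.shape)                    # (3, 2)
```

Passing an array shaped like batch to predict() returns one predicted charge per row.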
Because charges is a continuous value, you can also try other regression algorithms, such as KNN regression (KNeighborsRegressor in sklearn), and see which one works best.
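One way to judge which algorithm works best is to hold back part of the data and compare the errors on it. The sketch below is only an illustration under stated assumptions: it uses synthetic data in place of insurance.csv, the regression variant of KNN (KNeighborsRegressor) because the target is continuous, and mean absolute error as the score. The names X_demo, y_demo, candidates and scores are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the (age, bmi) -> charges data.
# Replace X_demo and y_demo with X and y from insurance.csv.
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
bmi = rng.uniform(18, 40, size=200)
X_demo = np.column_stack([age, bmi])
y_demo = 250 * age + 300 * bmi + rng.normal(0, 2000, size=200)

# Keep 25% of the rows aside; the models never see them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

candidates = {
    "linear regression": LinearRegression(),
    "knn regression (k=5)": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: mean absolute error = {scores[name]:.2f}")
```

A lower mean absolute error on the held-out rows suggests a better fit, though the ranking on the real dataset may differ from the one seen on this synthetic data.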

One response to “Predicting insurance using Scikit-Learn in Python”

Shubham Mishra says:

April 11, 2020 at 1:16 pm

Dear Sir,

I’m unable to find the proper library for the program “SBI Life Insurance” in Jupyter Notebook; I’m getting an error.
I don’t know which library is used to load the SBI Life Insurance dataset. The algorithm used is Logistic Regression, but when I run..

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets load_sbi_life_insurance
from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report,confusion_matrix
import warnings
warnings.filterwarnings(‘ignore’)

error
File “”, line 4
from sklearn.datasets load_sbi_life_insurance
^
SyntaxError: invalid syntax
