Predicting insurance using Scikit-Learn in Python

Today we’ll predict insurance charges using Scikit-Learn and Pandas in Python. We will use the Linear Regression algorithm, trained on a Medical Cost Dataset that has several features to work with. The task combines data analysis and machine learning.

Importing the .csv file using Pandas

First, download the dataset from this link. Then import the Pandas library and load the .csv file into a Pandas dataframe. You can use any dataset of your choice. Preview the dataframe using the head() method.

import pandas as pd
# Load the CSV file into a dataframe and preview the first five rows
df = pd.read_csv("insurance.csv")
df.head()

Output:

      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520

Predict the charge for insurance using sklearn in Python

We will store the features used for prediction, i.e. age and BMI, in the X variable, and the target value to be predicted, i.e. the charges, in the y variable. We use only two features in this tutorial, but you can take as many as you want; note that text columns such as sex, smoker and region must be converted to numbers first. The .values attribute converts the resulting dataframe to a numpy array.

# Features: the age and bmi columns; target: the charges column
X = df[['age', 'bmi']].values
y = df['charges'].values
print(X)
print(y)

Output:

[[19.   27.9 ]
 [18.   33.77]
 [28.   33.  ]
 ...
 [18.   36.85]
 [21.   25.8 ]
 [61.   29.07]]
[16884.924   1725.5523  4449.462  ...  1629.8335  2007.945  29141.3603]
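
If you want to add a text column such as smoker to the features, it has to be turned into numbers before it can go into X. Here is a minimal sketch, assuming a simple yes/no to 1/0 mapping is good enough. The small df_demo frame just re-types the five preview rows shown earlier, and the smoker_flag column name is made up for illustration.

```python
import pandas as pd

# Five rows copied from the head() preview above, typed in by hand
df_demo = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "smoker": ["yes", "no", "no", "no", "no"],
})

# Hypothetical helper column: 1 for smokers, 0 for non-smokers
df_demo["smoker_flag"] = (df_demo["smoker"] == "yes").astype(int)

# Three numeric features per row instead of two
X_three = df_demo[["age", "bmi", "smoker_flag"]].to_numpy()
print(X_three.shape)
print(X_three)
```

The to_numpy() method does the same job as .values here. A model trained on three features also needs three values per person at prediction time.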

The next step is to import the LinearRegression class from the linear_model module of the sklearn library to fit our regression model. First, we create the regression model ‘regsr’. Then we train it using the fit() method, passing in our features and target.

from sklearn.linear_model import LinearRegression
# Create the model, then fit it to the features and target
regsr = LinearRegression()
regsr.fit(X, y)

Output:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
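
To see what a fitted model has learned, you can inspect its coef_ and intercept_ attributes. A linear regression prediction is the intercept plus each feature multiplied by its coefficient. The sketch below uses made-up numbers (not the insurance data) generated from a known rule, so the recovered values are easy to check. The names X_demo, y_demo and demo_model are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up (age, bmi) rows, NOT taken from insurance.csv
X_demo = np.array([
    [20, 25.0],
    [30, 31.5],
    [40, 22.0],
    [50, 35.0],
    [60, 28.0],
])

# Made-up target built from a known rule: 100*age + 50*bmi + 1000
y_demo = 100 * X_demo[:, 0] + 50 * X_demo[:, 1] + 1000

demo_model = LinearRegression().fit(X_demo, y_demo)

# One coefficient per feature, in the same order as the columns of X_demo
print(demo_model.coef_)
print(demo_model.intercept_)

# Rebuild the predictions by hand: intercept + X . coefficients
manual = demo_model.intercept_ + X_demo @ demo_model.coef_
print(np.allclose(manual, demo_model.predict(X_demo)))
```

Because the target here has no noise, the fit should recover values very close to 100, 50 and 1000. On the real insurance data the coefficients tell you how much the predicted charge changes per year of age and per BMI point.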

Predicting our insurance

Now that we have trained our model, we can start predicting values. For example, suppose we want to estimate the insurance cost for a person aged 20 with a Body-Mass Index of 30, i.e. [20,30]. We convert the list to a numpy array and reshape it into one row with two columns, because the predict() method expects a 2D array with one row per person. This array is then passed to the predict() method.

import numpy as np
# Shape (1, 2): one person with two features (age, bmi)
prediction = regsr.predict(np.asarray([20, 30]).reshape(-1, 2))
print(prediction)

Output:

[8402.76367021]

Thus, the predicted insurance charge for this person is $8402.76.
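
The reshape step is easier to follow once you look at the array shapes. This short sketch needs only numpy and shows that reshape(-1, 2) turns the flat list into one row with two columns, and that several people can be passed at once as a 2D array. The second person's values are made up for illustration.

```python
import numpy as np

# One person: the flat list [20, 30] becomes a single row with two columns
single = np.asarray([20, 30]).reshape(-1, 2)
print(single.shape)  # (1, 2)

# Two people at once: one row per person (second row is a made-up example)
batch = np.asarray([[20, 30], [45, 27.5]])
print(batch.shape)  # (2, 2)
```

Passing an array like batch to predict() would return one prediction per row.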
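You can also try other algorithms, such as K-Nearest Neighbors regression (KNeighborsRegressor in sklearn), and see which one works best. Since the charges are continuous values, use a regression algorithm rather than a classification one.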
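
One way to compare algorithms is to hold back part of the data and score each model on it. The sketch below is only an illustration under stated assumptions: it uses synthetic data generated from a made-up linear rule with noise (not the insurance dataset), a 75/25 split, and 5 neighbours for KNN. The score() method of both models returns the R² value, where higher is better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: made-up ages, BMIs and charges
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
bmi = rng.uniform(18, 40, size=200)
X_syn = np.column_stack([age, bmi])
y_syn = 250 * age + 300 * bmi + rng.normal(0, 1000, size=200)

# Keep 25% of the rows aside for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model on the training rows and score it on the held-out rows
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

for name, r2 in results.items():
    print(name, round(r2, 3))
```

On your own data, replace X_syn and y_syn with the X and y arrays built from the dataframe. Which model scores higher depends on the data, so treat the numbers from this synthetic example as a demonstration of the procedure only.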