Predicting insurance using Scikit-Learn in Python

Today we’ll predict insurance charges using Scikit-Learn and Pandas in Python. We will use the Linear Regression algorithm to model the charges, which come from a Medical Cost Dataset that has several features to work with. The task combines data analysis with machine learning.

Importing the .csv file using Pandas

First, download the dataset from this link. Then import the Pandas library and read the .csv file into a Pandas dataframe. You can use any dataset of your choice. Preview the dataframe using the head() method.

import pandas as pd


      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520
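The loading step can be sketched roughly as follows. To keep the example self-contained, it reads the five preview rows from an in-memory string instead of the downloaded file; the variable name `data` and the filename mentioned in the comment are assumptions, not taken from the original code.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: the five preview rows shown above.
# With the real dataset you would pass its path instead, for example
# pd.read_csv("insurance.csv") ("insurance.csv" is an assumed filename).
csv_text = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33.0,3,no,southeast,4449.462
33,male,22.705,0,no,northwest,21984.47061
32,male,28.88,0,no,northwest,3866.8552
"""

# read_csv accepts a path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default
print(data.head())
```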

Predict the charge for insurance using sklearn in Python

We will store the features used for prediction, i.e. age and BMI, in the X variable, and the target value to be predicted, i.e. the charges, in the y variable. We are only taking two features for this tutorial; you can take as many as you want. The .values attribute converts the resulting dataframe to a NumPy array.



[[19.   27.9 ]
 [18.   33.77]
 [28.   33.  ]
 ...
 [18.   36.85]
 [21.   25.8 ]
 [61.   29.07]]
[16884.924   1725.5523  4449.462  ...  1629.8335  2007.945  29141.3603]
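A minimal sketch of this selection step, assuming the dataframe is called `data` (a hypothetical name) and using only the five preview rows as stand-in data, so the printed arrays are shorter than the ones shown above:

```python
import pandas as pd

# Stand-in data: the five preview rows, restricted to the columns we need.
data = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552],
})

# Double brackets select a dataframe, so X is 2-D: (n_samples, n_features)
X = data[["age", "bmi"]].values

# Single brackets select a Series, so y is 1-D: (n_samples,)
y = data["charges"].values

print(X)
print(y)
```

Note that `.values` is an attribute, not a method, so it is written without parentheses.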

The next step is to import the LinearRegression class from the sklearn library to fit our regression model. First, we create the regression model ‘regsr’. Then we train the model using the fit() method, passing in our features and target.

from sklearn.linear_model import LinearRegression


LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
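The training step might look like the sketch below. The model name `regsr` comes from the text above; the five-row arrays are stand-in data taken from the preview, so the fitted coefficients will differ from those of a model trained on the full dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in data: age and BMI for the five preview rows, with their charges.
X = np.array([
    [19, 27.9],
    [18, 33.77],
    [28, 33.0],
    [33, 22.705],
    [32, 28.88],
])
y = np.array([16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552])

# Create the model, then fit it to the features and target
regsr = LinearRegression()
regsr.fit(X, y)

# One coefficient per feature (age, BMI) plus an intercept
print(regsr.coef_, regsr.intercept_)
```

On recent scikit-learn releases the fitted model usually prints as just LinearRegression(), because only non-default parameters are shown and the normalize parameter has been removed.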

Predicting our insurance

Now that we have trained our model, we can start predicting values. For example, suppose we want to estimate the insurance cost for a person of age 20 with a Body-Mass Index of 30, i.e. [20, 30]. We convert the list to a NumPy array and reshape it into a single row with two columns, because the predict() method expects a 2-D array with one row per sample. This array is then passed to the predict() method.

import numpy as np


Thus, the predicted insurance charge for this person is $8402.76.
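A sketch of the prediction step is given here under the same assumptions as before: the model is trained on the five preview rows only, so the value it prints will not match the $8402.76 obtained from the full dataset. The names `person` and `prediction` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in training data: age and BMI for the five preview rows.
X = np.array([
    [19, 27.9],
    [18, 33.77],
    [28, 33.0],
    [33, 22.705],
    [32, 28.88],
])
y = np.array([16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552])
regsr = LinearRegression().fit(X, y)

# predict() expects shape (n_samples, n_features);
# reshape(1, -1) turns [20, 30] into one row with two columns.
person = np.array([20, 30]).reshape(1, -1)

# The result is an array with one prediction per input row
prediction = regsr.predict(person)
print(prediction[0])
```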
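You can also try other algorithms, such as k-nearest neighbors (KNN) regression, and see which one works best. Because the charges are continuous values, a regression variant is the appropriate choice rather than a classifier.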
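One possible way to compare models is sketched below. Since the charges are continuous, it uses scikit-learn's KNeighborsRegressor rather than a classifier. The data is synthetic and generated only so the example runs on its own; the formula used to create it, the 75/25 split, and n_neighbors=5 are all assumptions for illustration, not properties of the Medical Cost Dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (NOT the real dataset): random ages and BMIs
# with a made-up, noisy linear relationship to the charges.
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
bmi = rng.uniform(18, 40, size=200)
X = np.column_stack([age, bmi])
y = 250 * age + 300 * bmi + rng.normal(0, 2000, size=200)

# Hold out a quarter of the rows so both models are scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the given data
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

Scoring on a held-out test set rather than on the training data gives a fairer picture of which model generalizes better.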

One response to “Predicting insurance using Scikit-Learn in Python”

  1. Shubham Mishra says:

    Dear Sir,

    I’m unable to find the proper library for the program “SBI Life Insurance” in Jupyter Notebook, and I’m getting an error.
    I don’t know which library is used to load the SBI Life Insurance dataset. The algorithm used is Logistic Regression, but when I run the following..

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets load_sbi_life_insurance
    from sklearn.linear_model import LogisticRegression

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import classification_report,confusion_matrix
    import warnings

    File “”, line 4
    from sklearn.datasets load_sbi_life_insurance
    SyntaxError: invalid syntax
