# Predicting insurance using Scikit-Learn in Python

Today we'll predict insurance charges with Scikit-Learn and Pandas in Python, using the Linear Regression algorithm. The charges come from a Medical Cost Dataset that has several features to work with. The task combines data analytics (loading and inspecting the data with Pandas) with machine learning (fitting a model with Scikit-Learn).

## Importing the .csv file using Pandas

First, download the dataset from this link. Then import the Pandas library and load the .csv file into a Pandas DataFrame. You can use any dataset of your choice. Preview the DataFrame with the `head()` method, which shows the first five rows.

```python
import pandas as pd

df = pd.read_csv("insurance.csv")
df.head()
```

Output:

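|   | age | sex    | bmi    | children | smoker | region    | charges     |
|---|-----|--------|--------|----------|--------|-----------|-------------|
| 0 | 19  | female | 27.900 | 0        | yes    | southwest | 16884.92400 |
| 1 | 18  | male   | 33.770 | 1        | no     | southeast | 1725.55230  |
| 2 | 28  | male   | 33.000 | 3        | no     | southeast | 4449.46200  |
| 3 | 33  | male   | 22.705 | 0        | no     | northwest | 21984.47061 |
| 4 | 32  | male   | 28.880 | 0        | no     | northwest | 3866.85520  |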
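If you want to experiment before downloading the file, one option is to rebuild the five preview rows above in memory. This sketch only reproduces the rows printed by `head()`, not the full dataset, so anything fitted on it is purely illustrative. It also runs two quick checks (column types and missing values) that are worth doing on the real file.

```python
import pandas as pd

# The five preview rows shown above, typed in by hand so the example
# runs without insurance.csv. This is NOT the full dataset.
df = pd.DataFrame(
    {
        "age": [19, 18, 28, 33, 32],
        "sex": ["female", "male", "male", "male", "male"],
        "bmi": [27.900, 33.770, 33.000, 22.705, 28.880],
        "children": [0, 1, 3, 0, 0],
        "smoker": ["yes", "no", "no", "no", "no"],
        "region": ["southwest", "southeast", "southeast", "northwest", "northwest"],
        "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
    }
)

# Check column types and count missing values before modelling.
print(df.dtypes)
print(df.isna().sum().sum())
```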

## Predict the charge for insurance using sklearn in Python

We store the features used for prediction, i.e., age and BMI, in the variable `X`, and the target to be predicted, i.e., the charges, in the variable `y`. This tutorial uses only two features, but you can include as many as you like. The `.values` attribute converts the selected DataFrame columns to a NumPy array.

```python
X = df[['age', 'bmi']].values
y = df['charges'].values
print(X)
print(y)
```

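scikit-learn expects the features as a 2-D array of shape `(n_samples, n_features)` and the target as a 1-D array of shape `(n_samples,)`. The pandas documentation recommends `.to_numpy()` over `.values`, and the two give the same array here. This sketch uses only the five preview rows, so the shapes below are `(5, 2)` and `(5,)` rather than those of the full file.

```python
import pandas as pd

# Five preview rows from insurance.csv, restricted to the columns used here.
df = pd.DataFrame(
    {
        "age": [19, 18, 28, 33, 32],
        "bmi": [27.900, 33.770, 33.000, 22.705, 28.880],
        "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
    }
)

# Double brackets select a DataFrame (2-D); single brackets select a Series (1-D).
X = df[["age", "bmi"]].to_numpy()  # shape (n_samples, n_features)
y = df["charges"].to_numpy()       # shape (n_samples,)

print(X.shape, y.shape)  # (5, 2) (5,)
```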

Next, import the `LinearRegression` class from sklearn's `linear_model` module to fit our regression model. First, we create the regression model `regsr`. Then we train it with the `fit()` method, passing in our features and target.

```python
from sklearn.linear_model import LinearRegression

regsr = LinearRegression()
regsr.fit(X, y)
```

Output:

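```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
```

In scikit-learn 0.23 and later, only parameters changed from their defaults are displayed, so the same call prints simply `LinearRegression()`.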
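With two features, a fitted `LinearRegression` predicts `charges ≈ intercept_ + coef_[0] × age + coef_[1] × bmi`, where `intercept_` and `coef_` are the attributes scikit-learn sets during `fit()`. The sketch below fits on the five preview rows only, so its numbers say nothing about the real data. It checks that this formula reproduces what `predict()` returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Five preview rows from insurance.csv: [age, bmi] and charges.
X = np.array([[19, 27.900], [18, 33.770], [28, 33.000], [33, 22.705], [32, 28.880]])
y = np.array([16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520])

regsr = LinearRegression()
regsr.fit(X, y)

print("intercept:", regsr.intercept_)
print("coefficients (age, bmi):", regsr.coef_)

# Recompute the prediction for age 20, BMI 30 by hand from the fitted parameters.
person = np.array([[20, 30]])
manual = regsr.intercept_ + person @ regsr.coef_
print(np.allclose(manual, regsr.predict(person)))  # True
```

Reading `coef_` this way is one reason linear regression is a useful baseline: each coefficient is the change in predicted charges per unit change in that feature, holding the other fixed.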

## Predicting our insurance

Now that the model is trained, we can predict new values. For example, to estimate the insurance charges for a person aged 20 with a Body-Mass Index of 30, i.e., `[20, 30]`, we convert the list to a NumPy array and reshape it into a single row of two features. This array is then passed to the `predict()` method.

```python
import numpy as np

prediction = regsr.predict(np.asarray([20, 30]).reshape(-1, 2))
print(prediction)
```

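`predict()` expects a 2-D input with one row per person. `reshape(-1, 2)` turns the length-2 array into shape `(1, 2)`, and `-1` lets NumPy infer the number of rows. The same call scores several people at once if you stack one row per person. In this sketch the model is again fitted on the five preview rows only, and the second person (age 45, BMI 25) is a made-up example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit on the five preview rows (illustration only).
X = np.array([[19, 27.900], [18, 33.770], [28, 33.000], [33, 22.705], [32, 28.880]])
y = np.array([16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520])
regsr = LinearRegression().fit(X, y)

single = np.asarray([20, 30]).reshape(-1, 2)
print(single.shape)  # (1, 2)

# One row per person; the second row is a hypothetical example.
people = np.array([[20, 30], [45, 25]])
predictions = regsr.predict(people)
print(predictions.shape)  # (2,)
```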

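You can also try other algorithms, such as k-nearest neighbors (KNN) regression, and see which one works best. Because the charges are a continuous value, a regression model rather than a classifier is the right fit here.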
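One way to compare KNN with linear regression is cross-validation. The sketch below uses `KNeighborsRegressor`, 5-fold cross-validation, and R² as the score. These are choices made for this example, not taken from the tutorial, and `compare_models` is a hypothetical helper name. Features are standardised before KNN because it is distance-based.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, cv=5):
    """Return the mean cross-validated R^2 score of each candidate model."""
    models = {
        "linear regression": LinearRegression(),
        # KNN compares distances, so put features on comparable scales first.
        "knn regression": make_pipeline(
            StandardScaler(), KNeighborsRegressor(n_neighbors=5)
        ),
    }
    return {
        name: np.mean(cross_val_score(model, X, y, cv=cv, scoring="r2"))
        for name, model in models.items()
    }
```

With the full dataset loaded as in the tutorial, `compare_models(df[['age', 'bmi']].values, df['charges'].values)` returns one mean R² per model. Higher is better, and 1.0 would be a perfect fit.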

Dear Sir,

I'm unable to find the proper library for the program "SBI Life Insurance" in Jupyter Notebook, and I'm getting an error.

I don't know which library is used to load the SBI Life Insurance dataset. The algorithm used is Logistic Regression, but when I run:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets load_sbi_life_insurance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report, confusion_matrix
import warnings

warnings.filterwarnings('ignore')
```

I get this error:

```
File "", line 4
    from sklearn.datasets load_sbi_life_insurance
    ^
SyntaxError: invalid syntax
```