Predicting insurance using Scikit-Learn in Python
Today we’ll predict insurance charges using Scikit-Learn and Pandas in Python. We will use the Linear Regression algorithm, trained on a Medical Cost Dataset that has several features to work with. The task combines data analysis with machine learning.
Importing the .csv file using Pandas
First, download the dataset from this link. Then import the Pandas library and read the .csv file into a Pandas dataframe. You can use any dataset of your choice. Preview the dataframe using the head() method.
import pandas as pd

df = pd.read_csv("insurance.csv")
df.head()
Output:
   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
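Before modeling, it can help to check the column types and look for missing values. The following is a minimal sketch that inlines only the five preview rows shown above, so that it runs without the file; with the real data you would keep `pd.read_csv("insurance.csv")`. The names `SAMPLE_CSV` and `missing_per_column` are introduced here for illustration.

```python
import io

import pandas as pd

# The five preview rows shown above, inlined so the snippet runs without insurance.csv.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Column types: numeric columns can be used as features directly,
# text columns (sex, smoker, region) would need encoding first.
print(df.dtypes)

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of rows and columns.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```

This is also why the tutorial can use age and bmi without further preparation: both are already numeric.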
Predict the charge for insurance using sklearn in Python
We will store the features used for prediction, i.e. age and BMI, in the X variable, and the target value to be predicted, i.e. the charges, in the y variable. We are using only two features in this tutorial; you can use as many as you want. The .values attribute converts the resulting dataframe to a NumPy array.
X = df[['age', 'bmi']].values
y = df['charges'].values
print(X)
print(y)
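To see what the feature matrix and target vector look like, here is a small sketch that rebuilds age, bmi and charges from the five preview rows shown earlier (an illustrative subset, not the full dataset) and checks the array shapes. `X_alt` is a name introduced here; `to_numpy()` is the method the pandas documentation recommends as the equivalent of `.values`.

```python
import pandas as pd

# Age, BMI and charges from the five preview rows (illustrative subset only).
df = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.900, 33.770, 33.000, 22.705, 28.880],
    "charges": [16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520],
})

X = df[["age", "bmi"]].values  # 2-D array: one row per person, one column per feature
y = df["charges"].values       # 1-D array: one target value per person

print(X.shape)  # (5, 2)
print(y.shape)  # (5,)

# Same data obtained through the to_numpy() method instead of the .values attribute.
X_alt = df[["age", "bmi"]].to_numpy()
```

Scikit-Learn estimators expect exactly this layout: a 2-D feature array and a 1-D target array with the same number of rows.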
The next step is to import the LinearRegression class from the linear_model module of the sklearn library to fit our regression model. First, we create the regression model ‘regsr’. Then we train the model using the fit() method, passing it our features and target.
from sklearn.linear_model import LinearRegression

regsr = LinearRegression()
regsr.fit(X, y)
Output:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
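Once fitted, the model exposes what it learned through the `coef_` and `intercept_` attributes. The sketch below fits on the five preview rows only (far too few for a meaningful model, but enough to run without the file) and shows that a prediction is simply the intercept plus each coefficient times its feature. The names `age_coef`, `bmi_coef` and `manual` are introduced here for illustration, and the text printed after fit() may differ from the output above depending on your Scikit-Learn version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: age, bmi (the five preview rows; illustrative subset only).
X = np.array([[19, 27.900], [18, 33.770], [28, 33.000], [33, 22.705], [32, 28.880]])
y = np.array([16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520])

regsr = LinearRegression()
regsr.fit(X, y)

# One coefficient per feature, in the same order as the columns of X.
age_coef, bmi_coef = regsr.coef_
print("change in charges per year of age:", age_coef)
print("change in charges per BMI unit:", bmi_coef)
print("intercept:", regsr.intercept_)

# A linear model's prediction is intercept + sum(coefficient * feature).
manual = regsr.intercept_ + age_coef * 20 + bmi_coef * 30
from_model = regsr.predict(np.array([[20, 30]]))[0]
print(manual, from_model)
```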
Predicting our insurance
Now that we have trained our model, we can start predicting values. For example, suppose we want to estimate the insurance cost for a 20-year-old person with a Body-Mass Index of 30, i.e. [20, 30]. We convert the list to a NumPy array and reshape it into a single row with two columns, because predict() expects a 2-D array. This array is then passed to the predict() method.
import numpy as np

prediction = regsr.predict(np.asarray([20, 30]).reshape(-1, 2))
print(prediction)
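The reshape step matters because predict() takes one row per person. As a rough sketch, again fitted on just the five preview rows so it runs standalone, the following shows the shape produced by the reshape, an equivalent 2-D literal, and a prediction for several people in one call. The people in `batch` are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: age, bmi (the five preview rows; illustrative subset only).
X = np.array([[19, 27.900], [18, 33.770], [28, 33.000], [33, 22.705], [32, 28.880]])
y = np.array([16884.92400, 1725.55230, 4449.46200, 21984.47061, 3866.85520])
regsr = LinearRegression().fit(X, y)

# The tutorial's approach: a flat list reshaped into one row with two columns.
single = np.asarray([20, 30]).reshape(-1, 2)
print(single.shape)  # (1, 2)

# Writing the row as a nested list gives the same 2-D input.
same_single = np.array([[20, 30]])

# Several (hypothetical) people can be predicted in one call, one row each.
batch = np.array([[20, 30], [40, 25], [60, 35]])

print(regsr.predict(single))
print(regsr.predict(batch))
```

The result is always a 1-D array with one predicted charge per input row.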
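You can also try other regression algorithms, such as k-nearest neighbors regression (KNeighborsRegressor), and see which one works best. Because the charges are a continuous value, a regressor rather than a classifier is the appropriate choice.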
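One possible way to compare algorithms is to hold out part of the data and measure the error of each model on it. The sketch below does this for linear regression and k-nearest neighbors regression. Note that it runs on synthetic stand-in data with made-up coefficients, not on the Medical Cost Dataset, so the printed errors say nothing about the real data; replace `X` and `y` with the arrays built earlier to run the actual comparison. The choice of mean absolute error, the 80/20 split, and `n_neighbors=5` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (NOT the Medical Cost Dataset); the coefficients are made up.
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
bmi = rng.uniform(18.0, 40.0, size=200)
charges = 250.0 * age + 300.0 * bmi + rng.normal(0.0, 2000.0, size=200)

X = np.column_stack([age, bmi])
y = charges

# Hold out 20% of the rows so each model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model on the training rows and record its mean absolute error on the test rows.
errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    errors[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(name, errors[name])
```

A lower mean absolute error on the held-out rows suggests a better fit for that particular split.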
Dear Sir,
I’m unable to find the proper library for the program “SBI Life Insurance” in Jupyter Notebook, and I’m getting an error.
I don’t know which library is used to load the SBI Life Insurance dataset. The algorithm used is Logistic Regression, but when I run:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets load_sbi_life_insurance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report,confusion_matrix
import warnings
warnings.filterwarnings('ignore')
error
File “”, line 4
from sklearn.datasets load_sbi_life_insurance
^
SyntaxError: invalid syntax
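The SyntaxError points at line 4, where the `import` keyword is missing between `sklearn.datasets` and the function name. Adding the keyword makes the line valid Python, but it would most likely still fail: as far as I know, Scikit-Learn does not ship a loader named `load_sbi_life_insurance`, so a dataset like this would normally be read from a local file with `pd.read_csv(...)`, as in the tutorial above. The small check below illustrates both points; the helper `parses` is introduced here for illustration.

```python
import ast

broken = "from sklearn.datasets load_sbi_life_insurance"
fixed = "from sklearn.datasets import load_sbi_life_insurance"


def parses(source):
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True

# Valid syntax is not enough: the name must also exist in the module.
try:
    from sklearn.datasets import load_sbi_life_insurance
    loader_available = True
except ImportError:
    loader_available = False
print(loader_available)
```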