Loan Eligibility prediction using Machine Learning Models in Python

In this tutorial, you will learn how to predict the loan status using machine learning models in Python.

Steps involved:

  1. Loading packages
  2. Understanding the data
  3. Data preprocessing
  4. Training the model
  5. Prediction

Loading packages:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

Download the dataset that we have used here: dataset csv file

Reading the data:

df= pd.read_csv('PATH OF THE DATASET')
df.head()

Finding missing values:

df.isnull().sum()

Running this command gives the count of missing values in each column.
To improve accuracy without discarding rows, we fill the missing values: categorical
attributes get their mode (the most frequent value), and LoanAmount, which is numeric, gets its median. We can achieve it with the following code:

# Assign the result back instead of using inplace=True on a selected column,
# which is deprecated in recent pandas versions and may not modify df.
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])
df['Credit_History'] = df['Credit_History'].fillna(df['Credit_History'].mode()[0])
df['Self_Employed'] = df['Self_Employed'].fillna(df['Self_Employed'].mode()[0])
df['Dependents'] = df['Dependents'].fillna(df['Dependents'].mode()[0])
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode()[0])
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].median())
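
To make the difference between mode and median imputation concrete, here is a minimal sketch on a small toy frame. The values are made up for illustration and are not taken from the loan dataset; only the column names mirror it.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not from the loan dataset)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", np.nan],
    "LoanAmount": [100.0, 120.0, np.nan, 700.0],
})

# Count of missing values per column
missing_before = toy.isnull().sum()

# Categorical column: fill with the most frequent value (mode)
toy["Gender"] = toy["Gender"].fillna(toy["Gender"].mode()[0])

# Numeric column: fill with the median, which is less affected
# by the extreme value 700.0 than the mean would be
toy["LoanAmount"] = toy["LoanAmount"].fillna(toy["LoanAmount"].median())

missing_after = toy.isnull().sum()
print(missing_before.to_dict())
print(toy)
```

The median is used for the numeric column because a single very large loan amount would pull the mean upwards, while the median stays near the typical value.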

Outlier treatment:
Visualising the data shows that there are outliers in loan amount.
Instead of deleting those rows, we take the logarithm of LoanAmount, which compresses large values and reduces their influence on the model. We can achieve it with the following code:

df['LoanAmount_log']=np.log(df['LoanAmount'])
df['LoanAmount_log'].hist(bins=20)
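
As a rough illustration of what the log transform does, the sketch below compares the skewness of a few made-up loan amounts before and after taking the logarithm. The numbers are hypothetical and only serve to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed loan amounts with one extreme value
amounts = pd.Series([80.0, 100.0, 120.0, 130.0, 150.0, 700.0])
log_amounts = np.log(amounts)

# Skewness closer to 0 means a more symmetric distribution
skew_raw = amounts.skew()
skew_log = log_amounts.skew()
print(f"skew before: {skew_raw:.2f}, after log: {skew_log:.2f}")

# The number of rows is unchanged: no observation is dropped
print(len(amounts), len(log_amounts))
```

Note that np.log requires strictly positive values; a column containing zeros would need a different transform such as np.log1p.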

Dropping irrelevant attributes:
Loan_ID in the dataset is irrelevant as it does not affect loan eligibility. We can drop it by the following code:

df = df.drop('Loan_ID',axis=1)

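Let X hold the independent variables (features) and y the dependent variable (Loan_Status).

X = df.drop('Loan_Status', axis=1)
# scikit-learn models need numeric input, so one-hot encode any text columns (e.g. Gender, Married)
X = pd.get_dummies(X)
y = df['Loan_Status']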
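
Scikit-learn estimators expect numeric features, so text-valued columns such as Gender or Married have to be encoded before training. The following is a minimal sketch of one-hot encoding with pd.get_dummies on a toy frame; the values are hypothetical, and one-hot encoding is only one of several possible encodings.

```python
import pandas as pd

# Toy frame with hypothetical values; column names mirror the tutorial's dataset
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes"],
    "LoanAmount": [100.0, 120.0, 90.0],
})

# One-hot encode the text columns; numeric columns pass through unchanged
encoded = pd.get_dummies(toy, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Each category becomes its own 0/1 column (for example Gender_Male), which lets linear models and tree ensembles use the information without assuming an order between categories.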

Data splitting:
To train and evaluate the model, we split the data into training and test sets, holding out 30% of the rows for testing (test_size=0.3).

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size=0.3)
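
Without a fixed seed, every run produces a different split and therefore slightly different accuracies. The sketch below, on synthetic stand-in data, shows two optional arguments of train_test_split that can help: random_state for repeatability and stratify to keep the class ratio equal in both parts. The seed value 42 and the demo arrays are arbitrary choices for illustration, not part of the original tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 70 positive and 30 negative labels
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 70 + [0] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.3,      # same 30% hold-out as in the tutorial
    random_state=42,    # arbitrary seed, makes the split repeatable
    stratify=y_demo,    # keep the class ratio the same in both parts
)
print(X_tr.shape, X_te.shape)
print("share of positives in test set:", y_te.mean())
```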

Model Creation & Prediction:
In this tutorial, we have used three classification techniques to predict loan eligibility.
1) Logistic regression model:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
model = LogisticRegression(C=1)
model.fit(x_train, y_train)
y_pred= model.predict(x_test) 
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.94

2) Linear SVM:

from sklearn.linear_model import SGDClassifier
model = SGDClassifier(alpha=0.001, random_state=5, max_iter=15, tol=None)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.95

3) Random forest classifier:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.9621621621621622
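
A single train/test split yields one accuracy number per model, which can shift from run to run. One common way to get a steadier comparison is k-fold cross-validation. The sketch below is an illustration on synthetic data generated with make_classification, not a rerun of the tutorial's experiment; the sample size, seeds, and max_iter value are assumptions chosen for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared feature matrix and labels
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "logistic regression": LogisticRegression(C=1, max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv_means = {}
for name, clf in models.items():
    # 5-fold cross-validation: five accuracy scores, one per held-out fold
    scores = cross_val_score(clf, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the standard deviation across folds shows whether a difference between two models is larger than the variation caused by the split itself.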

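So we can conclude that all three models predict loan status with about 94% to 96% accuracy on the test data, with the random forest classifier scoring highest in this run.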

Also read: Loan Prediction Project using Machine Learning in Python
