Loan Eligibility prediction using Machine Learning Models in Python
In this tutorial, you will learn how to predict the loan status using machine learning models in Python.
Steps involved:
- Loading packages
- Understanding the data
- Data preprocessing
- Training the model
- Prediction
Loading packages:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
Download the dataset that we have used here: dataset csv file
Reading the data:
df = pd.read_csv('PATH OF THE DATASET')
df.head()
Finding missing values:
df.isnull().sum()
Running this command gives the count of missing values in each column.
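As a small illustration of what isnull().sum() reports, here is a sketch on a made-up three-column frame. The column names mirror the loan dataset, but the values are hypothetical and chosen only to show the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names follow the loan dataset, values are invented.
toy = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [120.0, np.nan, np.nan, 66.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Default axis=0: one missing-value count per column.
missing_per_column = toy.isnull().sum()
# axis=1 gives one count per row instead, shown here for comparison.
missing_per_row = toy.isnull().sum(axis=1)

print(missing_per_column)
print(missing_per_row.tolist())
```

Columns with a non-zero count are the ones that need imputation before training.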
To improve the accuracy we need to replace the missing values. For LoanAmount we use the median; for every other attribute we use the mode (the most frequent value) of that attribute. We can achieve it with the following code:
# Assign the result back: calling fillna(..., inplace=True) on a selected column may not update df in recent pandas versions
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['Married'] = df['Married'].fillna(df['Married'].mode()[0])
df['Credit_History'] = df['Credit_History'].fillna(df['Credit_History'].mode()[0])
df['Self_Employed'] = df['Self_Employed'].fillna(df['Self_Employed'].mode()[0])
df['Dependents'] = df['Dependents'].fillna(df['Dependents'].mode()[0])
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode()[0])
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].median())
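To see what mode and median imputation do, here is a minimal sketch on a hypothetical two-column frame (the values are invented for illustration). It also shows why the median is a reasonable choice for a numeric column with an extreme value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the loan dataset.
toy = pd.DataFrame({
    "Married": ["Yes", "No", None, "Yes"],
    "LoanAmount": [100.0, np.nan, 120.0, 500.0],
})

# Categorical column: fill the gap with the most frequent value.
toy["Married"] = toy["Married"].fillna(toy["Married"].mode()[0])

# Numeric column: the median of the observed values (100, 120, 500) is 120,
# whereas the mean (240) would be pulled upward by the 500 entry.
toy["LoanAmount"] = toy["LoanAmount"].fillna(toy["LoanAmount"].median())

print(toy)
```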
Outlier treatment:
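By visualising the data we will get to know that there are outliers in loan amount.
A log transformation does not remove these outliers, but it compresses large values and so reduces their influence, which can improve accuracy. We can achieve it with the following code: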
df['LoanAmount_log'] = np.log(df['LoanAmount'])
df['LoanAmount_log'].hist(bins=20)
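The effect of the log transformation can be checked numerically as well as with a histogram. The sketch below uses synthetic, right-skewed amounts drawn from a log-normal distribution (an assumption made for illustration, not the real LoanAmount column) and compares the skewness before and after.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed amounts; the distribution and its parameters are assumptions.
rng = np.random.default_rng(0)
loan_amount = pd.Series(rng.lognormal(mean=5.0, sigma=0.6, size=500))

loan_amount_log = np.log(loan_amount)

raw_skew = loan_amount.skew()
log_skew = loan_amount_log.skew()

print("Skewness before log:", raw_skew)
print("Skewness after log: ", log_skew)
# The transformation keeps every row; it only rescales the values.
print("Rows before/after:", len(loan_amount), len(loan_amount_log))
```

A skewness closer to zero after the transformation means the long right tail has been pulled in, while the number of rows stays the same.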
Dropping irrelevant attributes:
Loan_ID in the dataset is irrelevant as it does not affect loan eligibility. We can drop it by the following code:
df = df.drop('Loan_ID',axis=1)
Let X hold the independent variables (the features) and y the dependent variable (the loan status).
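X = df.drop('Loan_Status', axis=1)
y = df['Loan_Status']
# Assumption: scikit-learn models need numeric input, so one-hot encode the text columns
X = pd.get_dummies(X)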
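The classifiers used later expect numeric features, while columns such as Gender and Married contain text. One common way to handle this is one-hot encoding with pd.get_dummies. The sketch below shows its effect on a hypothetical two-column frame; the values are invented.

```python
import pandas as pd

# Hypothetical toy features: one text column and one numeric column.
toy_X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "ApplicantIncome": [5000, 3000, 4200],
})

# Text columns are expanded into one indicator column per category;
# numeric columns are passed through unchanged.
encoded = pd.get_dummies(toy_X)

print(encoded.columns.tolist())
print(encoded)
```

Each category becomes its own indicator column, so the encoded frame can be passed directly to model.fit.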
Data splitting:
To train the model we split the data into train data and test data, holding out 30% of the rows as test data (test_size=0.3).
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
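Because the split is random, each run produces different train and test sets. If reproducible results are wanted, one option is to pass random_state, and stratify keeps the class ratio the same in both parts. The sketch below uses synthetic arrays (70 "Y" and 30 "N" labels, chosen arbitrarily) rather than the loan data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 features, 70% "Y" and 30% "N".
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array(["Y"] * 70 + ["N"] * 30)

x_train, x_test, y_train, y_test = train_test_split(
    X_demo,
    y_demo,
    test_size=0.3,      # hold out 30% of the rows for testing
    random_state=42,    # arbitrary seed, fixed so the split is repeatable
    stratify=y_demo,    # keep the Y/N ratio equal in train and test
)

print("Train shape:", x_train.shape)
print("Test shape: ", x_test.shape)
print("Share of 'Y' in test:", (y_test == "Y").mean())
```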
Model Creation & Prediction:
In this tutorial, we have used three classification techniques to predict loan eligibility.
1) Logistic regression model:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
model = LogisticRegression(C=1)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
# accuracy_score takes the true labels first, then the predictions
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.94
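Accuracy is the share of correct predictions, so it does not show which kind of mistake a model makes. A confusion matrix breaks the result down by class. The sketch below uses six hand-written labels (hypothetical, not model output) so the numbers can be checked by eye.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted loan statuses for six applicants.
y_true_demo = ["Y", "Y", "Y", "N", "N", "Y"]
y_pred_demo = ["Y", "Y", "N", "N", "Y", "Y"]

# Rows are true classes and columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=["N", "Y"])
acc = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print("Accuracy:", acc)
```

Here four of six predictions are correct, and the matrix shows one rejected applicant predicted as approved and one approved applicant predicted as rejected.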
2) Linear SVM:
from sklearn.linear_model import SGDClassifier
# With its default hinge loss, SGDClassifier fits a linear SVM by stochastic gradient descent
model = SGDClassifier(alpha=0.001, random_state=5, max_iter=15, tol=None)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.95
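Models trained with stochastic gradient descent tend to be sensitive to the scale of the features. One possible refinement, not part of the steps above, is to standardise the features in a pipeline before the classifier. The sketch below runs on synthetic data from make_classification, so its accuracy says nothing about the loan dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# The scaler is fitted on the training data only, then applied to the test data.
svm_pipeline = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=0.001, random_state=5),
)
svm_pipeline.fit(x_train, y_train)

demo_accuracy = accuracy_score(y_test, svm_pipeline.predict(x_test))
print("Accuracy on synthetic data:", demo_accuracy)
```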
3) Random forest classifier:
from sklearn.ensemble import RandomForestClassifier
# n_estimators is the number of decision trees in the forest
model = RandomForestClassifier(n_estimators=100)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
Accuracy: 0.9621621621621622
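So we can conclude that, in the run shown here, the three models scored between 0.94 and 0.96 on the test data, with the random forest classifier highest at about 0.96. Because the split is random, the exact figures will vary from run to run.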
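A single train/test split gives one accuracy figure per model, and that figure depends on which rows happen to land in the test set. One way to get a steadier comparison is k-fold cross-validation. The sketch below compares the same three model types on synthetic data from make_classification; the data, the seeds and the choice of five folds are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the loan features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "Logistic regression": LogisticRegression(C=1, max_iter=1000),
    "Linear SVM (SGD)": SGDClassifier(alpha=0.001, random_state=5),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # Five folds: each model is trained five times and scored on the held-out fold.
    fold_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```

The standard deviation across folds gives a rough sense of how much a single-split accuracy could move by chance.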