Multinomial Logistic Regression in Python
In this tutorial, we will learn how to implement multinomial logistic regression using Python. Let us begin with the concept behind it. In binary classification, logistic regression estimates the probability that an object belongs to one of two classes.
If the predicted probability is greater than 0.5, the object is assigned to the class labeled 1; otherwise it is assigned to the class labeled 0. To handle more than two classes, this tutorial explains the idea through one-vs-rest classification, which reuses the binary logistic regression technique. (Strictly speaking, multinomial logistic regression fits a single softmax model over all classes at once, but one-vs-rest gives the same intuition: score every class and pick the most probable one.)
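To make the binary decision rule concrete, here is a minimal sketch: a linear score w·x + b is passed through the sigmoid function to get a probability, which is then compared with the 0.5 threshold. The weights `w` and bias `b` are hypothetical values chosen only for illustration, not parameters of the tutorial's model.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(x, w, b, threshold=0.5):
    """Return (P(class 1 | x), predicted label) for a single sample x."""
    p = sigmoid(np.dot(w, x) + b)
    return p, int(p > threshold)

# Hypothetical parameters for a 2-feature problem (illustration only).
w = np.array([2.0, -1.0])
b = -0.5

p, label = predict_binary(np.array([1.0, 0.5]), w, b)  # score = 2 - 0.5 - 0.5 = 1.0
print(f"p = {p:.3f}, label = {label}")  # p = 0.731, label = 1
```

A sample whose score is negative, such as `[0.0, 1.0]` (score -1.5), gets a probability below 0.5 and is assigned label 0.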
For example, suppose there are “K” classes. First, we split the data into two groups: label “1” represents the 1st class and label “0” represents all the remaining classes. We then apply binary classification to this two-class problem and estimate the probability that the object belongs to the 1st class rather than to any of the others.
We repeat this for each of the “K” classes and return the class with the highest probability; that class is our prediction for the object. The diagrams below illustrate one-vs-rest classification:
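The procedure above can be sketched from scratch in NumPy. This is a minimal illustration, not the scikit-learn implementation: each binary model is fitted by plain batch gradient descent on the log-loss, and the learning rate, iteration count, helper names, and the three-blob toy dataset are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary_logreg(X, t, lr=0.1, n_iter=2000):
    """Binary logistic regression fitted by batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)   # predicted P(t = 1 | x)
        error = p - t            # gradient of the log-loss w.r.t. the linear score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

def fit_one_vs_rest(X, y, n_classes):
    """Train one binary model per class: label 1 for class k, 0 for every other class."""
    return [fit_binary_logreg(X, (y == k).astype(float)) for k in range(n_classes)]

def predict_one_vs_rest(models, X):
    """Score each sample with every binary model and return the most probable class."""
    probs = np.column_stack([sigmoid(X @ w + b) for w, b in models])
    return probs.argmax(axis=1)

# Toy data (an assumption, not the tutorial's dataset): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [-3.0, -2.0], [3.0, -2.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)

models = fit_one_vs_rest(X, y, n_classes=3)
accuracy = (predict_one_vs_rest(models, X) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because every class gets its own independent binary model, the K probabilities are not forced to sum to 1; only their ranking is used to pick the class.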
Here there are 3 classes represented by triangles, circles, and squares.
Here we apply one-vs-rest classification to class 1, separating class 1 from the rest of the classes.
Here we apply one-vs-rest classification to class 2, separating class 2 from the rest of the classes.
Here we apply one-vs-rest classification to class 3, separating class 3 from the rest of the classes.
The implementation of multinomial logistic regression in Python
1> Importing the libraries
Here we import the libraries NumPy, pandas, and Matplotlib.
```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```
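2> Importing the dataset
Here we load the dataset from the file “dataset.csv”.
```python
# Importing the dataset
dataset = pd.read_csv('dataset.csv')
X = dataset.iloc[:, :20].values  # first 20 columns: independent variables
y = dataset.iloc[:, 20].values   # 21st column: dependent variable (class label)
```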
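The `iloc` slices select columns by position: `:20` takes positions 0 to 19 (the 20 features) and `20` takes the single 21st column, which comes back as a 1-D array. Since the real contents of dataset.csv are not shown, the sketch below uses a randomly generated placeholder DataFrame of the same shape (an assumption), and also shows an equivalent form that does not hard-code the column count.

```python
import numpy as np
import pandas as pd

# Placeholder for dataset.csv (an assumption): 2000 rows x 21 integer columns.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.integers(0, 4, size=(2000, 21)))

X = dataset.iloc[:, :20].values   # positions 0..19 -> feature matrix
y = dataset.iloc[:, 20].values    # position 20 -> label vector

# Equivalent without hard-coding 20: all columns but the last, then the last column.
X_alt = dataset.iloc[:, :-1].values
y_alt = dataset.iloc[:, -1].values

print(X.shape, y.shape)  # (2000, 20) (2000,)
```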
Here we can see that the dataset has 2000 rows and 21 columns. We extract the independent variables (the first 20 columns) into the matrix “X” and the dependent variable (the last column) into the vector “y”. A preview of the dataset is shown below:
3> Splitting the dataset into the Training set and Test set
Here we divide the dataset into two parts, a training set and a test set: 20% of the rows go to the test set and 80% to the training set. Setting random_state=0 makes the split reproducible.
```python
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
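Conceptually, a random train/test split shuffles the row indices and holds out a fraction of them. The sketch below illustrates that idea in NumPy; it is not scikit-learn's internal algorithm and will not reproduce the exact rows chosen by `random_state=0`. The helper name `shuffle_split` and the placeholder arrays are hypothetical.

```python
import numpy as np

def shuffle_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, then hold out the first test_size fraction as the test set."""
    n_samples = len(X)
    n_test = int(np.ceil(test_size * n_samples))  # 0.2 * 2000 -> 400 test rows
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same shape as the tutorial's features (an assumption).
X = np.arange(2000 * 20, dtype=float).reshape(2000, 20)
y = np.arange(2000) % 4
X_train, X_test, y_train, y_test = shuffle_split(X, y)
print(X_train.shape, X_test.shape)  # (1600, 20) (400, 20)
```

For imbalanced classes, scikit-learn's `train_test_split` also accepts `stratify=y`, which keeps the class proportions roughly equal in both parts.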
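4> Feature Scaling
Here we apply feature scaling so that each independent variable has zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training.
```python
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std from the training set, then scale it
X_test = sc.transform(X_test)        # scale the test set using the training statistics
```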
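Standardization computes z = (x - mean) / std for every column, using statistics taken from the training set. A minimal NumPy sketch of the same idea follows; the helper names and the normally distributed placeholder data are assumptions for illustration.

```python
import numpy as np

def fit_standardizer(X_train):
    """Learn per-column mean and standard deviation from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0         # leave constant columns unscaled instead of dividing by zero
    return mean, std

def transform(X, mean, std):
    return (X - mean) / std

# Placeholder data (an assumption): features centred around 50 with spread 10.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(1600, 20))
X_test = rng.normal(loc=50.0, scale=10.0, size=(400, 20))

mean, std = fit_standardizer(X_train)
X_train_scaled = transform(X_train, mean, std)  # analogous to sc.fit_transform(X_train)
X_test_scaled = transform(X_test, mean, std)    # analogous to sc.transform(X_test)
```

The test set is scaled with the training mean and standard deviation, so its columns will be close to, but not exactly, zero mean and unit variance.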
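5> Fitting classifier to the Training set
Here we fit the logistic regression classifier to the training set. With multi_class='multinomial', scikit-learn fits a single softmax model over all classes; passing multi_class='ovr' would instead train one binary classifier per class, as in the one-vs-rest description above.
```python
# Fitting the classifier to the Training set
from sklearn.linear_model import LogisticRegression
# 'newton-cg' is one of the solvers that supports the multinomial loss.
# Note: multi_class is deprecated from scikit-learn 1.5, where multinomial
# is already the default for multi-class targets.
classifier = LogisticRegression(multi_class='multinomial', solver='newton-cg')
classifier.fit(X_train, y_train)
```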
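A multinomial model keeps one weight vector and bias per class, computes a score for each class, and turns the scores into probabilities with the softmax function, so the class probabilities always sum to 1. The sketch below shows only the prediction side, with hypothetical parameters for 2 features and 3 classes. In a fitted scikit-learn model, `coef_` has shape (n_classes, n_features) and `intercept_` has shape (n_classes,), so `W` here would correspond to `coef_.T`.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: turns K class scores into K probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def predict_multinomial(X, W, b):
    """X: (n_samples, n_features), W: (n_features, K), b: (K,)."""
    probs = softmax(X @ W + b)
    return probs, probs.argmax(axis=1)

# Hypothetical parameters (illustration only): 2 features, 3 classes.
W = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.zeros(3)
X = np.array([[2.0, 0.0],     # scores [ 2, -2,  0] -> class 0
              [0.0, 2.0],     # scores [ 0,  2, -2] -> class 1
              [-2.0, -2.0]])  # scores [-2,  0,  2] -> class 2

probs, labels = predict_multinomial(X, W, b)
print(labels)  # [0 1 2]
```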
6> Predicting the Test set results
Here we predict the results for the test set.
```python
# Predicting the Test set results
y_pred = classifier.predict(X_test)
```
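The `predict` method returns, for each sample, the class with the highest predicted probability; `predict_proba` exposes those probabilities directly. The end-to-end sketch below mirrors the tutorial's steps but replaces dataset.csv with synthetic data from `make_classification` (an assumption: 2000 rows, 20 features, 4 classes). It omits `multi_class`, relying on the fact that scikit-learn 0.22 and later fit a multinomial model for multi-class targets by default with the 'newton-cg' solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for dataset.csv (an assumption).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = LogisticRegression(solver='newton-cg')
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
proba = classifier.predict_proba(X_test)  # shape (n_test, n_classes); rows sum to 1

# predict() agrees with picking the most probable class from predict_proba().
y_from_proba = classifier.classes_[proba.argmax(axis=1)]
```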
7> Making the Confusion Matrix
Here we build the confusion matrix to see which predictions are correct and which are not.
```python
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)  # rows: true classes, columns: predicted classes
```
Here is the confusion matrix
The picture above shows the confusion matrix, from which we can determine the accuracy of our model.
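To see what `confusion_matrix` counts, here is a minimal hand-rolled version. It assumes the class labels are the integers 0 to K-1 (scikit-learn's function also handles arbitrary labels); the helper name `confusion_matrix_np` and the six example labels are made up for illustration. As in scikit-learn, rows are true classes and columns are predicted classes.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels for six samples.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)
print(cm)
# [[1 1 0]
#  [0 1 1]
#  [0 0 2]]
```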
Here we calculate the accuracy by adding up the correct predictions (the diagonal entries of the confusion matrix) and dividing by the total number of observations (the sum of all entries).
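The calculation can be written in two lines of NumPy. The 3×3 matrix below is a made-up example, not the tutorial's result; with real predictions, `sklearn.metrics.accuracy_score(y_test, y_pred)` computes the same quantity directly from the label vectors.

```python
import numpy as np

# Example confusion matrix (hypothetical): rows = true class, columns = predicted class.
cm = np.array([[1, 1, 0],
               [0, 1, 1],
               [0, 0, 2]])

correct = np.trace(cm)      # diagonal = correctly classified samples -> 4
total = cm.sum()            # all samples -> 6
accuracy = correct / total  # 4 / 6
print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.667
```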