Detection of COVID-19 From Chest X-Ray Images Using Machine Learning

In this tutorial, we will learn how to detect COVID-19 from chest X-ray images using machine learning in Python.

Undoubtedly, everyone reading this article is already familiar with the coronavirus crisis around the world.

Build a Model that Automatically Detects Whether a Patient Has Coronavirus

Well! Can you look at two X-ray images and tell which one shows coronavirus? I bet you can't, but a machine can.
In this tutorial, we are going to build a model that predicts whether an X-ray image shows COVID-19 or not.
Here is the approach:

  • Create a dataset containing two folders. The first folder holds sample X-ray images of normal patients (which you can get from this Kaggle Link). I have taken around 100 sample X-ray images of normal patients.
  • Then create a second folder for the X-ray images of coronavirus patients. (For this you have to do some data analysis first.)
  • After creating the two folders, we will merge the images and set the labels.
  • Then we will split the data into training and testing sets and build a VGG16-based model to classify it.

So let's dive into the code!

Get the X-ray Images of COVID-19 Patients

First, you need to collect X-ray images of patients who tested positive for coronavirus.
This Kaggle Link contains X-ray images of pneumonia, COVID-19, and normal patients. We need to pick out the X-ray images of the COVID-19 patients.

Step-1: Read the Dataset metadata.csv

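import numpy as np
import pandas as pd
metadata = pd.read_csv("metadata.csv")  # metadata describing every X-ray image
metadata.head()  # show the first 5 rows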


The first 5 rows of the dataset.

Step-2: Drop the columns with NaN values
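The code for this step is not shown. Here is a minimal sketch, assuming the metadata has been loaded into a pandas DataFrame. The toy frame below is invented for illustration. `dropna(axis=1)` removes every column that contains at least one NaN, and leaves the rows alone.

```python
import numpy as np
import pandas as pd

# Toy stand-in for metadata.csv (hypothetical values, for illustration only)
metadata = pd.DataFrame({
    "finding": ["COVID-19", "Pneumonia", "COVID-19"],
    "view": ["PA", "AP", "PA"],
    "filename": ["a.png", "b.png", "c.png"],
    "notes": ["x", np.nan, "y"],  # this column has a missing value
})

# axis=1 drops columns (not rows) that contain any NaN
cleaned = metadata.dropna(axis=1)
print(list(cleaned.columns))  # ['finding', 'view', 'filename']
```

On the real metadata.csv, it is worth checking that the columns used later (`finding`, `view`, `filename`) survive this step.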


Step-3: Analyze the Finding Column
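The analysis code is not shown either. Below is one way to inspect the `finding` column, sketched with pandas on a small hypothetical frame. `value_counts()` counts each diagnosis, and `pd.crosstab` shows how many images exist for each diagnosis/view pair. That second view is the one that matters for the next step.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'finding' and 'view' columns
metadata = pd.DataFrame({
    "finding": ["COVID-19", "COVID-19", "Pneumonia", "COVID-19", "No Finding"],
    "view": ["PA", "AP", "PA", "PA", "PA"],
})

# How many X-rays carry each diagnosis
print(metadata["finding"].value_counts())

# Diagnosis vs. view: how many COVID-19 images are in the PA view?
counts = pd.crosstab(metadata["finding"], metadata["view"])
print(counts)
```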



Step-4: Extract The X-Ray Images that tested Positive for COVID-19

In this step we will extract the X-rays of COVID-19 patients. To do that, we iterate over the dataset and copy every image whose finding is COVID-19 and whose view is PA (posteroanterior).

import pandas as pd
import shutil
import os

# Selecting all combinations of 'COVID-19' patients with 'PA' X-ray view
coronavirus = "COVID-19"  # Virus to look for
x_ray = "PA"  # View of X-ray

metadata = "metadata.csv"  # Path to metadata.csv
imageDir = "images"  # Directory of images
outputDir = 'Data//Covid'  # Output directory to store selected images
os.makedirs(outputDir, exist_ok=True)  # create the output folder if it does not exist

metadata_csv = pd.read_csv(metadata)

# loop over the rows of the metadata data frame
for (i, row) in metadata_csv.iterrows():
    # skip rows that are not COVID-19 or not a PA view
    if row["finding"] != coronavirus or row["view"] != x_ray:
        continue

    # build the source image path and copy the file into the output folder
    filename = row['filename'].split(os.path.sep)[-1]
    filePath = os.path.sep.join([imageDir, filename])
    shutil.copy2(filePath, outputDir)
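The same selection can also be written without an explicit loop, using a pandas boolean mask. This sketch only computes the list of file names on hypothetical rows. Copying the files would still use `shutil.copy2` as above.

```python
import pandas as pd

# Hypothetical rows; in practice this would be pd.read_csv("metadata.csv")
metadata_csv = pd.DataFrame({
    "finding": ["COVID-19", "COVID-19", "Pneumonia", "COVID-19"],
    "view": ["PA", "AP", "PA", "PA"],
    "filename": ["img1.png", "img2.png", "img3.png", "img4.jpg"],
})

# Keep only rows where BOTH conditions hold
mask = (metadata_csv["finding"] == "COVID-19") & (metadata_csv["view"] == "PA")
selected = metadata_csv.loc[mask, "filename"].tolist()
print(selected)  # ['img1.png', 'img4.jpg']
```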



Once you have collected the COVID-19 X-ray images, put them in the second folder mentioned earlier. I have taken around 100 X-ray images of COVID-19 for this model.
Put this folder inside the dataset folder you created, so that the dataset folder (named Data in my case) contains the Normal and Covid folders. You can rename the folders if you like. Finally, zip the Data folder so that you can upload it and use it in Google Colab.
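If you prefer to do the zipping from Python rather than by hand, here is a sketch using the standard library's `shutil.make_archive`. It runs in a temporary directory with placeholder files (hypothetical names). It also unpacks the archive again, which mirrors what the `!unzip` command does later in Colab.

```python
import os
import shutil
import tempfile

# Work in a throwaway directory so the example does not touch real data
work = tempfile.mkdtemp()
for sub in ("Covid", "Normal"):
    os.makedirs(os.path.join(work, "Data", sub))
    # Placeholder file standing in for an X-ray image (hypothetical name)
    with open(os.path.join(work, "Data", sub, "example.png"), "wb") as f:
        f.write(b"\x89PNG")

# Create Data.zip containing the top-level 'Data' folder
archive = shutil.make_archive(os.path.join(work, "Data"), "zip",
                              root_dir=work, base_dir="Data")

# Unpack it elsewhere, as !unzip would do in Colab
dest = os.path.join(work, "unzipped")
shutil.unpack_archive(archive, dest)
print(sorted(os.listdir(os.path.join(dest, "Data"))))  # ['Covid', 'Normal']
```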

Build The Model

Step-1: Mount your drive

from google.colab import drive
drive.mount('/content/gdrive')  # mount Google Drive under /content/gdrive

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).

Step-2: Unzip your file

!unzip -q "/content/gdrive/My Drive/"

Step-3: Import all the necessary Libraries

import matplotlib.pyplot as plt
import argparse
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

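Step-4: Initialize the Learning Rate, Epochs, and Batch Size

INIT_LR = 1e-3  # initial learning rate
EPOCHS = 25  # number of epochs (assumed value; the original epoch count is not shown)
BS = 8  # batch size
dataset = "/content/Data"  # the unzipped dataset folder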

Step-5: Load the Images and Set Their Labels

import os
import numpy as np
import cv2
from imutils import paths
iPaths = list(paths.list_images(dataset))  # all image paths under the dataset folder
data = []
labels = []
for iPath in iPaths:
    label = iPath.split(os.path.sep)[-2]  # label = parent folder name (Covid or Normal)
    image = cv2.imread(iPath)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
    image = cv2.resize(image, (224, 224))  # resize to the VGG16 input size
    data.append(image)
    labels.append(label)
data = np.array(data) / 255.0  # scale pixel intensities to [0, 1]
labels = np.array(labels)

First, we load the data. We fetch every image inside the Data folder and take its label from the name of the folder it sits in (Covid or Normal). Each image is converted from BGR to RGB and resized to 224×224. Finally, we scale pixel intensities to the range [0, 1] and convert both the data and the labels to NumPy arrays.
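The two ideas in that paragraph, taking the label from the parent folder and dividing by 255, can be checked in isolation. The sketch below uses hypothetical file paths and a three-pixel array instead of real images.

```python
import os
import numpy as np

# Hypothetical image paths following the Data/<class>/<file> layout
iPaths = [
    os.path.join("Data", "Covid", "covid_001.png"),
    os.path.join("Data", "Normal", "normal_001.jpeg"),
]

# The class name is the parent folder, i.e. the second-to-last path component
labels = [p.split(os.path.sep)[-2] for p in iPaths]
print(labels)  # ['Covid', 'Normal']

# 8-bit pixels (0..255) divided by 255.0 land in [0, 1]
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = pixels / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```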

Meanwhile, Let's Have a Look at the X-rays

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
Data_Dir = "Data//"
Cimages = os.listdir(Data_Dir+"Covid")
Nimages = os.listdir(Data_Dir+"Normal")
def plotter(i):
    # OpenCV loads images as BGR, so convert to RGB before plotting
    normal = cv2.cvtColor(cv2.imread(Data_Dir+"Normal//"+Nimages[i]), cv2.COLOR_BGR2RGB)
    normal = resize(normal, (150, 150, 3), mode='reflect')
    coronavirus = cv2.cvtColor(cv2.imread(Data_Dir+"Covid//"+Cimages[i]), cv2.COLOR_BGR2RGB)
    coronavirus = resize(coronavirus, (150, 150, 3), mode='reflect')
    pair = np.concatenate((normal, coronavirus), axis=1)  # place the two images side by side
    print("Normal Chest X-ray Vs Covid-19 Chest X-ray")
    plt.figure(figsize=(10, 5))
    plt.imshow(pair)
    plt.show()
for i in range(0, 5):
    plotter(i)

X-rays normal vs covid-19

The output above shows only the first pair; the loop displays five pairs in total.

Step-6: Perform One-Hot Encoding on the Labels

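LB = LabelBinarizer()  # initialize the label binarizer
labels = LB.fit_transform(labels)  # 'Covid' -> 0, 'Normal' -> 1 (shape: n x 1)
labels = to_categorical(labels)  # one-hot encode (shape: n x 2)
print(labels)
(X_train, X_test, Y_train, Y_test) = train_test_split(data, labels, test_size=0.20, stratify=labels, random_state=42)
# Augmentation for the training images (assumed parameters; the original arguments are not shown)
trainAug = ImageDataGenerator(rotation_range=15, fill_mode="nearest")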

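Here we perform one-hot encoding on the labels. LabelBinarizer sorts the class names alphabetically, so the COVID-19 label ("Covid") becomes 0 and Normal becomes 1. to_categorical then turns these into the one-hot vectors [1, 0] and [0, 1]. We also split the data into training and testing sets: the training set contains 80% of the data and the test set contains 20%. Passing stratify=labels keeps the same class ratio in both sets.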
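To see why Covid maps to 0 and Normal to 1, here is a NumPy-only sketch of what LabelBinarizer followed by to_categorical produce for two classes. `np.unique` stands in for LabelBinarizer's alphabetical class ordering. This is an illustration, not the scikit-learn implementation.

```python
import numpy as np

labels = np.array(["Normal", "Covid", "Covid", "Normal"])

# np.unique returns the classes sorted alphabetically, like LB.classes_
classes, idx = np.unique(labels, return_inverse=True)
print(classes)  # ['Covid' 'Normal'] -> Covid = 0, Normal = 1

# One-hot rows: Covid -> [1, 0], Normal -> [0, 1]
one_hot = np.eye(len(classes))[idx]
print(one_hot)
```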

Step-7: Create The VGG Model

bModel = VGG16(weights="imagenet", include_top=False,input_tensor=Input(shape=(224, 224, 3)))  #base_Model
hModel = bModel.output #head_Model
hModel = AveragePooling2D(pool_size=(4, 4))(hModel)
hModel = Flatten(name="flatten")(hModel)
hModel = Dense(64, activation="relu")(hModel)
hModel = Dropout(0.5)(hModel)
hModel = Dense(2, activation="softmax")(hModel)
model = Model(inputs=bModel.input, outputs=hModel)
for layer in bModel.layers:
    layer.trainable = False

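Create a VGG16 model with ImageNet weights and leave out its top (fully connected) layers with include_top=False. Then construct a new fully connected head (average pooling, flatten, a 64-unit dense layer, dropout, and a 2-way softmax) and place it on top of the VGG16 base. Finally, freeze the base layers so that only the new head is trained.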
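Because the base is frozen, only the new head is trained. The sketch below works out the tensor shapes and the head's parameter count by hand, assuming the 224×224×3 input used here. It is plain arithmetic, not output from `model.summary()`.

```python
# VGG16 halves the spatial size in each of its 5 max-pooling stages
side = 224
for _ in range(5):
    side //= 2
print(side)  # 7 -> the base outputs 7 x 7 x 512

# AveragePooling2D(pool_size=(4, 4)) uses strides = pool_size and 'valid' padding
pooled = (side - 4) // 4 + 1
print(pooled)  # 1 -> 1 x 1 x 512

flat = pooled * pooled * 512  # Flatten -> 512 features
dense1 = flat * 64 + 64       # Dense(64): weights + biases
dense2 = 64 * 2 + 2           # Dense(2): weights + biases (Dropout has no parameters)
print(flat, dense1, dense2, dense1 + dense2)  # 512 32832 130 32962
```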

Next, let's check the shapes of the training and test sets.

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((160, 224, 224, 3), (40, 224, 224, 3), (160, 2), (40, 2))

So we are left with 160 images for training and 40 images for testing.
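Those numbers follow from the split settings. A small arithmetic sketch, assuming about 100 images per class (200 in total, which matches 160 + 40). scikit-learn rounds the test size up, and stratification keeps the classes balanced. The per-class rounding here is a simplification of scikit-learn's allocation, but for two equal classes it gives the same result.

```python
import math

# Assumed class counts: about 100 X-rays per class, 200 in total
counts = {"Covid": 100, "Normal": 100}
n = sum(counts.values())

# train_test_split computes n_test = ceil(test_size * n)
n_test = math.ceil(0.20 * n)
n_train = n - n_test
print(n_train, n_test)  # 160 40

# With stratify=labels, each class contributes about 20% of its images to the test set
per_class_test = {c: round(0.20 * k) for c, k in counts.items()}
print(per_class_test)  # {'Covid': 20, 'Normal': 20}
```

This is consistent with the confusion matrix later on, whose rows each sum to 20.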

Let's visualize a random sample of the training data to see what it contains.

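W_grid = 4  # width
L_grid = 4  # length
fig, axes = plt.subplots(L_grid, W_grid, figsize=(25, 25))  # subplots
axes = axes.ravel()  # flatten the 4 x 4 grid into a 1-D array of axes
n_training = len(X_train)
for i in np.arange(0, L_grid * W_grid):
    index = np.random.randint(0, n_training)  # pick a random training image
    axes[i].imshow(X_train[index])
    axes[i].set_title(str(Y_train[index]))  # one-hot label: [1. 0.] = Covid, [0. 1.] = Normal
    axes[i].axis('off')
plt.subplots_adjust(hspace=0.4)  # hspace sets the vertical space between the images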

analyze the training data

The output above shows only the first row; the full grid contains L_grid × W_grid = 16 images.

Step-8: Compile and Train the Model

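from tensorflow.keras.optimizers.schedules import InverseTimeDecay
# Same effect as the old decay=INIT_LR / EPOCHS argument: lr = INIT_LR / (1 + decay * step)
lr_schedule = InverseTimeDecay(INIT_LR, decay_steps=1, decay_rate=INIT_LR / EPOCHS)
opt = Adam(learning_rate=lr_schedule)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
print("Training starts")
R = model.fit(  # fit_generator is deprecated; fit accepts the generator directly
    trainAug.flow(X_train, Y_train, batch_size=BS),  # augmented training batches
    steps_per_epoch=len(X_train) // BS,
    validation_data=(X_test, Y_test),
    epochs=EPOCHS)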

Train and Compile the model
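The `decay=INIT_LR / EPOCHS` setting in the original code comes from the legacy Keras optimizers. Those optimizers lower the learning rate after every batch as INIT_LR / (1 + decay · iteration). Here is a plain-Python sketch of that formula, assuming EPOCHS = 25 (an assumed value, since the tutorial does not state it) and the 160 training images reported above.

```python
INIT_LR = 1e-3
EPOCHS = 25  # assumption: the epoch count is not given in the tutorial
BS = 8
steps_per_epoch = 160 // BS  # 160 training images -> 20 updates per epoch

decay = INIT_LR / EPOCHS

def lr_at(iteration):
    # Time-based decay, as applied by the legacy Keras `decay` argument
    return INIT_LR / (1.0 + decay * iteration)

for epoch in (0, 10, 24):
    it = epoch * steps_per_epoch
    print(epoch, f"{lr_at(it):.6e}")
```

With these numbers the learning rate falls by only about 2% over the whole run, so the decay has a small effect here.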

Step-9: Predict the Test Set and Compare the Predictions with the True Labels

L = 6
W = 5
fig, axes = plt.subplots(L, W, figsize=(12, 12))
axes = axes.ravel()
y_pred = model.predict(X_test, batch_size=BS)
for i in np.arange(0, L*W):
    axes[i].imshow(X_test[i])  # show the test image
    axes[i].set_title('Prediction = {}\n True = {}'.format(y_pred.argmax(axis=1)[i], Y_test.argmax(axis=1)[i]))
    axes[i].axis('off')
plt.subplots_adjust(wspace=1, hspace=1)

Predict the test set and compare it with the test data

Last Step: Get the classification report and accuracy.

from sklearn.metrics import classification_report
y_pred = model.predict(X_test, batch_size=BS)
y_pred = np.argmax(y_pred, axis=1)
print(classification_report(Y_test.argmax(axis=1), y_pred,target_names=LB.classes_))

Get the classification report and accuracy

Next, construct the confusion matrix and compute the accuracy, sensitivity, and specificity.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test.argmax(axis=1), y_pred)  # rows = true class, columns = predicted class
total = cm.sum()
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])  # share of COVID-19 X-rays detected as COVID-19
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])  # share of normal X-rays classified as normal
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))


[[19  1]
 [ 0 20]]
acc: 0.9750
sensitivity: 0.9500
specificity: 1.0000

So we get an accuracy of 97.5% on the 40 test images: 39 correct predictions and 1 incorrect one (a COVID-19 X-ray classified as normal).
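The per-class numbers in the classification report can be recomputed from the confusion matrix above. This NumPy sketch uses the reported matrix, with class order Covid = 0 and Normal = 1. Precision divides by column totals (predicted counts), and recall divides by row totals (true counts).

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
# Class order follows LB.classes_: index 0 = Covid, index 1 = Normal
cm = np.array([[19, 1],
               [0, 20]])

tp = np.diag(cm)                 # correct predictions per class
precision = tp / cm.sum(axis=0)  # divide by predicted counts (column totals)
recall = tp / cm.sum(axis=1)     # divide by true counts (row totals)
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(["Covid", "Normal"], precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The Covid recall is the sensitivity printed earlier, and the Normal recall is the specificity.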

Plot the loss and accuracy

# plot the loss
plt.plot(R.history['loss'], label='train loss')
plt.plot(R.history['val_loss'], label='val loss')
plt.legend()
plt.show()

# plot the accuracy
plt.plot(R.history['accuracy'], label='train acc')
plt.plot(R.history['val_accuracy'], label='val acc')
plt.legend()
plt.show()

Plot the loss and accuracy

Let’s Check Our Model

First, save the model.

import tensorflow as tf
model.save('Covid_model.h5')  # save the architecture, weights, and optimizer state (HDF5 format)

After that, load the model. By default, load_model also restores the compiled state, so no separate compile step is needed.

import tensorflow as tf 
model = tf.keras.models.load_model('Covid_model.h5')

Test the model with new data

from tensorflow.keras.preprocessing import image
import numpy as np
import matplotlib.pyplot as plt
img = image.load_img('Data/Covid/1-s2.0-S1684118220300682-main.pdf-002-a1.png', target_size=(224, 224))  # insert a random COVID-19 x-ray image
imgplot = plt.imshow(img)
x = image.img_to_array(img)  # RGB array of shape (224, 224, 3)
x = np.expand_dims(x, axis=0)  # add the batch dimension
img_data = x / 255.0  # same [0, 1] scaling as the training data
classes = model.predict(img_data)
New_pred = np.argmax(classes, axis=1)
if New_pred[0] == 1:  # class 1 = Normal, class 0 = Covid
    print('Prediction: Normal')
else:
    print('Prediction: Corona')


Let's check another one, this time a normal X-ray.

img = image.load_img('Data/Normal/IM-0162-0001.jpeg', target_size=(224, 224))  # insert a random normal x-ray image
imgplot = plt.imshow(img)
x = image.img_to_array(img)  # RGB array of shape (224, 224, 3)
x = np.expand_dims(x, axis=0)  # add the batch dimension
img_data = x / 255.0  # same [0, 1] scaling as the training data
classes = model.predict(img_data)
New_pred = np.argmax(classes, axis=1)
if New_pred[0] == 1:  # class 1 = Normal, class 0 = Covid
    print('Prediction: Normal')
else:
    print('Prediction: Corona')
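Because the two checks above repeat the same steps, the preprocessing and decoding can live in one place. That also ensures new images get exactly the scaling used during training. Below is a NumPy-only sketch; `prepare`, `decode`, and `CLASS_NAMES` are hypothetical names, not part of the tutorial. It is exercised with a blank image and a made-up softmax output.

```python
import numpy as np

CLASS_NAMES = ["Covid", "Normal"]  # order of LB.classes_ (alphabetical)

def prepare(image_rgb):
    """Scale an RGB uint8 array like the training data (/255) and add a batch axis."""
    x = np.asarray(image_rgb, dtype="float32") / 255.0
    return np.expand_dims(x, axis=0)

def decode(probs):
    """Map a (1, 2) softmax output to a class name."""
    return CLASS_NAMES[int(np.argmax(probs, axis=1)[0])]

# Hypothetical inputs: a blank 224x224 RGB image and a made-up softmax output
batch = prepare(np.zeros((224, 224, 3), dtype=np.uint8))
print(batch.shape)                    # (1, 224, 224, 3)
print(decode(np.array([[0.9, 0.1]])))  # Covid
```

With the trained model, a prediction would then read `decode(model.predict(prepare(img_array)))`.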


To get the full code Click here.


Thanks for reading this article! Note that this model is for educational purposes only.