Detection of COVID-19 From Chest X-Ray Images Using Machine Learning

In this tutorial, we will learn how to detect COVID-19 from chest X-ray images using machine learning in Python.

Undoubtedly Those who are reading this article are already familiar with the crisis of Coronavirus Whole over the World.

Build a Model that automatically Detect the Patient having Coronavirus or Not

Well! Can you distinguish between two x-ray images and tell which x-ray image is having coronavirus or not. I bet you can’t but a Machine Can.
In this tutorial, we are going to make a model that can predict whether the X-Ray image contains coronavirus or not.
Here is the Approach:

  • You have to create a Dataset contains two folders, in which one has sampled X-Ray images of Normal Patients (which you can get from this Kaggle Link). I have taken around 100 sampled X-ray images of Normal patients.
  • Then you have to create another folder in which you will put the X-Ray images of coronavirus patients. (For this you have to do some Data Analysis Stuffs.)
  • After creating two folders we will merge the images and set the labels
  • Then we will split that into training and testing set and create a VGG model that will predict our data.

So Let’s Deep Dive into the code!!

Get the X-ray Images of COVID-19 Patients

First, you need to collect the X-ray images of the patient’s results positive for coronavirus.
This Kaggle Link contains X-ray images of pneumonia, COVID-19, and Normal patients. We need to figure out the X-Rays Images of coronavirus.

Step-1: Read the Dataset metadata.csv

import numpy as np
import pandas as pd
covid_data=pd.read_csv('metadata.csv')
covid_data.head()

Output:

The first 5 rows of the dataset.

Step-2: Drop the columns with NAN Values

covid_data.dropna(axis=1,inplace=True)

Step-3: Analyze the Finding Column

covid_data.groupby('finding').count()

Output:
Detection of COVID-19 From Chest X-Ray Images Using Machine Learning

Step-4: Extract The X-Ray Images that tested Positive for COVID-19

In this Step we will extract the X-rays of COVID-19 patients. for that we will iter over the dataset and count the rows where the finding is equal to COVID-19, and view should be PA(Posterioranterior).

import pandas as pd
import shutil
import os

# Selecting all combination of 'COVID-19' patients with 'PA' X-Ray view
coronavirus = "COVID-19" # Virus to look for
x_ray = "PA" # View of X-Ray

metadata = "metadata.csv" # Metadata.csv Directory
imageDir = "images" # Directory of images
outputDir = 'Data//Covid' # Output directory to store selected images

metadata_csv = pd.read_csv(metadata)

# loop over the rows of the COVID-19 data frame
for (i, row) in metadata_csv.iterrows():
    if row["finding"] != coronavirus or row["view"] != x_ray:
        continue

    filename = row['filename'].split(os.path.sep)[-1]
    filePath = os.path.sep.join([imageDir, filename])
    shutil.copy2(filePath, outputDir)
print('Done')

Output:

Done

After you got all the X-ray images of COVID-19, you must put it in another folder that mentioned before. I have taken around 100 X-ray images of COVID-19 for this model.
Put the folder inside the dataset folder you have created. Therefore, inside the Dataset folder(But in my case it’s Data), Normal and COVID folders are there. However, you could rename the folders. Now ZIP the folder, as a result, to use Google colab.

Build The Model

Step-1: Mount your drive

from google.colab import drive
drive.mount('/content/gdrive')

Output:

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).

Step-2: Unzip your file

!unzip -q "/content/gdrive/My Drive/Data.zip"

Step-3: Import all the necessary Libraries

import matplotlib.pyplot as plt
import argparse
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

Step-4: Initialize the Epochs and the Batch Size

INIT_LR = 1e-3
EPOCHS = 10
BS = 8
dataset = "/content/Data" #The Dataset
args={}
args["dataset"]=dataset

Step-5: Set Labels into the images

import numpy as np
import cv2
iPaths = list(paths.list_images(args["dataset"]))  #image paths
data = []
labels = []
for iPath in iPaths:
    label = iPath.split(os.path.sep)[-2]   #split the image paths
    image = cv2.imread(iPath)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) #Convert images into RGB Channel
    image = cv2.resize(image, (224, 224))  #Resizing the images
    data.append(image)
    labels.append(label)
data = np.array(data) / 255.0
labels = np.array(labels)

Firstly we will load the data, on the other hand, we will fetch the images present inside the Data. Then set labels according to the image. as a result, we then scale pixel intensities to the range [0,1] and convert both Data and Labels to NumPy array format.

Meanwhile, Let’s Have a Look of the X-rays

import os
Data_Dir = "Data//"
Cimages = os.listdir(Data_Dir+"Covid")
Nimages = os.listdir(Data_Dir+"Normal")
import matplotlib.pyplot as plt
import cv2
import skimage
from skimage.transform import resize
import numpy as np
def plotter(i):
    normal = cv2.imread(Data_Dir+"Normal//"+Nimages[i])
    normal = skimage.transform.resize(normal, (150, 150, 3))
    coronavirus = cv2.imread(Data_Dir+"Covid//"+Cimages[i])
    coronavirus = skimage.transform.resize(coronavirus, (150, 150, 3) , mode = 'reflect')
    pair = np.concatenate((normal, coronavirus), axis=1)
    print("Normal Chest X-ray Vs Covid-19 Chest X-ray")
    plt.figure(figsize=(10,5))
    plt.imshow(pair)
    plt.show()
for i in range(0,5):
    plotter(i)

Output:
X-rays normal vs covid-19

Here the output shows the first row. However, the Output will show up to 5 rows.

Step-6: Perform One Hot Encoding into the Labels

LB = LabelBinarizer()  #Initialize label binarizer
labels = LB.fit_transform(labels)
labels = to_categorical(labels); print(labels)
(X_train, X_test, Y_train, Y_test) = train_test_split(data, labels,test_size=0.20, stratify=labels, random_state=42)
trainAug = ImageDataGenerator(
    rotation_range=15,
    fill_mode="nearest")

Here we perform one-hot encoding. for instance, in addition, the COVID-19 label is 0 Likewise, Normal is 1. On the other hand, we split our data into training and testing sets. where the training set contains 80% of the data in the same vein test set contains 20%.

Step-7: Create The VGG Model

bModel = VGG16(weights="imagenet", include_top=False,input_tensor=Input(shape=(224, 224, 3)))  #base_Model
hModel = bModel.output #head_Model
hModel = AveragePooling2D(pool_size=(4, 4))(hModel)
hModel = Flatten(name="flatten")(hModel)
hModel = Dense(64, activation="relu")(hModel)
hModel = Dropout(0.5)(hModel)
hModel = Dense(2, activation="softmax")(hModel)
model = Model(inputs=bModel.input, outputs=hModel)
for layer in bModel.layers:
    layer.trainable = False

Create a VGG Model. In addition Left the Top layer empty(include_top=False). subsequently, construct a fully connected layer and append it on the top of the VGG model.

Subsequently, let’s check the training and test set. In other words, analyze the shape of training and test data.

X_train.shape,X_test.shape,Y_train.shape,Y_test.shape

Output:

((160, 224, 224, 3), (40, 224, 224, 3), (160, 2), (40, 2))

In short, we left with 160 images for training and 40 images for testing.

Let’s analyze the training data. To clarify what’s in the training data.

W_grid = 4 #width
L_grid = 4 #lenth
fig, axes = plt.subplots(L_grid, W_grid, figsize = (25, 25)) #subplots
axes = axes.ravel()
n_training = len(X_train)
for i in np.arange(0, L_grid * W_grid):
    index = np.random.randint(0, n_training) # pick a random number
    axes[i].imshow(X_train[index])
    axes[i].set_title(Y_train[index])
    axes[i].axis('off')
plt.subplots_adjust(hspace = 0.4) #hspace indicates the space between the height of the images

Output:
analyze the training data

Here the output shows only the first row. But, you will get the output according to your range.

Step-8: Train and Compile the model

opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,metrics=["accuracy"])
print("Compiling Starts")
R = model.fit_generator(
    trainAug.flow(X_train, Y_train, batch_size=BS),
    steps_per_epoch=len(X_train) // BS,
    validation_data=(X_test, Y_test),
    validation_steps=len(X_test) // BS,
    epochs=EPOCHS)

Output:
Train and Compile the model

Step-9: Predict the test set and compare it with the test data.

L = 6
W = 5
fig, axes = plt.subplots(L, W, figsize = (12, 12))
axes = axes.ravel()
y_pred = model.predict(X_test, batch_size=BS)
for i in np.arange(0,L*W):
    axes[i].imshow(X_test[i])
    axes[i].set_title('Prediction = {}\n True = {}'.format(y_pred.argmax(axis=1)[i], Y_test.argmax(axis=1)[i]))
    axes[i].axis('off')
plt.subplots_adjust(wspace = 1, hspace=1)

Output:
Predict the test set and compare it with the test data

Last Step: Get the classification report and accuracy.

from sklearn.metrics import classification_report
y_pred = model.predict(X_test, batch_size=BS)
y_pred = np.argmax(y_pred, axis=1)
print(classification_report(Y_test.argmax(axis=1), y_pred,target_names=LB.classes_))

Output:
Get the classification report and accuracy

On the other hand, Construct the Confusion Matrix.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test.argmax(axis=1), y_pred)
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))

Output:

[[19  1]
 [ 0 20]]
acc: 0.9750
sensitivity: 0.9500
specificity: 1.0000

So we got a good accuracy of around 97%. in short with 39 correct predictions, and 1 incorrect prediction.

Plot the loss and accuracy

# plot the loss
plt.plot(R.history['loss'], label='train loss')
plt.plot(R.history['val_loss'], label='val loss')
plt.legend()
plt.show()
plt.savefig('Validation_loss')

# plot the accuracy
plt.plot(R.history['accuracy'], label='train acc')
plt.plot(R.history['val_accuracy'], label='val acc')
plt.legend()
plt.show()
plt.savefig('Validation_accuracy')

Output:
Plot the loss and accuracy

Let’s Check Our Model

Firstly Save the model.

import tensorflow as tf
from keras.models import load_model
model.save('Covid_model.h5')

After that, Load and compile the model.

import tensorflow as tf 
model = tf.keras.models.load_model('Covid_model.h5')
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

Test the model with a new data

from keras.preprocessing import image
from keras.models import load_model
from keras.applications.vgg16 import preprocess_input
img = image.load_img('Data/Covid/1-s2.0-S1684118220300682-main.pdf-002-a1.png', target_size=(224, 224)) #insert a random covid-19 x-ray image
imgplot = plt.imshow(img)
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
img_data = preprocess_input(x)
classes = model.predict(img_data)
New_pred = np.argmax(classes, axis=1)
if New_pred==[1]:
  print('Prediction: Normal')
else:
  print('Prediction: Corona')

Output:
Detection of COVID-19 From Chest X-Ray Images Using Machine Learning

Let’s check another.

img = image.load_img('Data/Normal/IM-0162-0001.jpeg', target_size=(224, 224)) #insert a random normal x-ray image
imgplot = plt.imshow(img)
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
img_data = preprocess_input(x)
classes = model.predict(img_data)
New_pred = np.argmax(classes, axis=1)
if New_pred==[1]:
  print('Prediction: Normal')
else:
  print('Prediction: Corona')

Output:
Detection of COVID-19 From Chest X-Ray Images Using Machine Learning

To get the full code Click here.

Conclusion

Thanks! for reading this article. In short, this model is for educational purposes only.
Also, read the loan prediction project

2 responses to “Detection of COVID-19 From Chest X-Ray Images Using Machine Learning”

  1. Ansh adlakha says:

    Seriously awesome work girl

  2. Silviu says:

    I ran the dataset script to select covid-19 images from the images folder and it does not copy the images in the new folder. I get no error or anything.
    import pandas as pd
    import shutil
    import os
    # Selecting all combination of ‘COVID-19’ patients with ‘PA’ X-Ray view
    coronavirus = “COVID-19” # Virus to look for
    x_ray = “PA” # View of X-Ray
    metadata = “data/cohen/metadata.csv” # Metadata.csv Directory
    imageDir = “data/cohen/images” # Directory of images
    outputDir = ‘data/cohen/Covid’ # Output directory to store selected images
    metadata_csv = pd.read_csv(metadata)
    # loop over the rows of the COVID-19 data frame
    for (i, row) in metadata_csv.iterrows():
    if row[“finding”] != coronavirus or row[“view”] != x_ray:
    continue
    filename = row[‘filename’].split(os.path.sep)[-1]
    filePath = os.path.sep.join([imageDir, filename])
    shutil.copy2(filePath, outputDir)
    print(‘Done’)

    this is my code. Can you see any mistake?

Leave a Reply

Your email address will not be published. Required fields are marked *