How to build a neural network that classifies images in Python

Fellow coders, in this tutorial we are going to build an image classifier around a pre-trained deep neural network, using the Python programming language and its most popular open-source computer vision library, “OpenCV”. We will also use “NumPy” to perform operations on our data.

If you are interested in Computer Vision and are just starting out, then this tutorial is for you. Computer Vision is a field in which we teach computers to “see” and “understand” the contents of an image or video.

Without further ado, let’s dive into this tutorial.

Downloading models and image

To follow along, you must download a pre-trained model for image classification. In this tutorial, we are going to use the GoogLeNet model in Caffe format. The download link is provided below:
http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel

Now, we need to download the synset_words file:
synset_words.txt
The ImageNet database is organized according to the WordNet hierarchy. Each meaningful concept in WordNet is called a synonym set, or synset. The model predicts one of 1,000 ImageNet classes, and this file lists them, one synset per line.

Next, we need to download the “bvlc_googlenet.prototxt” file from the following link:
bvlc_googlenet.prototxt

This download is a zip file, so unzip it before use.

Note: After downloading all three files, store them in a separate folder. You can name it whatever you want, but for the sake of this tutorial let’s name the folder “models”.

For this tutorial, we will use the butterfly.jpg image from the OpenCV GitHub repository:
https://github.com/opencv/opencv/blob/master/samples/data/butterfly.jpg

Note: After downloading the image store it in a separate folder named “images”.
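Before moving on, it can help to confirm that every file is where the script will look for it. The following is a minimal sketch, assuming the folder and file names used in this tutorial; the helper name `find_missing` is made up for illustration.

```python
from pathlib import Path

# File layout assumed from this tutorial; adjust it if you named things differently.
REQUIRED = (
    "models/bvlc_googlenet.caffemodel",
    "models/bvlc_googlenet.prototxt",
    "models/synset_words.txt",
    "images/butterfly.jpg",
)


def find_missing(base_dir=".", required=REQUIRED):
    """Return the required relative paths that are not files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files are in place.")
```

Run it from the directory that contains the “models” and “images” folders.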

Working with the Code

Open a new Python file in your text editor, in the same directory where you created the “models” and “images” folders, and name it “dnn_image.py”.

Now let’s start writing code in our file. Import “cv2” and “numpy” at the beginning of the file.

import cv2 as cv
import numpy as np

 

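The code above imports OpenCV and NumPy into our working file. Next, we read the image we want to classify using OpenCV’s “imread” function. Note that “imread” returns None instead of raising an error when the file cannot be read, so double-check the path if later steps fail.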

img = cv.imread("images/butterfly.jpg")

 

Now we get all the rows from the file “synset_words.txt” using the Python split() function. Each row starts with a WordNet ID, followed by a space and the class labels. We keep only the labels, that is, everything after the first space, using a list comprehension.

# the with-block closes the file once it has been read
with open('models/synset_words.txt') as f:
    all_rows = f.read().strip().split('\n')

classes = [r[r.find(' ') + 1:] for r in all_rows]
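To see what the slice in the list comprehension does, here is a small sketch on a single row. The row below is illustrative only: the ID “n00000000” is a placeholder, not a real WordNet ID.

```python
# Illustrative row: the ID "n00000000" is a placeholder, not a real WordNet ID.
sample_row = "n00000000 ringlet, ringlet butterfly"

# Position of the first space, which separates the ID from the label text.
first_space = sample_row.find(" ")

synset_id = sample_row[:first_space]   # everything before the first space
label = sample_row[first_space + 1:]   # everything after the first space

print(synset_id)  # n00000000
print(label)      # ringlet, ringlet butterfly
```

Only the first space matters, so labels that contain spaces or commas stay intact.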

Next, using OpenCV’s “dnn” module, we load the prototxt file and the Caffe model into our network. We then create a blob, which will act as the input to our neural network. We can see in our “.prototxt” file that the model expects images of size 224 × 224. In the “.blobFromImage()” function, the second argument “1” is the scale factor, a multiplier applied to the pixel values. 1 is the default value, which means that the pixel values are left unchanged. The third argument is the size that the image is resized to before it enters the network. The function also accepts an optional mean value to subtract from each channel, which we leave at its default here. After creating the blob, we set it as the input to the network. Subsequently, we perform a forward pass to get the prediction for each of the 1,000 classes.

net = cv.dnn.readNetFromCaffe('models/bvlc_googlenet.prototxt', 'models/bvlc_googlenet.caffemodel')

# scale factor 1 leaves the pixel values unchanged; (224, 224) is the size the image is resized to
blob = cv.dnn.blobFromImage(img, 1, (224, 224))
net.setInput(blob)

outp = net.forward()
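To make the idea of a blob more concrete, here is a rough NumPy sketch of the layout change. It assumes the image has already been resized, and it follows the behaviour described in the OpenCV documentation (the mean is subtracted, then the values are multiplied by the scale factor). The names `to_blob` and `fake_image` are made up for illustration; this is not OpenCV’s implementation, and it leaves out resizing, cropping and channel swapping.

```python
import numpy as np


def to_blob(image_hwc, scalefactor=1.0, mean=(0.0, 0.0, 0.0)):
    """Mimic the layout change of a blob for an already-resized image.

    image_hwc: array of shape (height, width, 3), as returned by cv.imread.
    Returns a float32 array of shape (1, 3, height, width).
    """
    pixels = image_hwc.astype(np.float32)
    pixels = pixels - np.asarray(mean, dtype=np.float32)  # per-channel mean subtraction
    pixels = pixels * scalefactor                         # scales pixel values, not the image size
    chw = pixels.transpose(2, 0, 1)                       # (H, W, C) -> (C, H, W)
    return chw[np.newaxis, ...]                           # add the batch dimension in front


# Stand-in for an image that has already been resized to 224 x 224.
fake_image = np.full((224, 224, 3), 255, dtype=np.uint8)

blob = to_blob(fake_image, scalefactor=1.0)
print(blob.shape)  # (1, 3, 224, 224)
```

With a scale factor of 1 the values stay in the 0 to 255 range; a scale factor of 1/255 would map them to the 0 to 1 range.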

 


We only want the top 5 predictions (and not all of them) sorted in descending order of probability. We can perform this operation easily in NumPy.

idx = np.argsort(outp[0])[::-1][:5]
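The same argsort pattern can be tried on a small array to see each step. The scores below are made up for illustration; the real output has 1,000 entries.

```python
import numpy as np

# Made-up scores for six classes; the real network output has 1,000 entries.
scores = np.array([0.02, 0.40, 0.10, 0.25, 0.15, 0.08])

ascending = np.argsort(scores)   # indices ordered from lowest to highest score
descending = ascending[::-1]     # reversed, so the highest score comes first
top3 = descending[:3]            # keep only the first three indices

print(top3.tolist())          # [1, 3, 4]
print(scores[top3].tolist())  # [0.4, 0.25, 0.15]
```

The result holds class indices, not probabilities, which is why the indices are later used to look up both the label and the score.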

 

Finally, we will display the top 5 predictions in our terminal window. We use OpenCV’s “imshow()” function to display the image in a window. The first argument of this function is the name of the window and the second is the image itself. We then call waitKey(0), which keeps the window open until a key is pressed, and close it with destroyAllWindows().

for i, obj_id in enumerate(idx):
    print('{}. {} ({}): Probability {:.3}%'.format(i + 1, classes[obj_id], obj_id, outp[0][obj_id] * 100))

cv.imshow('butterfly', img)
cv.waitKey(0)
cv.destroyAllWindows()
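A note on the “{:.3}” format used in the print statement: without a type letter, the precision counts significant digits rather than decimal places. The sketch below compares it with the fixed-point format “{:.2f}” on made-up probabilities.

```python
# Made-up probabilities for illustration only.
probabilities = [0.656123, 0.0509, 1.0]

formatted = []
for p in probabilities:
    percent = p * 100
    significant = "{:.3}".format(percent)  # three significant digits
    fixed = "{:.2f}".format(percent)       # two digits after the decimal point
    formatted.append((significant, fixed))
    print(significant, fixed)

# Prints:
# 65.6 65.61
# 5.09 5.09
# 1e+02 100.00
```

If a prediction ever reaches 100%, “{:.3}” switches to scientific notation, so “{:.2f}” may be the more predictable choice.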

 

Now, let us look at the complete code that we just wrote.

import cv2 as cv
import numpy as np

img = cv.imread('images/butterfly.jpg')

# the with-block closes the file once it has been read
with open('models/synset_words.txt') as f:
    all_rows = f.read().strip().split('\n')

classes = [r[r.find(' ') + 1:] for r in all_rows]

net = cv.dnn.readNetFromCaffe('models/bvlc_googlenet.prototxt', 'models/bvlc_googlenet.caffemodel')

# scale factor 1 leaves the pixel values unchanged; (224, 224) is the size the image is resized to
blob = cv.dnn.blobFromImage(img, 1, (224, 224))
net.setInput(blob)

outp = net.forward()
# you can try: print(outp)

idx = np.argsort(outp[0])[::-1][:5]

for i, obj_id in enumerate(idx):
    print('{}. {} ({}): Probability {:.3}%'.format(i + 1, classes[obj_id], obj_id, outp[0][obj_id] * 100))


cv.imshow('butterfly', img)
cv.waitKey(0)
cv.destroyAllWindows()

 

The output of the above code is as follows:

1. ringlet, ringlet butterfly (322): Probability 65.6%
2. lycaenid, lycaenid butterfly (326): Probability 23.0%
3. sulphur butterfly, sulfur butterfly (325): Probability 5.09%
4. monarch, monarch butterfly, milkweed butterfly, Danaus plexippus (323): Probability 2.96%
5. lacewing, lacewing fly (318): Probability 1.27%
