YOLO Object Detection from an Image with OpenCV and Python
In this tutorial, we will be learning how to use Python and OpenCV in order to detect an object from an image with the help of the YOLO algorithm. We will be using PyCharm IDE to solve this problem.
YOLO stands for “You Only Look Once”. It is an object detection model, first presented in 2016, that looks at the entire image in a single pass and detects objects.
We start by loading the algorithm. In order to load it, we need these 3 files:
- Weight file: The trained model that detects the objects.
- Cfg file: The configuration file, which describes the network architecture
- Names file: Contains the names of the objects (classes) that this algorithm can detect
Click on the highlighted links above to download these files.
Prerequisites
In order to build this program, we’ll require the following libraries:
- cv2
- NumPy
```python
import cv2
import numpy as np
```
We will be testing our program with this input image.
Load Yolo In Our Python Program
We follow these steps:
- Use the files we have downloaded
- Load the classes from the names file, i.e., the objects that YOLO can detect
- Then use the getLayerNames() and getUnconnectedOutLayers() functions to get the output layers
```python
# Load the YOLO network
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load all objects (classes) that can be detected
classes = []
with open("coco.names", "r") as f:
    for line in f.readlines():
        classes.append(line.strip("\n"))

# Define the output layer names
# .flatten() handles both the older (N x 1) and newer (1-D) return
# shapes of getUnconnectedOutLayers() across OpenCV versions
layer_names = net.getLayerNames()
output_layers = []
for i in np.array(net.getUnconnectedOutLayers()).flatten():
    output_layers.append(layer_names[i - 1])
```
Load The Image File
We follow these steps:
- Use the imread() function to read the image
- Use .shape to get the height, width and channels of the image
```python
# Load the image
img = cv2.imread("Road.jpg")
height, width, channels = img.shape
```
Extracting features to detect objects
BLOB stands for Binary Large Object. In OpenCV’s DNN module, a blob is a 4-D array of shape (batch, channels, height, width) that the network takes as input, produced by scaling the pixel values, resizing the image, and optionally swapping the colour channels.
We follow these steps:
- Use the blobFromImage() function to extract the blob
- Pass this blob into the network
- Use forward() to run the blob through the output layers and generate the result
```python
# Extract features (blob) to detect objects
# 0.00392 (about 1/255) scales pixel values to [0, 1]
# (416, 416) is the input size expected by YOLOv3
# swapRB=True converts BGR -> RGB
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)

# Pass the blob to the network
net.setInput(blob)
outs = net.forward(output_layers)
```
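For intuition, the transformation blobFromImage() performs can be sketched in plain NumPy. This is a simplified sketch that skips the resize step; the helper name and the tiny test image are made up for illustration:

```python
import numpy as np

def blob_from_image_sketch(img, scale=1 / 255.0, swap_rb=True):
    """Simplified sketch of what cv2.dnn.blobFromImage does (no resizing).

    img: H x W x 3 uint8 array in BGR order.
    Returns a 1 x 3 x H x W float32 blob.
    """
    x = img.astype(np.float32) * scale   # scale pixel values to [0, 1]
    if swap_rb:
        x = x[:, :, ::-1]                # BGR -> RGB
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    return x[np.newaxis, ...]            # add batch dimension -> NCHW

# Example: a tiny 2x2 "image" with one pure-blue pixel (BGR order)
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)
blob = blob_from_image_sketch(img)
print(blob.shape)                        # (1, 3, 2, 2)
```

After the channel swap, the blue value of that pixel ends up in the last (B) channel of the blob, which is why the real call above needs swapRB=True for a BGR image loaded by imread().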
Displaying Information On The Screen
Here, we go through the result to retrieve the scores, class_id and confidence of each detected object. If the confidence is greater than 0.5, we use the coordinate values to draw a rectangle around the object.
```python
# Display information on the screen
class_ids = []
confidences = []
boxes = []
for output in outs:
    for detection in output:
        # Get the confidence in 3 steps
        scores = detection[5:]            # 1: class scores
        class_id = np.argmax(scores)      # 2: index of the best class
        confidence = scores[class_id]     # 3: its score
        if confidence > 0.5:              # The object is considered detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)     # top-left x
            y = int(center_y - h / 2)     # top-left y
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```
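Each detection row the loop above processes is a vector of 85 values for the COCO classes: the box centre x, centre y, width and height (all relative to the image size), an objectness score, and 80 class scores. A toy decode of one such row, with made-up values and only 3 class scores for brevity:

```python
import numpy as np

width, height = 800, 600                 # example image size

# One fake detection row: cx, cy, w, h, objectness, then 3 class scores
detection = np.array([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05])

scores = detection[5:]
class_id = int(np.argmax(scores))        # -> 1 (index of the best class)
confidence = scores[class_id]            # -> 0.8

center_x = int(detection[0] * width)     # 400
center_y = int(detection[1] * height)    # 300
w = int(detection[2] * width)            # 200
h = int(detection[3] * height)           # 300
x = int(center_x - w / 2)                # 300 (top-left x)
y = int(center_y - h / 2)                # 150 (top-left y)
print(class_id, confidence, [x, y, w, h])
```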
But if we display the result now, our program draws double boxes around some objects, which is not correct.
Removing Double Boxes
We will use non-maximum suppression (NMS), via the NMSBoxes() function, to remove the double boxes from our result and thus keep only one box per detected object.
```python
# Remove double boxes with non-maximum suppression
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.4)
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]]  # name of the object
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, label, (x, y), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
```
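Under the hood, NMS keeps the highest-scoring box and discards any remaining box that overlaps it too much, measured by intersection over union (IoU). A minimal NumPy sketch of the idea, not OpenCV’s exact implementation:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms_sketch(boxes, scores, score_thr=0.3, iou_thr=0.4):
    """Return indices of the boxes kept after non-maximum suppression."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
    keep = []
    for i in order:
        # Keep this box only if it does not overlap an already-kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one separate box
boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores))  # -> [0, 2]
```

The second box overlaps the first too strongly (IoU well above 0.4), so only the best of the pair survives, which is exactly why the double boxes disappear.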
Printing the Output
We’ll keep a copy of the original image in a new variable, so we can compare it with the resulting image we get after running the program.
```python
cv2.imshow("Output", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
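For the comparison mentioned above, the copy has to be taken before any rectangles are drawn, because OpenCV’s drawing functions modify the array in place. A small NumPy sketch of that behaviour, with a zero array standing in for the loaded image and a slice assignment standing in for cv2.rectangle():

```python
import numpy as np

img = np.zeros((4, 4, 3), dtype=np.uint8)   # stand-in for cv2.imread(...)
original = img.copy()                        # snapshot taken BEFORE drawing

img[1:3, 1:3] = (0, 255, 0)                  # stand-in for cv2.rectangle(img, ...)

print(np.array_equal(original, img))         # False: img was modified in place
print(int(original.max()))                   # 0: the copy is untouched
```

In the real program this means calling `original = img.copy()` right after imread(), then showing `original` and `img` in two separate imshow() windows.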
Complete Code
Here is the complete code for this program:
```python
import cv2
import numpy as np

# Load the YOLO network
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load all objects (classes) that can be detected
classes = []
with open("coco.names", "r") as f:
    for line in f.readlines():
        classes.append(line.strip("\n"))

# Define the output layer names
# .flatten() handles both the older (N x 1) and newer (1-D) return
# shapes of getUnconnectedOutLayers() across OpenCV versions
layer_names = net.getLayerNames()
output_layers = []
for i in np.array(net.getUnconnectedOutLayers()).flatten():
    output_layers.append(layer_names[i - 1])

# Load the image
img = cv2.imread("Road.jpg")
height, width, channels = img.shape

# Extract features (blob) to detect objects
# 0.00392 (about 1/255) scales pixel values; swapRB=True converts BGR -> RGB
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)

# Pass the blob to the network
net.setInput(blob)
outs = net.forward(output_layers)
# print(outs)

# Display information on the screen
class_ids = []
confidences = []
boxes = []
for output in outs:
    for detection in output:
        # Get the confidence in 3 steps
        scores = detection[5:]            # 1: class scores
        class_id = np.argmax(scores)      # 2: index of the best class
        confidence = scores[class_id]     # 3: its score
        if confidence > 0.5:              # The object is considered detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)     # top-left x
            y = int(center_y - h / 2)     # top-left y
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
            # cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Remove double boxes with non-maximum suppression
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.4)
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]]     # name of the object
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, label, (x, y), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

cv2.imshow("Output", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
Now if we run our program, we will be able to see the final output image, just like the one below:
We get our final image with all the objects highlighted along with their names.
Hope this post helps you understand the concept of YOLO object detection with OpenCV and Python.
Hi,
thanks a lot for your article! This is a very interesting topic and a good short sample to start working with.
A small remark: I currently have a problem running this code.
With Python 3.10.5 I get an error for the line
output_layers.append(layer_names[i[0]-1])
“IndexError: invalid index to scalar variable.”
Could you please help and/or update your code here?
I have studied YOLO in many tutorials, including yours, but I can’t solve this error. I am a beginner in computer vision and Python.
Please, help me.