Object recognition with Yolo and OpenCV

Tags: Artificial Intelligence, OpenCV, Python, Vision

In this tutorial, we’ll look at how to perform object recognition with Yolo and OpenCV, using a neural network pre-trained with deep learning.

We saw in a previous tutorial how to recognize simple shapes using computer vision. This method only works for certain predefined simple shapes. If you want to recognize a wider variety of objects, the easiest way is to use artificial intelligence.

Hardware

A computer with a Python3 installation
A camera

Principle

Artificial intelligence is a field of computer science in which the program itself learns to perform certain tasks. Visual recognition in particular. In this tutorial, we’ll use a trained neural network to recognize particular shapes.

You need a lot of data to train a neural network properly. It has been shown that learning is faster on a neural network trained for something else. For example, a neural network trained to recognize dogs will train more easily to recognize cats.

Ultralytics offers YOLO models, which are easy to use and quite effective for detection, classification and tracking.

ultralytics-yolo-tasks Object recognition with Yolo and OpenCV

For each version, there are different models, from the lightest and least powerful to the most powerful but heaviest.

YOLOv5n	YOLOv5s	YOLOv5m	YOLOv5l	YOLOv5x
YOLOv8n	YOLOv8s	YOLOv8m	YOLOv8l	YOLOv8x

(nano, small, medium, large and extra large)

Python configuration

If not, you can download and install Python 3

You can then install the necessary libraries imutils, OpenCV, ultralytics

python3 -m pip install imutils opencv-python ultralytics

Retrieving a pre-trained model

You can retrieve a pre-trained model from the pyhton script

# load the pre-trained YOLOv8n model
model = YOLO("yolov8n.pt")

Python script for object recognition

First, we create a video stream (vs) using the imutils library, which will retrieve images from the camera.

vs = VideoStream(src=0, resolution=(1600, 1200)).start()

We initialize a neural network with the model parameters using the YOLO library (yolov5, v8, etc.).

model = YOLO("yolov8n.pt")

Next, we’ll create a loop which, at each iteration, reads the camera image and passes it to the neural network for object detection and recognition.

while True:
	# start time to compute the fps
	start = datetime.datetime.now()

	#ret, frame = video_cap.read()
	#frame = vs.read(); ret=True
	
	
	# if there are no more frames to process, break out of the loop
	if not ret:
		break

	# run the YOLO model on the frame
	detections = model(frame)[0]

Finally, the code displays the detection box, recognition probability and position on the image.

	# loop over the detections
	for box in detections.boxes:
		#extract the label name
		label=model.names.get(box.cls.item())
		
		# extract the confidence (i.e., probability) associated with the detection
		data=box.data.tolist()[0]
		confidence = data[4]

		# filter out weak detections
		if float(confidence) &lt; CONFIDENCE_THRESHOLD:
			continue

		# if the confidence is greater than the minimum confidence
		xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
		cv2.rectangle(frame, (xmin, ymin) , (xmax, ymax), GREEN, 2)

		#draw confidence and label
		y = ymin - 15 if ymin - 15 &gt; 15 else ymin + 15
		cv2.putText(frame, "{} {:.1f}%".format(label,float(confidence*100)), (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, GREEN, 2)

Complete object recognition code with OpenCV and YOLO

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#

import datetime
from ultralytics import YOLO
import cv2
from imutils.video import VideoStream
#from helper import create_video_writer


# define some constants
CONFIDENCE_THRESHOLD = 0.8
GREEN = (0, 255, 0)



# load the pre-trained YOLOv8n model
model = YOLO("yolov8n.pt")
#model = YOLO("yolov5n.pt")
"""
#detect on image
frame= cv2.imread('./datasets/two-boats.jpg');ret= True #from image file
detections = model(frame)[0]
# loop over the detections
#for data in detections.boxes.data.tolist():
for box in detections.boxes:
	#extract the label name
	label=model.names.get(box.cls.item())
		
	# extract the confidence (i.e., probability) associated with the detection
	data=box.data.tolist()[0]
	confidence = data[4]

	# filter out weak detections by ensuring the
	# confidence is greater than the minimum confidence
	if float(confidence) &lt; CONFIDENCE_THRESHOLD:
		continue

	# if the confidence is greater than the minimum confidence,
	# draw the bounding box on the frame
	xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
	cv2.rectangle(frame, (xmin, ymin) , (xmax, ymax), GREEN, 2)

	#draw confidence and label
	y = ymin - 15 if ymin - 15 &gt; 15 else ymin + 15
	cv2.putText(frame, "{} {:.1f}%".format(label,float(confidence*100)), (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, GREEN, 2)

# show the frame to our screen
cv2.imshow("Img", frame)
"""
#detect on video
# initialize the video capture object
#vs = VideoStream(src=0, resolution=(1600, 1200)).start()
video_cap = cv2.VideoCapture("datasets\\Splash - 23011.mp4")
# initialize the video writer object
#writer = create_video_writer(video_cap, "output.mp4")

while True:
	# start time to compute the fps
	start = datetime.datetime.now()

	#ret, frame = video_cap.read()
	#frame = vs.read(); ret=True
	
	
	# if there are no more frames to process, break out of the loop
	if not ret:
		break

	# run the YOLO model on the frame
	detections = model(frame)[0]
	
	# loop over the detections
	#for data in detections.boxes.data.tolist():
	for box in detections.boxes:
		#extract the label name
		label=model.names.get(box.cls.item())
		
		# extract the confidence (i.e., probability) associated with the detection
		data=box.data.tolist()[0]
		confidence = data[4]

		# filter out weak detections by ensuring the
		# confidence is greater than the minimum confidence
		if float(confidence) &lt; CONFIDENCE_THRESHOLD:
			continue

		# if the confidence is greater than the minimum confidence,
		# draw the bounding box on the frame
		xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
		cv2.rectangle(frame, (xmin, ymin) , (xmax, ymax), GREEN, 2)

		#draw confidence and label
		y = ymin - 15 if ymin - 15 &gt; 15 else ymin + 15
		cv2.putText(frame, "{} {:.1f}%".format(label,float(confidence*100)), (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, GREEN, 2)
		#cv2.circle(frame, (int(X)-15, int(Y)), 1, GREEN, 2)
		#cv2.putText(frame, poslbl, (int(X), int(Y)),cv2.FONT_HERSHEY_SIMPLEX, 0.5, GREEN, 2)

	# end time to compute the fps
	end = datetime.datetime.now()
	# show the time it took to process 1 frame
	total = (end - start).total_seconds()
	print(f"Time to process 1 frame: {total * 1000:.0f} milliseconds")

	# calculate the frame per second and draw it on the frame
	fps = f"FPS: {1 / total:.2f}"
	cv2.putText(frame, fps, (50, 50),
				cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 4)

	# show the frame to our screen
	cv2.imshow("Frame", frame)
	#writer.write(frame)
	if cv2.waitKey(1) == ord("q"):
		break

video_cap.release()
vs.stop()
#writer.release()
cv2.destroyAllWindows()

Image sources for object detection

You can use this script with different image sources. To do so, you’ll need to adapt the previous code slightly to modify the “img” variable containing the image to be analyzed.

Your computer’s webcam

vs = VideoStream(src=0, resolution=(1600, 1200)).start()
while True:
	frame = vs.read()

The video stream must be stopped at the end of the script with vs.stop()

Une caméra IP

vc = cv2.VideoCapture('rtsp://user:password@ipaddress:rtspPort')
while True:
	ret, frame=vc.read() #from ip cam

Make sure to stop video capture at the end of the script with vc.release()

Raspberry Pi Picam

vs = VideoStream(usePiCamera=True, resolution=(1600, 1200)).start()
while True:
	frame = vs.read()

Remember to stop the stream at the end of the script with vs.stop()

A video file

vc = cv2.VideoCapture('./datasets/Splash - 23011.mp4') #from video
while True:
	ret, frame=vc.read() #from video

An image file

frame= cv2.imread('./datasets/two-boats.jpg')

Results

In this example, we send the neural network an image of two boats, which are correctly recognized. To obtain slightly different results, you can modify the confidence parameter to avoid false positives.