Object Detection Basics and Performance Metrics

In this article, you will learn what object detection is, why it is used, how it works, and which metrics we can use to measure the performance of an object detection algorithm. Let’s delve in.

Do you guys watch cricket? If yes, have you ever noticed that the camera follows the ball through the air without a human operating it? What the camera is doing is detecting the ball, locating it in each frame, and tracking it until the end of the shot. Detecting and locating objects in an image in this way is known as Object Detection.

What is Object Detection?

Object Detection is a technology that detects objects in an image or video and locates them. It is mainly related to computer vision and image processing. After detecting an object, it typically draws a colored bounding box around it. Many people often confuse object detection with image recognition.

Object detection labels each object present in an image and marks where it is. For example, if an image contains two dogs and a human, object detection distinguishes the human from the dogs by drawing a box around each one and labeling it. Image recognition, in contrast, assigns a single label to the entire image. Face recognition comes under this category.

The image below shows the difference between image recognition and object detection.  

We can perform object detection in two ways:

  • Machine learning-based: In ML-based approaches, we use computer vision techniques (for example, with OpenCV) to extract various features from an image, such as edges or color histograms. These features are given as input to a classification model, which is applied to candidate regions of the image and decides which of them contain the object. In this way, the model identifies the group of pixels that belongs to each object.
  • Deep Learning-based: In deep learning-based approaches, we use convolutional neural networks (CNNs), which learn the features directly from the data. This article mainly focuses on these techniques.

How does Object detection work?

Object detection is crucial for automatically analyzing scenes captured by cameras. Now let’s see how it works. In earlier days, object detection was implemented with classical algorithms using OpenCV and other computer vision libraries, but their accuracy and speed were not up to the mark.

To improve accuracy and speed, newer deep learning algorithms were introduced, such as R-CNN, Fast R-CNN, Faster R-CNN, and RetinaNet, which outperform those classical algorithms. SSD and YOLO are two other popular deep learning detectors, designed with speed in mind.

Let’s see each algorithm more clearly.

  • R-CNN: The region-based convolutional neural network (R-CNN) locates the objects in an image by classifying candidate regions. To understand the idea, we will start with the sliding window approach: a rectangular window slides over the image, and each smaller crop it covers is examined one by one in a brute-force manner.

In this way, there will be a huge number of smaller regions. To reduce this number, R-CNN uses selective search, which extracts around 2000 candidate areas called region proposals. Region proposals are smaller areas of the image where the required object might be present.
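To get a feel for why the brute-force sliding window produces so many regions, here is a minimal Python sketch that enumerates window positions. The image size, window sizes, and stride are assumptions chosen only for illustration; a real search would also vary aspect ratios and image scales, which only increases the count.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every position of a win_w x win_h window."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def count_windows(img_w, img_h, window_sizes, stride):
    """Total number of crops a brute-force search would have to classify."""
    return sum(
        sum(1 for _ in sliding_windows(img_w, img_h, w, h, stride))
        for w, h in window_sizes
    )


# Illustrative (assumed) settings: a 640x480 image and three square window sizes.
sizes = [(64, 64), (128, 128), (256, 256)]
print(count_windows(640, 480, sizes, stride=8))  # → 8215
```

Even with these modest settings the search yields 8215 crops, several times the roughly 2000 proposals that selective search returns, and every crop would need its own pass through the classifier.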

After extracting these region proposals, we resize each one to a fixed size and feed it into a convolutional neural network. The CNN produces a 4096-dimensional feature vector for each proposal; its primary aim is to extract features from the region. We give these features as input to SVM classifiers (one per class), which decide whether the region contains an object of that class. Along with classifying the region, R-CNN uses a bounding-box regressor to adjust the box so that it covers the entire object.
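The box adjustment step can be sketched with the bounding-box regression parameterisation described in the R-CNN paper: the regressor predicts center offsets (dx, dy) scaled by the proposal’s width and height, plus log-scale size changes (dw, dh). This is a minimal sketch assuming boxes given as (center x, center y, width, height); the function names are hypothetical, and the regressor itself (a linear model on the CNN features in R-CNN) is omitted.

```python
import math


def encode_box(proposal, target):
    """Regression targets (tx, ty, tw, th) that move `proposal` onto `target`.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode_box(proposal, deltas):
    """Apply predicted offsets (dx, dy, dw, dh) to a proposal box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


# A proposal that is too small and shifted left of the object (made-up numbers).
proposal = (50.0, 50.0, 40.0, 40.0)
ground_truth = (60.0, 52.0, 60.0, 44.0)
deltas = encode_box(proposal, ground_truth)  # what the regressor is trained to predict
print(decode_box(proposal, deltas))          # ≈ (60.0, 52.0, 60.0, 44.0)
```

Scaling the offsets by the proposal size and using logarithms for width and height keeps the targets comparable across small and large proposals.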

The major drawback of R-CNN is that it is slow: the CNN has to run separately on each of the roughly 2000 region proposals of every image, and training involves several separate stages. To improve performance, Fast R-CNN and Faster R-CNN were introduced.

  • Fast R-CNN and Faster R-CNN: Instead of feeding each region proposal into the neural network separately, these models feed the complete image into the CNN once and obtain a feature map as output. Region proposals are then located on that feature map. In Faster R-CNN, the proposals are generated by a Region Proposal Network that works directly on the feature map, replacing selective search. Each proposal is reshaped to a fixed size (RoI pooling) and given as input to fully connected layers, which classify it and refine its box.
  • YOLO: You Only Look Once (YOLO). R-CNN, Fast R-CNN, and Faster R-CNN are two-stage detectors (they first propose regions, then classify them), which makes them comparatively slow. YOLO is a one-stage detector: a single convolutional neural network predicts the bounding boxes and their class probabilities in one pass over the image, which makes it much faster.

In YOLO, we split the image into a grid of cells. Each grid cell predicts a fixed number of bounding boxes, each with a confidence score, along with class probabilities. We keep only those bounding boxes whose class-specific score is greater than a threshold value, and then remove overlapping duplicates with non-maximum suppression to locate the objects.
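The thresholding and duplicate-removal steps can be sketched in plain Python. This assumes, as in the original YOLO, that a box’s class-specific score is its confidence multiplied by the class probability, and that the network output has already been decoded into (x1, y1, x2, y2) boxes; the data layout, threshold defaults, and function names are hypothetical choices for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.3, iou_threshold=0.5):
    """Keep confident boxes, then apply greedy per-class non-maximum suppression.

    `detections` is a list of (box, confidence, class_probs), where class_probs
    maps a class name to its probability (a hypothetical decoded YOLO output).
    """
    candidates = []
    for box, confidence, class_probs in detections:
        label, prob = max(class_probs.items(), key=lambda kv: kv[1])
        score = confidence * prob  # class-specific confidence score
        if score > score_threshold:
            candidates.append((box, label, score))

    # Visit boxes from most to least confident; drop any box that overlaps
    # an already-kept box of the same class too much.
    candidates.sort(key=lambda c: c[2], reverse=True)
    kept = []
    for box, label, score in candidates:
        if all(k_label != label or iou(box, k_box) < iou_threshold
               for k_box, k_label, _ in kept):
            kept.append((box, label, score))
    return kept


detections = [
    ((10, 10, 50, 50), 0.9, {"ball": 0.8, "bat": 0.2}),
    ((12, 12, 52, 52), 0.8, {"ball": 0.7, "bat": 0.3}),      # duplicate of the first ball
    ((100, 100, 140, 160), 0.6, {"ball": 0.1, "bat": 0.9}),
    ((200, 200, 220, 220), 0.2, {"ball": 0.6, "bat": 0.4}),  # score too low
]
for box, label, score in filter_detections(detections):
    print(label, box, round(score, 2))
```

In this made-up example, the second ball box is suppressed because its IoU with the first is about 0.82, and the last box is dropped by the score threshold, leaving one ball and one bat.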
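Going back to Fast and Faster R-CNN, the step that reshapes each region proposal to a fixed size is RoI max pooling: the region of the feature map is divided into a fixed grid of bins, and the maximum value in each bin is kept. Below is a simplified single-channel sketch, assuming the region is already given in feature-map coordinates; the function name is hypothetical. Libraries such as torchvision provide an optimized version (`torchvision.ops.roi_pool`) that also handles batches, channels, and scaling from image coordinates.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W). roi: (x1, y1, x2, y2) in feature-map
    cells, start inclusive and end exclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin boundaries are rounded outward (floor start, ceil end), so every
        # bin is non-empty; neighbouring bins may share a cell.
        y_start = y1 + (i * roi_h) // out_h
        y_end = y1 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            x_start = x1 + (j * roi_w) // out_w
            x_end = x1 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(y_start, y_end)
                           for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


fm = [[4 * y + x for x in range(4)] for y in range(4)]  # toy 4x4 feature map
print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))  # → [[5, 7], [13, 15]]
print(roi_max_pool(fm, (1, 1, 4, 4), 2, 2))  # → [[10, 11], [14, 15]]
```

Whatever the size of the proposal, the output is always out_h × out_w, which is what allows the fully connected layers that follow to have a fixed input size.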

Real-Life Applications of Object Detection

Nowadays, object detection is entering almost every field of technology, and it has many real-life applications. In image processing, we can use object detection for picture retrieval, security, and more. Let’s look at a few of them here.

  • Tracking Objects: Object detection frameworks are used to track objects. Besides tracking the ball on a cricket field, as discussed above, we can also track the swing of the bat, vehicles, and so on. Object tracking has many uses, such as security and traffic monitoring.
  • Automated CCTV surveillance: Surveillance is essential for security. Here the camera tries to detect unusual activities, such as people trying to break in or trying to harm others.
  • Person Detection: Again, in surveillance, person detection is a key capability we should focus on.
  • Self-Driving Cars: Autonomous driving is one of the best examples of object detection in real life. Detecting pedestrians, vehicles, and traffic signs helps the car’s system decide what to do next, such as when to slow down, when to accelerate, and whether to turn or go straight.
  • Face Detection: We come across this feature in our daily life. For example, Facebook tries to detect the human faces in a group picture.

Metrics to Measure the Performance

There are many metrics to measure the performance of an object detection model. Let’s see the most common ones.

  • Precision and Recall: Precision measures the correctness of the predictions: the fraction of predicted boxes that are true positives, TP / (TP + FP). Recall measures completeness: the fraction of ground-truth objects that the model detects, TP / (TP + FN).
  • Average precision (AP): Average precision summarizes the precision-recall curve as a single number, which we can use to compare our model with others. It is the weighted mean of the precisions at each threshold, where the weight is the increase in recall from the previous threshold. We calculate the average precision separately for each object class.
  • Mean Average Precision (mAP): mAP is an extension of average precision (AP). AP is calculated for each object class, while mAP averages the AP values over all classes, giving a single score for the entire model.
  • Intersection over Union (IoU): This metric measures how much a predicted bounding box overlaps the ground-truth box: the area of their intersection divided by the area of their union. A prediction is usually counted as correct when its IoU with a ground-truth box is at least a threshold such as 0.5. Together with confidence scores, IoU is also used to remove unwanted, overlapping bounding boxes from the output (non-maximum suppression).
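The following sketch computes IoU and uses it to count true positives, false positives, and false negatives for one class in one image, then derives precision and recall. It assumes boxes given as (x1, y1, x2, y2) and a 0.5 IoU threshold, and follows the PASCAL VOC convention of counting a second detection of an already-matched object as a false positive; the function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Precision and recall for one class in one image.

    predictions: list of (box, confidence); ground_truths: list of boxes.
    """
    matched = set()  # indices of ground-truth boxes already detected
    tp = 0
    for box, _ in sorted(predictions, key=lambda p: p[1], reverse=True):
        # Find the ground-truth box this prediction overlaps most.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold and best_idx not in matched:
            matched.add(best_idx)
            tp += 1  # true positive; every other prediction is a false positive
    fp = len(predictions) - tp
    fn = len(ground_truths) - tp  # objects the model missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truths else 0.0
    return precision, recall


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [
    ((1, 1, 10, 10), 0.9),    # IoU 0.81 with the first object: true positive
    ((0, 0, 10, 10), 0.8),    # same object again: false positive
    ((50, 50, 60, 60), 0.7),  # nothing there: false positive
]
print(precision_recall(predictions, ground_truths))  # precision 1/3, recall 0.5
```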
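Average precision and mAP can be sketched from a list of detections ranked by confidence, each already marked as a true or false positive (for example, by IoU matching). This minimal sketch computes the non-interpolated sum AP = Σ (R_n − R_{n−1}) P_n, where P_n and R_n are the precision and recall after the n-th detection. Benchmarks such as PASCAL VOC and COCO use interpolated variants, and COCO also averages over several IoU thresholds, so numbers from this sketch are not directly comparable to published results. The function names and sample data are hypothetical.

```python
def average_precision(is_tp, num_ground_truths):
    """Non-interpolated AP for one class.

    is_tp: True/False per detection, sorted by descending confidence.
    num_ground_truths: number of ground-truth objects of this class.
    """
    if num_ground_truths == 0:
        return 0.0
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truths
        ap += (recall - prev_recall) * precision  # weight = increase in recall
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """mAP: mean of the per-class AP values.

    per_class maps a class name to (is_tp list, number of ground truths).
    """
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


results = {
    "ball": ([True, False, True], 2),  # hypothetical ranked detections
    "bat": ([True], 2),
}
print(average_precision(*results["ball"]))  # ≈ 0.833
print(mean_average_precision(results))      # ≈ 0.667
```

A false positive adds nothing to AP directly (recall does not increase), but it lowers the precision of every true positive ranked after it, which is why confident false positives hurt AP the most.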


These are some of the algorithms and performance metrics we use in object detection.

Thanks for reading!

