What is YOLO
YOLO (You Only Look Once) is a real-time object detection algorithm using deep learning. It divides the image into a grid, and each grid cell is responsible for detecting the objects whose centers fall within it. YOLO is well known and widely used because it is relatively fast and accurate. In this proof of concept, we will use YOLOv5, a family of YOLO object detection architectures pre-trained on the COCO dataset.
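The grid idea can be illustrated with a minimal sketch (this is not the actual YOLOv5 code; the grid size and coordinates are illustrative):

```python
# Illustrative sketch: YOLO divides the image into an S x S grid, and the
# cell containing an object's bounding-box center is the one responsible
# for predicting that object.

def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return the (row, col) grid cell that owns an object whose
    bounding-box center is (cx, cy), in pixel coordinates."""
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col

# An object centered at (320, 160) in a 640x480 image on a 7x7 grid:
print(responsible_cell(320, 160, 640, 480))  # -> (2, 3)
```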
Why is it interesting for FastTrack
FastTrack is currently limited to tracking objects in quasi-2-D on high-contrast images filmed with an immobile camera. The software is well adapted to large or poor-quality datasets, achieving good accuracy at blazing speed. But for users with very detailed datasets, strong 3-D movements, a moving point of view, several types of objects, or complex scenery, FastTrack is limited. Adding a YOLO detector with a pre-trained or custom-trained model can help these users: they will be able to benefit from a state-of-the-art deep learning detector within the easy and intuitive environment of FastTrack.
To demonstrate the feasibility of using YOLO as a detector for FastTrack, we take a video of running kittens that we will first preprocess using YOLOv5 (PyTorch, Python). The resulting video will then be tracked using the FastTrack GUI.
This video presents quite a challenge to track: 3-D motion, motion blur, occlusions, object deformation, and light-colored kittens on a light background.
The video is processed using the YOLOv5x detector pre-trained on the COCO dataset. For each kitten detected, we draw the minimal enclosing rectangle containing the kitten on a white background image. We then repeat the process for all the detected kittens in all the images. This preprocessing simulates what could be done if YOLO were integrated directly inside FastTrack. Direct integration would be much more accurate, since each detected object would be processed separately, avoiding object overlaps. These images are then tracked using FastTrack, following the standard process.
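The preprocessing step above can be sketched as follows. This is a simplified illustration, not the actual script: plain Python lists stand in for a grayscale frame, whereas the real preprocessing would operate on numpy/OpenCV image arrays with bounding boxes coming from YOLOv5 detections.

```python
# Sketch of the preprocessing: each detected bounding box is copied onto
# a white background of the same size as the frame, producing one output
# image per detection.

WHITE = 255

def crop_on_white(frame, box):
    """frame: 2-D list of pixel values; box: (x0, y0, x1, y1) with
    inclusive-exclusive bounds. Returns a same-sized white image that
    contains only the pixels inside the box."""
    h, w = len(frame), len(frame[0])
    x0, y0, x1, y1 = box
    out = [[WHITE] * w for _ in range(h)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = frame[y][x]
    return out

# A 4x4 frame with a dark object in the top-left 2x2 corner:
frame = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [255, 255, 255, 255],
         [255, 255, 255, 255]]
isolated = crop_on_white(frame, (0, 0, 2, 2))
print(isolated[0][:2])  # -> [0, 0]  (object kept, rest white)
```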
The implementation of the YOLO detector is relatively straightforward. Currently, detection is performed by applying a threshold and then finding the objects in the resulting binary image. The YOLO detection step can be implemented as a class using C++/OpenCV that returns, for each image, a binary version with the detected objects, which will be fed to the FastTrack feature detector. Implementing the YOLO detector will require modifying the GUI by adding a new tab that lets the user select the specific COCO classes and the detector architecture version. The same modifications will have to be mirrored in the CLI, with the addition of several command-line parameters. Deployment should not cause any problem because the YOLO model files can be shipped alongside the FastTrack binary.
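The planned detection step, producing a binary image from YOLO detections, could look like the following sketch. The eventual implementation would be in C++/OpenCV; this Python version with illustrative class names, boxes, and a `keep_classes` filter only shows the intended input/output contract.

```python
# Hypothetical sketch of the proposed YOLO detection step: a binary image
# where detected objects of the user-selected COCO classes are foreground
# (255) and everything else is background (0), ready to be handed to
# FastTrack's feature detector.

def detections_to_binary(shape, detections, keep_classes):
    """shape: (height, width); detections: list of (class_name, box)
    with box = (x0, y0, x1, y1), inclusive-exclusive bounds.
    Returns a 2-D list with 255 inside kept boxes, 0 elsewhere."""
    h, w = shape
    mask = [[0] * w for _ in range(h)]
    for cls, (x0, y0, x1, y1) in detections:
        if cls not in keep_classes:  # classes the user did not select
            continue
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 255
    return mask

# Keep only "cat" detections; the "dog" box is ignored:
dets = [("cat", (1, 1, 3, 3)), ("dog", (0, 0, 1, 1))]
mask = detections_to_binary((4, 4), dets, {"cat"})
print(mask[1][1], mask[0][0])  # -> 255 0
```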
Integrating the YOLO detector inside FastTrack will require some work and testing to reach the stability needed for a stable release on every supported OS. The implementation will be split into several chunks and carried out bit by bit, because I have to fit it in between paid contracts and can't work on it full time. To speed up the implementation, you can support the project at https://ko-fi.com/bgallois.