Meet YOLO v4, the state of the art in Computer Vision
Understand why this detection method is called "You Only Look Once"
I like to introduce Computer Vision as the technology that gives machines the ability to see the space around them. It means making equipment capable of recognizing, identifying and extracting data from images and videos.
In the past, this task required strong mathematical and programming knowledge. It was extensive work that began with image capture and quality and demanded an understanding of how elements such as color and light affect vision, all in an effort to make the machine see with the efficiency the human brain achieves through our eyes.
There are currently several programming libraries for developing software and solutions in this segment. What I want to share in this article is the newest one revolutionizing Computer Vision: version 4 of YOLO.
If you haven't heard of YOLO, it is a Computer Vision technology that has gained a lot of prominence in object detection. The software is free, and its entire architecture and source code are available to everyone on the internet.
The technique, launched in 2015 by Joseph Redmon and Ali Farhadi, was very innovative: it achieved accuracy similar or superior to its competitors while running in real time (around 30 fps, frames per second). Many benchmarks reported speeds up to 10 times faster than the most accurate methods available at the time of its launch. In 2017 I even participated in tests with supercomputers of the time involving IBM and NVIDIA (https://exame.com/tecnologia/brasileiro-ajuda-ibm-e-nvidia-a-dar-olhos-para-a-computacao/).
YOLO version 4, released in April 2020, was considered state of the art because it delivers greater precision when classifying and locating objects in real time. The following video demonstrates the results of the current version, as well as the evolution of YOLO technology.
As we can conclude from the video above, if a pedestrian were wearing a T-shirt printed with the pattern used as a proof of concept in laboratory tests, an autonomous vehicle processing objects with earlier versions could run over a human (assuming the vehicle had no complementary technologies such as sensors).
YOLO technology was revolutionary and gained notoriety in a TED Talk in which Redmon demonstrated the first version in real time. In that demonstration, the author convinced the world of its efficiency by classifying and locating up to 80 categories of objects on a GPU at approximately 30 fps.
YOLO is not a network but a detection method. Its main differential compared to other methods (such as Haar Cascade or HOG) is that a single scan of the image is enough both to detect an object and to locate the region it belongs to. Hence the name: You Only Look Once.
In other words, class predictions occur in a single pass through the network. Other methods were slower because they performed detection by dividing the image into many regions and submitting each one to a classifier, up to thousands of times for the same image (a technique known as the sliding window).
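To make the difference concrete, here is a small illustrative sketch (not actual YOLO code; the window size, stride, grid size and anchor count are hypothetical example numbers) comparing how many classifier invocations a sliding-window detector needs against YOLO's single network pass:

```python
# Illustrative cost comparison: sliding window vs. single-pass detection.
# All numbers below are hypothetical, chosen only to show the order of magnitude.

def sliding_window_evaluations(img_size, window, stride):
    """Classifier invocations needed to slide one square window over a square image."""
    steps = (img_size - window) // stride + 1
    return steps * steps

# A 416x416 image scanned with a 64x64 window at stride 8:
per_scale = sliding_window_evaluations(416, 416 and 64, 8) if False else sliding_window_evaluations(416, 64, 8)
# Scanning at, say, 5 different window scales multiplies the cost:
total_windows = per_scale * 5

# YOLO instead runs the network ONCE; e.g. a 13x13 output grid with
# 3 boxes per cell yields every candidate detection in that single pass:
yolo_passes = 1
candidates_per_pass = 13 * 13 * 3

print(per_scale)          # thousands of classifier calls per scale
print(total_windows)      # tens of thousands for a multi-scale scan
print(yolo_passes, candidates_per_pass)
```

The point is not the exact numbers but the shape of the cost: the sliding window repeats the classifier thousands of times per image, while YOLO reads all candidate boxes from one forward pass.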
Its implementation was written in the C language, in a project named Darknet. It is fully open source and supports GPUs. The project is available at https://github.com/AlexeyAB/darknet
Version 4 was made available by different authors. In February, one of the original authors announced that he was stopping his computer vision research because of its impact on society and the use of his technology in the market (https://twitter.com/pjreddie/status/1230524770350817280).
YOLO v4 was published by Alexey Bochkovskiy, Chien-Yao Wang and Hong-Yuan Mark Liao. Its main advantages are gains in inference speed and accuracy. Another advantage of this version is the use of more efficient techniques for GPU processing, based on optimization and lower memory usage. In short: faster and more accurate than EfficientDet, RetinaNet and Mask R-CNN on the COCO dataset.
This technology is gaining ground. Scientists, researchers and the market have been using it all over the world in segments such as robotics, medicine and agribusiness. YOLO is used for object detection on several platforms, including cell phone applications, autonomous cars and more.
Incredible, isn't it? If you want to go deeper into the topic, take the Computer Vision Fundamentals course with me and learn how to innovate with these incredible technologies: https://www.i2ai.org/course/19/detail/
YOLOv4 Paper – https://arxiv.org/abs/2004.10934
YOLOv4 Real-Time Object Detection – https://github.com/AlexeyAB/darknet
YOLOv4 on OpenCV – https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html
About the author
Co-founding partner of OITI TECHNOLOGIES. Researcher whose first contact with technology was in 1983, at 11 years of age. Takes Linux seriously; has researched and worked with biometrics and computer vision since 1998, with experience in facial biometrics since 2003 and in artificial neural networks and neurotechnology since 2009. Inventor of the CERTIFACE technology, with more than 100 lectures delivered, 14 printed articles published and more than 8 million views across 120 published articles. Lecturer at FIA, official Mozillians member, official member and Ambassador of openSUSE Linux Latin America, member of the OWASP SP Council, contributor to the OpenCV library, Intel oneAPI Global Innovator, Notable Member of I2AI, founder of the Global openSUSE Linux INNOVATOR initiative, and mentor at Cybersecuritygirls BR.