Published on Aug. 10, 2021

YoloR: Implicit Unified Networks and Panoptic Segmentation.


YOLO-R has just been released (the paper was published in May 2021) and introduces a new approach in the computer vision field.

YOLO is an acronym for You Only Look Once, the name given to the algorithm that has been revolutionizing Computer Vision. Its first version brought a major breakthrough: detecting and recognizing multiple objects in an image in real time. From YOLOv1 to YOLOv4, each release delivered leaps in speed, accuracy and efficiency of object detection.

The newest version of this algorithm is YoloR, named by its authors You Only Learn One Representation, and it impresses once again with major advances!

To better understand this evolution, it is worth remembering that, until now, a Convolutional Neural Network (CNN) could recognize an object, an animal or a human being very consistently, but it could only do one thing at a time. Trying to identify both the action of a particular person and their clothing hurt accuracy and/or performance, depending on the technique applied. If we compare human learning with machine learning, people can see and identify all kinds of information in an image, and also "understand" the world through sight, hearing and touch; most of the time, all of these senses are used in an integrated way.

The purpose of YOLO-R is to be a unified network, designed to perform inference on images, sounds and texts.

Another important detail is the learning model. 

Human learning can happen intentionally, with attention paid to the learning objective, which we call explicit learning, or subconsciously, which we call implicit learning. The experiences gained through explicit or implicit learning are somehow stored in the brain. Correlating the human brain's learning and storage with a huge database, we can say that human beings can efficiently process data even when it is not fully understood.

Implicit knowledge refers to involuntary learning, while explicit knowledge is what conventional Deep Learning applies, since its learning takes place through observation-based (holistic) techniques.

One way to exemplify this in computer vision: an explicit model will recognize a computer, while an implicit model will also capture details, such as the power button or whether the DVD player is in the on state.

YOLO-R proposes a unified network that encodes implicit and explicit knowledge together, just like the human brain. The unified network generates a unified representation that can serve multiple tasks simultaneously, refining multitask prediction and learning in a convolutional neural network. The authors' results show that introducing implicit knowledge into the neural network benefits the performance of all tasks. They also analyzed the implicit representation learned by the proposed unified network, and it shows great capacity to capture the physical meaning of the different tasks.
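To make this less abstract, here is a minimal NumPy sketch of the idea behind combining explicit and implicit knowledge. In the paper, implicit knowledge is modeled as a learnable vector that is combined with the explicit feature map through simple operators such as addition or multiplication; the shapes, names and initialization below are illustrative assumptions, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Explicit" features: what a conventional CNN extracts from the input
# (here, a toy batch of 2 samples with 8 channels).
explicit = rng.standard_normal((2, 8))

# "Implicit" knowledge: learned per-channel vectors, independent of the input.
z_add = np.zeros(8)   # additive term, initialized as an identity (all zeros)
z_mul = np.ones(8)    # multiplicative term, initialized as an identity (all ones)

# Combination operators described in the paper: addition and multiplication.
refined_add = explicit + z_add
refined_mul = explicit * z_mul

# At initialization the implicit terms act as an identity, so the unified
# representation starts out equal to the explicit one; during training the
# network would gradually learn useful offsets and scales.
print(np.allclose(refined_add, explicit))  # True
print(np.allclose(refined_mul, explicit))  # True
```

The point of the sketch is that the implicit vectors add almost no parameters or compute, which is consistent with the paper's claim that introducing implicit knowledge improves all tasks at negligible cost.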


So we can understand that YoloR is an implementation of explicit and implicit multitask learning with a unified network (a single model architecture) that analyzes images, sounds and texts, allowing several tasks to be performed. In the paper (https://arxiv.org/abs/2105.04206) the authors mention experiments with image captioning, object detection, instance segmentation and panoptic segmentation, with many more to come. Understandably, this new unified and implicit Deep Learning involves much more math in its unified architecture.
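The "one representation, many tasks" idea can also be sketched in a few lines: a shared backbone produces a single feature vector, and lightweight task-specific heads reuse it. All dimensions and head names below are hypothetical, chosen only to illustrate the structure.

```python
import numpy as np

rng = np.random.default_rng(1)

# One unified representation produced by a shared backbone
# (hypothetical size: a 16-dimensional feature for a single input).
unified = rng.standard_normal(16)

# Separate lightweight linear heads reuse the SAME representation
# for different tasks (names and shapes are illustrative).
W_det = rng.standard_normal((4, 16))   # e.g. a box-regression head
W_seg = rng.standard_normal((10, 16))  # e.g. a segmentation-logits head

detection_out = W_det @ unified
segmentation_out = W_seg @ unified

print(detection_out.shape, segmentation_out.shape)  # (4,) (10,)
```

The design choice this illustrates: the expensive part (the backbone) runs once, and each additional task only pays for its own small head, which is why a unified representation can serve multiple tasks simultaneously.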

Panoptic Segmentation

About performance: comparing YoloR with state-of-the-art algorithms such as YOLOv4, EfficientDet and others, YoloR achieves similar accuracy, but its FPS performance is impressive (nearly double)!

More details on the implementation of this new toy in the AI sector with computer vision will be available soon.


About the author

Alessandro Faria

CTIO OITI Tecnologia

Co-founding partner of OITI TECHNOLOGIES. A researcher whose first contact with technology was in 1983, at 11 years of age. Takes Linux seriously, and has researched and worked with biometrics and computer vision since 1998. Experienced in facial biometrics since 2003, and in artificial neural networks and neurotechnology since 2009. Inventor of the CERTIFACE technology, with more than 100 lectures given, 14 printed articles published and more than 8 million views across 120 published articles. Lecturer at FIA, official Mozillians member, official member and Ambassador of openSUSE Linux Latin America, member of the OWASP SP Board, contributor to the OpenCV library, Intel oneAPI Global Innovator, I2AI Notable Member, founder of the global openSUSE Linux INNOVATOR initiative, and mentor at Cybersecuritygirls BR.
