Yet another gentle intro to the YOLO-NAS model

Table of Contents

YOLO-NAS delivers state-of-the-art (SOTA) performance with the unparalleled accuracy-speed performance.

YOLO-NAS is a Apache-2.0 license model created by Deci AI (now acquired by NVIDIA) that combines YOLO (You Only Look Once) object detection with Neural Architecture Search (NAS) to create a more efficient and accurate object detector. [see this Nvidia article for details]

A little overview for YOLO-NAS

How YOLO-NAS got trained

YOLO-NAS undergoes a multi-phase training process. It was pre-trained on the Objects365 dataset (2 million images under 365 categories, 25-40 epochs on NVIDIA RTX A5000 x8) and the COCO pseudo-labeled dataset. There was also Knowledge Distillation (KD) and Distribution Focal Loss (DFL) used.

Knowledge Distillation (KD)

What is KD? Knowledge Distillation is the process involves transferring the knowledge from a large model or a set of models to one smaller model.

YOLO-NAS’s Model Structure

Checkout this Colab Notebook YOLO-NAS Playground for details

Here is a little trick to the YOLO-NAS architecture

!pip install torchinfo
from torchinfo import summary

summary(model = yolo_nas,
       input_size = (16,3,640,640),
       col_names = ['input_size',
                   'output_size',
                   'num_params',
                   'trainable'],
       col_width = 20,
       row_settings = ['var_names'])

Interpret the YOLO-NAS outputs

The output of YOLO-NAS inference is an ImageDetectionPrediction object. This object contains 3 fields:

image
class_names
prediction