Boosting Aerial Object Detection Performance via Virtual Reality Data and Multi-Object Training
Nikolas Koutsoubis(1), Kyle Naddeo(1), Garrett Williams(1), George Lecakes, Jr.(1), Gregory Ditzler(1), Nidhal C. Bouaynaya(1)*, and Thomas Kiel(2)
(1)Rowan University (2)U.S. Army DEVCOM Armaments Center
*Corresponding author: Nidhal Bouaynaya [[email protected]]
• Approach
• Motivate Data Augmentation with Synthetic Data
• Real-Time Object Detection for Small Objects
• Experimental Results
• YOLOv7-tiny Training and Validation
• Results from YOLOv7-tiny ML models
• Conclusions and Future Directions

Citation: N. Koutsoubis, K. Naddeo, G. Williams, G. Lecakes, T. Kiel, G. Ditzler, and N. Bouaynaya, "Boosting Aerial Object Detection Performance via Virtual Reality Data and Multi-Object Training," in IEEE/INNS International Joint Conference on Neural Networks (IJCNN), 2023.
Unmanned Aerial Vehicles (UAVs) pose a major threat across many different landscapes.
• These objects tend to be small (i.e., contained within a small number of pixels), and
• There are limited datasets available for learning an object detector, in terms of:
  • Number of samples
  • Types of UAVs
  • Biomes where data are collected
  • Etc.
• Machine Learning (ML) is a potential solution for identifying and classifying friendly and hostile UAVs in large swarms; however, limited data constrains the performance of any ML model.
It is difficult to collect datasets with UAVs in combat landscapes due to safety concerns and the emergence of new UAV technologies.
• Synthetic training datasets can augment real datasets, allowing for training on large amounts of balanced data.
• The synthetic datasets need to be realistic if they are going to improve performance in real-world deployments.
• Drones are small objects, which are challenging to detect even for SOTA methods such as YOLO (and its variants).
• YOLO struggles because its IoU loss over-penalizes small-object errors.
• Alternative loss functions can improve performance on small objects while maintaining good performance on the remaining objects (illustrated in the sketch below).
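To make the over-penalization concrete, here is a minimal Python sketch (not from the paper) comparing how the same 4-pixel localization shift affects the IoU of a large box versus a small one:

```python
# Sketch (not from the paper): why IoU over-penalizes small-object errors.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A 4-pixel shift barely moves the IoU of a 100x100 box ...
print(iou((0, 0, 100, 100), (4, 0, 104, 100)))  # ~0.92
# ... but collapses the IoU of a 10x10 box.
print(iou((0, 0, 10, 10), (4, 0, 14, 10)))      # ~0.43
```

The identical pixel-level error costs the small box more than half of its IoU, so an IoU-based loss penalizes small objects far more heavily than large ones.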
Modern computer graphics allow for realistic renderings of 3D digital environments produced in real time (30+ FPS).
• Imagery of custom 3D assets in a simulated 3D environment is created with software such as the Unity Real-Time Development Platform (Unity).
• Dataset developers are no longer limited by the biome, types of drones, etc.
• This allows users to develop more diverse datasets.
• Real-time rendering allows for the rapid production of synthetic datasets.
• Annotations are exported in compatible YOLO formats (see the sketch below).
https://github.com/RowanMAVRC/DyViR-For-Unity-SPIE-2023
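As a concrete illustration of the exported annotation format, here is a minimal sketch of a YOLO-format label writer. The function name `write_yolo_labels` and the example pixel coordinates are hypothetical, but the one-line-per-object normalized `class cx cy w h` layout is the standard YOLO convention:

```python
# Sketch: writing one YOLO-format label file per rendered frame.
# YOLO expects one line per object: "<class_id> <cx> <cy> <w> <h>",
# with all four box values normalized to [0, 1].

def write_yolo_labels(path, objects, img_w, img_h):
    """objects: iterable of (class_id, x1, y1, x2, y2) in pixel coordinates."""
    with open(path, "w") as f:
        for class_id, x1, y1, x2, y2 in objects:
            cx = (x1 + x2) / 2.0 / img_w   # normalized box center x
            cy = (y1 + y2) / 2.0 / img_h   # normalized box center y
            w = (x2 - x1) / img_w          # normalized box width
            h = (y2 - y1) / img_h          # normalized box height
            f.write(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")

# Hypothetical example: one drone (class 0) occupying pixels
# (400, 300)-(430, 320) in a 1920x1080 rendered frame.
write_yolo_labels("frame_000001.txt", [(0, 400, 300, 430, 320)], 1920, 1080)
```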
• Single-Label Set (SLS): Each image in this VR dataset has at most one ground truth label. This dataset is reflective of several existing real-world drone detection datasets.
• Multi-Label Set (MLS): The difference between the MLS and SLS datasets is that an image in MLS can have up to four ground truth labels, two for each class.
  • The MLS and SLS are about the same size, with the MLS containing 196,855 images and the SLS 200,042 images; however, MLS and SLS have 674,180 and 174,866 objects to learn from, respectively.
• High-Density Set (HDS):
  • Q: Can a smaller dataset with a higher density of ground truth labels per image produce comparable (or better) results than a larger dataset with a lower label density, while reducing memory size?
• The HDS contains 100,020 images, half the number of images in the prior two datasets.
• There are 878,015 labels, a substantial increase in the number of objects to train on in half the number of images.
• Each image contains up to ten ground truth instances, five of each class (airplane or drone), an over 2x increase in label density over the MLS (see the density summary below).
(Figure: example output from the HDS)
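For reference, the label densities implied by the counts in this section (a derived summary, not a table from the paper):

Dataset   Images    Labels    Labels/Image
SLS       200,042   174,866   ~0.9
MLS       196,855   674,180   ~3.4
HDS       100,020   878,015   ~8.8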
To detect objects in real time, the model must be capable of fast inference times while still maintaining good accuracy (a timing sketch follows this list).
• YOLOv7-tiny is the most recent (at the time of submission) implementation of YOLO that achieves state-of-the-art performance.
• Our approach is not limited to a specific version of YOLO.
• Issue: YOLO struggles because its IoU loss over-penalizes small-object errors.
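A minimal sketch of how the real-time requirement can be checked, assuming a generic PyTorch model as a stand-in for YOLOv7-tiny (the layer stack below is a placeholder, not the actual architecture):

```python
# Sketch: measuring inference throughput against the 30+ FPS requirement.
import time
import torch

model = torch.nn.Sequential(  # placeholder network, NOT YOLOv7-tiny
    torch.nn.Conv2d(3, 16, 3, stride=2), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, stride=2), torch.nn.ReLU(),
).eval()

x = torch.randn(1, 3, 640, 640)  # one 640x640 RGB frame
with torch.no_grad():
    for _ in range(5):           # warm-up iterations
        model(x)
    n = 50
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    fps = n / (time.perf_counter() - start)
print(f"{fps:.1f} FPS")  # must stay above ~30 FPS for real-time use
```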
We use the Normalized Wasserstein Distance (NWD) [16] to mitigate the over-penalization seen with the IoU loss.
• Each bounding box (BB) $(cx, cy, w, h)$ is treated as a 2D Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\mu} = [cx, cy]^{\top}$ and $\boldsymbol{\Sigma} = \mathrm{diag}(w^2/4,\, h^2/4)$. The distance between two boxes is measured by the second-order Wasserstein distance, which for these Gaussians reduces to

$$W_2^2(\mathcal{N}_a, \mathcal{N}_b) = \left\| \left[cx_a,\, cy_a,\, \tfrac{w_a}{2},\, \tfrac{h_a}{2}\right]^{\top} - \left[cx_b,\, cy_b,\, \tfrac{w_b}{2},\, \tfrac{h_b}{2}\right]^{\top} \right\|_2^2$$

• The NWD between the two Gaussian distributions (ground-truth and model BBs) is

$$\mathrm{NWD}(\mathcal{N}_a, \mathcal{N}_b) = \exp\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a, \mathcal{N}_b)}}{C}\right),$$

where $C$ is a dataset-dependent constant.
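A minimal Python sketch of the NWD computation above (assuming [16] is the Normalized Gaussian Wasserstein Distance of Wang et al.; the constant `c` below is an arbitrary placeholder, as [16] ties it to the average object size in the dataset):

```python
# Sketch of the NWD: a (cx, cy, w, h) box is modeled as the Gaussian
# N([cx, cy], diag(w^2/4, h^2/4)).
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent constant (here an arbitrary placeholder).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # The squared 2-Wasserstein distance between the two Gaussians reduces
    # to a squared Euclidean distance over the vector (cx, cy, w/2, h/2).
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)
    return math.exp(-math.sqrt(w2) / c)

# Unlike IoU, NWD degrades smoothly for a small box shifted by 4 pixels
# (the same pair as the earlier IoU sketch, in center format):
print(nwd((5, 5, 10, 10), (9, 5, 10, 10)))  # ~0.73, versus IoU's ~0.43
```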
(Pipeline figure: VR dataset generation → pretrain YOLOv7 on VR data → fine-tune YOLOv7 on real-world data → model deployment. VR generation options include the biome (desert, forest, etc.), object types (drone, plane, etc.), and textures, guided by a subject matter expert.)
• Research questions
  ◦ Does using synthetic VR data boost the performance on real-world data?
  ◦ Does the NWD loss provide a performance gain compared to the IoU loss?
Experimental Setup
All models were trained with the same hyperparameters.
• Each model was trained for 100 epochs on eight Quadro RTX 8000 GPUs; the total batch size was 1,024 images (128 images per GPU).
• The backbone of the YOLOv7-tiny model was pretrained on COCO.
• Object detection performance is reported as the mean Average Precision (mAP); a computation sketch follows this list.
• The real-world drone detection dataset was collected from Svanstrom et al.'s (2021) work [12].
[12] F. Svanstrom, C. Englund, and F. Alonso-Fernandez, "Real-time drone detection and tracking with visible, thermal and acoustic sensors," in International Conference on Pattern Recognition (ICPR), 2021.
(Figure: image from [12])
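As a sketch of how the reported metric can be computed, here is an example using the torchmetrics library's COCO-style mAP; the boxes, scores, and class mapping below are toy values, not results from the paper:

```python
# Sketch: computing mAP with torchmetrics (toy predictions/targets).
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision()  # COCO-style mAP averaged over IoU thresholds
preds = [{
    "boxes": torch.tensor([[100.0, 100.0, 130.0, 120.0]]),  # xyxy format
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([0]),  # 0 = drone (hypothetical class mapping)
}]
targets = [{
    "boxes": torch.tensor([[102.0, 101.0, 129.0, 121.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, targets)
results = metric.compute()
print(results["map"])        # overall mAP
print(results["map_small"])  # mAP restricted to small objects
```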
K. Naddeo, G. Williams, G. Lecakes, A. Almon, T. Kiel, G. Ditzler, N. Bouaynaya, “DyViR: Dynamic Virtual Reality Dataset for Aerial Threat Object Detection,” SPIE Defense + Commercial Sensing Meeting Information: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2023.
A separate YOLOv7-tiny model is pre-trained on each of the VR datasets (SLS, MLS, and HDS), respectively.
• Each model is instantiated from the same pre-trained YOLOv7-tiny model.
• The models are trained for 100 epochs.
• Each of the VR pre-trained models is fine-tuned on the Drone Detection Dataset [12] for 100 epochs (a protocol sketch follows this list).
• The mAP is reported on the real-world datasets. Note that validation on the VR data is less important since the end task is real-world prediction.
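A schematic sketch of the pretrain-then-fine-tune protocol, assuming a generic PyTorch detector; `load_vr_pretrained`, the checkpoint name, and `real_loader` are hypothetical stand-ins, and the optimizer settings are illustrative rather than the paper's:

```python
# Sketch of the protocol: VR pre-training followed by real-world fine-tuning.
import torch

# Hypothetical: a detector initialized with weights from VR pre-training.
model = load_vr_pretrained("yolov7_tiny_hds.pt")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(100):                 # 100 fine-tuning epochs, as above
    for images, targets in real_loader:  # hypothetical loader over [12]
        loss = model(images, targets)    # detection loss (NWD or IoU variant)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```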
Timing and Effect of Pre-training
• Timing: Pre-training on the MLS took only an additional 45 minutes compared to the SLS, despite MLS having 474,138 more labels than SLS.
• Pre-training: NWD outperforms IoU both with and without VR pre-training.
  • This result is specifically observed on the "small" objects.
• The NWD yields better results on all objects, and especially on small ones.
Developing an accurate and trustworthy machine learning model can be challenging when real-world data are scarce.
• Real-world data can be challenging or near impossible to collect.
• Collecting data that covers the full data distribution p(x) can be challenging:
  • Different biomes, UAVs, weather, etc.
• This work showed that combining VR and real-world data with the NWD loss can improve the performance of SOTA object detectors on small objects.
• Future Work: Make novel modifications to algorithms that improve the integration of VR and real-world data to increase the performance of the object detector on real-world data.
  • Develop software to superimpose VR drones over real scenes.
  • Investigate curriculum learning methods to boost performance.
This work was supported by grants from the Army Research Office (W15QKN-21-C-0077) and the National Science Foundation (CAREER #1943552). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the sponsors' views.
Thank you! Drs. Ditzler and Bouaynaya are hiring multiple positions for highly motivated PhD students in the areas of adversarial ML, lifelong learning, few-shot learning, and explainability. See QR code!