Boosting Aerial Object Detection Performance via Virtual Reality Data and Multi-Object Training
Nikolas Koutsoubis(1), Kyle Naddeo(1), Garrett Williams(1), George Lecakes, Jr.(1), Gregory Ditzler(1), Nidhal C. Bouaynaya(1)*, and Thomas Kiel(2)
(1)Rowan University (2)U.S. Army DEVCOM Armaments Center
*Corresponding author: Nidhal Bouaynaya [[email protected]]
• Approach
• Motivate Data Augmentation with Synthetic Data
• Real-Time Object Detection for Small Objects
• Experimental Results
• YOLOv7-tiny Training and Validation
• Results from YOLOv7-tiny ML models
• Conclusions and Future Directions

Citation: N. Koutsoubis, K. Naddeo, G. Williams, G. Lecakes, T. Kiel, G. Ditzler, and N. Bouaynaya, "Boosting Aerial Object Detection Performance via Virtual Reality Data and Multi-Object Training," in IEEE/INNS International Joint Conference on Neural Networks (IJCNN), 2023.
Unmanned Aerial Vehicles (UAVs) pose a major threat across many different landscapes.
• These objects tend to be small (i.e., contained within a small number of pixels), and
• There are limited datasets available for learning an object detector, in terms of:
  • Number of samples
  • Types of UAVs
  • Biomes where data are collected
  • Etc.
• Machine Learning (ML) is a potential solution for identifying and classifying friendly and hostile UAVs in large swarms; however, limited data constrains the performance of any ML model.
It is difficult to collect datasets with UAVs in combat landscapes due to safety concerns and the emergence of new UAV technologies.
• Synthetic training datasets can augment real datasets, allowing for training on large amounts of balanced data.
• The synthetic datasets need to be realistic if they are going to improve performance in real-world deployments.
• Drones are small objects, which are challenging to detect even for SOTA methods such as YOLO (and its variants).
• YOLO struggles because its IoU loss over-penalizes small-object errors.
• Alternative loss functions can improve performance on small objects while maintaining good performance on the remaining objects (illustrated in the sketch below).
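To make the over-penalization concrete, here is a minimal Python sketch (not from the paper) comparing how the same 4-pixel localization shift affects the IoU of a large box versus a small one:

```python
# Sketch (not from the paper): why IoU over-penalizes small-object errors.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A 4-pixel shift barely moves the IoU of a 100x100 box ...
print(iou((0, 0, 100, 100), (4, 0, 104, 100)))  # ~0.92
# ... but collapses the IoU of a 10x10 box.
print(iou((0, 0, 10, 10), (4, 0, 14, 10)))      # ~0.43
```

The identical pixel-level error costs the small box more than half of its IoU, so an IoU-based loss penalizes small objects far more heavily than large ones.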
Modern computer graphics allow for realistic renderings of 3D digital environments produced in real time (30+ FPS).
• Imagery of custom 3D assets in a simulated 3D environment is created with software such as the Unity Real-Time Development Platform (Unity).
• Dataset developers are no longer limited by the biome, types of drones, etc.
• This allows users to develop more diverse datasets.
• Real-time rendering allows for the rapid production of synthetic datasets.
• Annotations are exported in compatible YOLO formats (see the sketch below).
https://github.com/RowanMAVRC/DyViR-For-Unity-SPIE-2023
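As a concrete illustration of the exported annotation format, here is a minimal sketch of a YOLO-format label writer. The function name `write_yolo_labels` and the example pixel coordinates are hypothetical, but the one-line-per-object normalized `class cx cy w h` layout is the standard YOLO convention:

```python
# Sketch: writing one YOLO-format label file per rendered frame.
# YOLO expects one line per object: "<class_id> <cx> <cy> <w> <h>",
# with all four box values normalized to [0, 1].

def write_yolo_labels(path, objects, img_w, img_h):
    """objects: iterable of (class_id, x1, y1, x2, y2) in pixel coordinates."""
    with open(path, "w") as f:
        for class_id, x1, y1, x2, y2 in objects:
            cx = (x1 + x2) / 2.0 / img_w   # normalized box center x
            cy = (y1 + y2) / 2.0 / img_h   # normalized box center y
            w = (x2 - x1) / img_w          # normalized box width
            h = (y2 - y1) / img_h          # normalized box height
            f.write(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")

# Hypothetical example: one drone (class 0) occupying pixels
# (400, 300)-(430, 320) in a 1920x1080 rendered frame.
write_yolo_labels("frame_000001.txt", [(0, 400, 300, 430, 320)], 1920, 1080)
```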
• Single-Label Set (SLS): Each image in this VR dataset has at most one ground truth label. This dataset is reflective of several existing real-world drone detection datasets.
• Multi-Label Set (MLS): The difference between the MLS and SLS datasets is that an image in MLS can have up to four ground truth labels, two for each class.
  • The MLS and SLS are about the same size, with the MLS containing 196,855 images and the SLS 200,042 images; however, MLS and SLS have 674,180 and 174,866 objects to learn from, respectively.
• High-Density Set (HDS):
  • Q: Can a smaller dataset with a higher density of ground truth labels per image produce comparable (or better) results than a larger dataset with a lower label density, while reducing memory size?
• The HDS contains 100,020 images, half the number of images in the prior two datasets.
• There are 878,015 labels, a substantial increase in the number of objects to train on in half the number of images.
• Each image contains up to ten ground truth instances, five of each class (airplane or drone), an over 2x increase in label density over the MLS (see the density summary below).
(Figure: example output from the HDS)
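For reference, the label densities implied by the counts in this section (a derived summary, not a table from the paper):

Dataset   Images    Labels    Labels/Image
SLS       200,042   174,866   ~0.9
MLS       196,855   674,180   ~3.4
HDS       100,020   878,015   ~8.8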
To detect objects in real time, the model must be capable of fast inference times while still maintaining good accuracy (a timing sketch follows this list).
• YOLOv7-tiny is the most recent (at the time of submission) implementation of YOLO that achieves state-of-the-art performance.
• Our approach is not limited to a specific version of YOLO.
• Issue: YOLO struggles because its IoU loss over-penalizes small-object errors.
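A minimal sketch of how the real-time requirement can be checked, assuming a generic PyTorch model as a stand-in for YOLOv7-tiny (the layer stack below is a placeholder, not the actual architecture):

```python
# Sketch: measuring inference throughput against the 30+ FPS requirement.
import time
import torch

model = torch.nn.Sequential(  # placeholder network, NOT YOLOv7-tiny
    torch.nn.Conv2d(3, 16, 3, stride=2), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, stride=2), torch.nn.ReLU(),
).eval()

x = torch.randn(1, 3, 640, 640)  # one 640x640 RGB frame
with torch.no_grad():
    for _ in range(5):           # warm-up iterations
        model(x)
    n = 50
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    fps = n / (time.perf_counter() - start)
print(f"{fps:.1f} FPS")  # must stay above ~30 FPS for real-time use
```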
We use the Normalized Wasserstein Distance (NWD) [16] to mitigate the over-penalization seen with the IoU loss.
• Each bounding box (BB) $(cx, cy, w, h)$ is treated as a 2D Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\mu} = [cx, cy]^{\top}$ and $\boldsymbol{\Sigma} = \mathrm{diag}(w^2/4,\, h^2/4)$. The distance between two boxes is measured by the second-order Wasserstein distance, which for these Gaussians reduces to

$$W_2^2(\mathcal{N}_a, \mathcal{N}_b) = \left\| \left[cx_a,\, cy_a,\, \tfrac{w_a}{2},\, \tfrac{h_a}{2}\right]^{\top} - \left[cx_b,\, cy_b,\, \tfrac{w_b}{2},\, \tfrac{h_b}{2}\right]^{\top} \right\|_2^2$$

• The NWD between the two Gaussian distributions (ground-truth and model BBs) is

$$\mathrm{NWD}(\mathcal{N}_a, \mathcal{N}_b) = \exp\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a, \mathcal{N}_b)}}{C}\right),$$

where $C$ is a dataset-dependent constant.
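A minimal Python sketch of the NWD computation above (assuming [16] is the Normalized Gaussian Wasserstein Distance of Wang et al.; the constant `c` below is an arbitrary placeholder, as [16] ties it to the average object size in the dataset):

```python
# Sketch of the NWD: a (cx, cy, w, h) box is modeled as the Gaussian
# N([cx, cy], diag(w^2/4, h^2/4)).
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    c is a dataset-dependent constant (here an arbitrary placeholder).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # The squared 2-Wasserstein distance between the two Gaussians reduces
    # to a squared Euclidean distance over the vector (cx, cy, w/2, h/2).
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)
    return math.exp(-math.sqrt(w2) / c)

# Unlike IoU, NWD degrades smoothly for a small box shifted by 4 pixels
# (the same pair as the earlier IoU sketch, in center format):
print(nwd((5, 5, 10, 10), (9, 5, 10, 10)))  # ~0.73, versus IoU's ~0.43
```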
(Pipeline figure: VR dataset generation → pretrain YOLOv7 on VR data → fine-tune YOLOv7 on real-world data → model deployment. VR generation options include the biome (desert, forest, etc.), object types (drone, plane, etc.), and textures, guided by a subject matter expert.)
• Research questions
  ◦ Does using synthetic VR data boost the performance on real-world data?
  ◦ Does the NWD loss provide a performance gain compared to the IoU loss?
Experimental Setup
All models were trained with the same hyperparameters.
• Each model was trained for 100 epochs on eight Quadro RTX 8000 GPUs; the total batch size was 1,024 images (128 images per GPU).
• The backbone of the YOLOv7-tiny model was pretrained on COCO.
• Object detection performance is reported as the mean Average Precision (mAP); a computation sketch follows this list.
• The real-world drone detection dataset was collected from Svanstrom et al.'s (2021) work [12].
[12] F. Svanstrom, C. Englund, and F. Alonso-Fernandez, "Real-time drone detection and tracking with visible, thermal and acoustic sensors," in International Conference on Pattern Recognition (ICPR), 2021.
(Figure: image from [12])
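As a sketch of how the reported metric can be computed, here is an example using the torchmetrics library's COCO-style mAP; the boxes, scores, and class mapping below are toy values, not results from the paper:

```python
# Sketch: computing mAP with torchmetrics (toy predictions/targets).
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision()  # COCO-style mAP averaged over IoU thresholds
preds = [{
    "boxes": torch.tensor([[100.0, 100.0, 130.0, 120.0]]),  # xyxy format
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([0]),  # 0 = drone (hypothetical class mapping)
}]
targets = [{
    "boxes": torch.tensor([[102.0, 101.0, 129.0, 121.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, targets)
results = metric.compute()
print(results["map"])        # overall mAP
print(results["map_small"])  # mAP restricted to small objects
```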
K. Naddeo, G. Williams, G. Lecakes, A. Almon, T. Kiel, G. Ditzler, N. Bouaynaya, “DyViR: Dynamic Virtual Reality Dataset for Aerial Threat Object Detection,” SPIE Defense + Commercial Sensing Meeting Information: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2023.
A separate YOLOv7-tiny model is pre-trained on each of the VR datasets (SLS, MLS, and HDS), respectively.
• Each model is instantiated from the same pre-trained YOLOv7-tiny model.
• The models are trained for 100 epochs.
• Each of the VR pre-trained models is fine-tuned on the Drone Detection Dataset [12] for 100 epochs (a protocol sketch follows this list).
• The mAP is reported on the real-world datasets. Note that validation on the VR data is less important since the end task is real-world prediction.
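A schematic sketch of the pretrain-then-fine-tune protocol, assuming a generic PyTorch detector; `load_vr_pretrained`, the checkpoint name, and `real_loader` are hypothetical stand-ins, and the optimizer settings are illustrative rather than the paper's:

```python
# Sketch of the protocol: VR pre-training followed by real-world fine-tuning.
import torch

# Hypothetical: a detector initialized with weights from VR pre-training.
model = load_vr_pretrained("yolov7_tiny_hds.pt")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(100):                 # 100 fine-tuning epochs, as above
    for images, targets in real_loader:  # hypothetical loader over [12]
        loss = model(images, targets)    # detection loss (NWD or IoU variant)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```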
Timing and Effect of Pre-training
• Timing: Pre-training on the MLS took only an additional 45 minutes compared to the SLS, despite MLS having 474,138 more labels than SLS.
• Pre-training: NWD outperforms IoU both with and without VR pre-training.
  • This result is specifically observed on the "small" objects.
• The NWD yields better results on all objects, and especially on small ones.
Developing an accurate and trustworthy machine learning model can be challenging when real-world data are scarce.
• Real-world data can be challenging or near impossible to collect.
• Collecting data that covers the full data distribution p(x) can be challenging:
  • Different biomes, UAVs, weather, etc.
• This work showed that combining VR and real-world data with the NWD loss can improve the performance of SOTA object detectors on small objects.
• Future Work: Make novel modifications to algorithms that improve the integration of VR and real-world data to increase the performance of the object detector on real-world data.
  • Develop software to superimpose VR drones over real scenes.
  • Investigate curriculum learning methods to boost performance.
This work was supported by grants from the Army Research Office (W15QKN-21-C-0077) and the National Science Foundation (CAREER #1943552). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the sponsors' views.
Thank you! Drs. Ditzler and Bouaynaya are hiring multiple positions for highly motivated PhD students in the areas of adversarial ML, lifelong learning, few-shot learning, and explainability. See QR code!