Search | arXiv e-print repository

NVAutoNet: Fast and Accurate 360$^{\circ}$ 3D Visual Perception For Self Driving

Authors: Trung Pham, Mehran Maghoumi, Wanli Jiang, Bala Siva Sashank Jujjavarapu, Mehdi Sajjadi, Xin Liu, Hsuan-Chu Lin, Bor-Jeng Chen, Giang Truong, Chao Fang, Junghyun Kwon, Minwoo Park

Abstract: Achieving robust and real-time 3D perception is fundamental for autonomous vehicles. While most existing 3D perception methods prioritize detection accuracy, they often overlook critical aspects such as computational efficiency, onboard chip deployment friendliness, resilience to sensor mounting deviations, and adaptability to various vehicle types. To address these challenges, we present NVAutoNet: a specialized Bird's-Eye-View (BEV) perception network tailored explicitly for automated vehicles. NVAutoNet takes synchronized camera images as input and predicts 3D signals like obstacles, freespaces, and parking spaces. The core of NVAutoNet's architecture (image and BEV backbones) relies on efficient convolutional networks, optimized for high performance using TensorRT. More importantly, our image-to-BEV transformation employs simple linear layers and BEV look-up tables, ensuring rapid inference speed. Trained on an extensive proprietary dataset, NVAutoNet consistently achieves elevated perception accuracy, operating remarkably at 53 frames per second on the NVIDIA DRIVE Orin SoC. Notably, NVAutoNet demonstrates resilience to sensor mounting deviations arising from diverse car models. Moreover, NVAutoNet excels in adapting to varied vehicle types, facilitated by inexpensive model fine-tuning procedures that expedite compatibility adjustments.

Submitted 27 November, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted to WACV 2024. Link to video https://www.youtube.com/watch?v=cPxVhCJ7kyY

arXiv:2106.10980 [pdf, other]

SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild

Authors: Ariel Caputo, Andrea Giachetti, Simone Soso, Deborah Pintani, Andrea D'Eusanio, Stefano Pini, Guido Borghi, Alessandro Simoni, Roberto Vezzani, Rita Cucchiara, Andrea Ranieri, Franca Giannini, Katia Lupinetti, Marina Monti, Mehran Maghoumi, Joseph J. LaViola Jr, Minh-Quan Le, Hai-Dang Nguyen, Minh-Triet Tran

Abstract: Gesture recognition is a fundamental tool to enable novel interaction paradigms in a variety of application scenarios like Mixed Reality environments, touchless public kiosks, entertainment systems, and more. Recognition of hand gestures can be nowadays performed directly from the stream of hand skeletons estimated by software provided by low-cost trackers (Ultraleap) and MR headsets (Hololens, Oculus Quest) or by video processing software modules (e.g. Google Mediapipe). Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario for the recognition of a wide set of heterogeneous gestures, as many benchmarks do not test online recognition and use limited dictionaries. This motivated the proposal of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild. For this contest, we created a novel dataset with heterogeneous gestures featuring different types and duration. These gestures have to be found inside sequences in an online recognition scenario. This paper presents the result of the contest, showing the performances of the techniques proposed by four research groups on the challenging task compared with a simple baseline method.

Submitted 21 June, 2021; originally announced June 2021.

Comments: 12 pages, to be published on Computers & Graphics

arXiv:2011.09149 [pdf, other]

DeepNAG: Deep Non-Adversarial Gesture Generation

Authors: Mehran Maghoumi, Eugene M. Taranta II, Joseph J. LaViola Jr

Abstract: Synthetic data generation to improve classification performance (data augmentation) is a well-studied problem. Recently, generative adversarial networks (GAN) have shown superior image data augmentation performance, but their suitability in gesture synthesis has received inadequate attention. Further, GANs prohibitively require simultaneous generator and discriminator network training. We tackle both issues in this work. We first discuss a novel, device-agnostic GAN model for gesture synthesis called DeepGAN. Thereafter, we formulate DeepNAG by introducing a new differentiable loss function based on dynamic time warping and the average Hausdorff distance, which allows us to train DeepGAN's generator without requiring a discriminator. Through evaluations, we compare the utility of DeepGAN and DeepNAG against two alternative techniques for training five recognizers using data augmentation over six datasets. We further investigate the perceived quality of synthesized samples via an Amazon Mechanical Turk user study based on the HYPE benchmark. We find that DeepNAG outperforms DeepGAN in accuracy, training time (up to 17x faster), and realism, thereby opening the door to a new line of research in generator network design and training for gesture synthesis. Our source code is available at https://www.deepnag.com.

Submitted 18 November, 2020; originally announced November 2020.

Comments: 13 pages
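
The DeepNAG abstract above names the average Hausdorff distance as one ingredient of its loss. As a rough illustration only, the sketch below computes one common symmetric form of that distance between two point sets in plain Python. The function names, the 0.5 averaging factor, and the sample points are assumptions made here for illustration; the abstract does not give the paper's exact formulation, and this sketch includes neither the dynamic time warping term nor any differentiability.

```python
import math


def directed_average_distance(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def average_hausdorff_distance(a, b):
    """Symmetric average Hausdorff distance (one common definition).

    Assumption: the two directed terms are averaged with a factor of 0.5;
    other texts sum them or weight them differently.
    """
    return 0.5 * (directed_average_distance(a, b)
                  + directed_average_distance(b, a))


# Hypothetical 2D gesture samples: two parallel strokes one unit apart.
real = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
synthetic = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

distance = average_hausdorff_distance(real, synthetic)
print(distance)
```

Unlike the classic Hausdorff distance, which takes the maximum of the nearest-neighbour distances, this averaged form is less sensitive to a single outlying point.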

arXiv:1810.12514 [pdf]

DeepGRU: Deep Gesture Recognition Utility

Authors: Mehran Maghoumi, Joseph J. LaViola Jr

Abstract: We propose DeepGRU, a novel end-to-end deep network model informed by recent developments in deep learning for gesture and action recognition, that is streamlined and device-agnostic. DeepGRU, which uses only raw skeleton, pose or vector data is quickly understood, implemented, and trained, and yet achieves state-of-the-art results on challenging datasets. At the heart of our method lies a set of stacked gated recurrent units (GRU), two fully-connected layers and a novel global attention model. We evaluate our method on seven publicly available datasets, containing various number of samples and spanning over a broad range of interactions (full-body, multi-actor, hand gestures, etc.). In all but one case we outperform the state-of-the-art pose-based methods. For instance, we achieve a recognition accuracy of 84.9% and 92.3% on cross-subject and cross-view tests of the NTU RGB+D dataset respectively, and also 100% recognition accuracy on the UT-Kinect dataset. While DeepGRU works well on large datasets with many training samples, we show that even in the absence of a large number of training data, and with as little as four samples per class, DeepGRU can beat traditional methods specifically designed for small training sets. Lastly, we demonstrate that even without powerful hardware, and using only the CPU, our method can still be trained in under 10 minutes on small-scale datasets, making it an enticing choice for rapid application prototyping and development.

Submitted 10 October, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

Comments: Published in ISVC 2019. Code is available at https://github.com/Maghoumi/DeepGRU

arXiv:1708.02174 [pdf, other]

Code Park: A New 3D Code Visualization Tool

Authors: Pooya Khaloo, Mehran Maghoumi, Eugene Taranta II, David Bettner, Joseph Laviola Jr

Abstract: We introduce Code Park, a novel tool for visualizing codebases in a 3D game-like environment. Code Park aims to improve a programmer's understanding of an existing codebase in a manner that is both engaging and intuitive, appealing to novice users such as students. It achieves these goals by laying out the codebase in a 3D park-like environment. Each class in the codebase is represented as a 3D room-like structure. Constituent parts of the class (variable, member functions, etc.) are laid out on the walls, resembling a syntax-aware "wallpaper". The users can interact with the codebase using an overview, and a first-person viewer mode. We conducted two user studies to evaluate Code Park's usability and suitability for organizing an existing project. Our results indicate that Code Park is easy to get familiar with and significantly helps in code understanding compared to a traditional IDE. Further, the users unanimously believed that Code Park was a fun tool to work with.

Submitted 7 August, 2017; originally announced August 2017.

Comments: Accepted for publication in 2017 IEEE Working Conference on Software Visualization (VISSOFT 2017); Supplementary video: https://www.youtube.com/watch?v=LUiy1M9hUKU

Showing 1–5 of 5 results for author: Maghoumi, M