Search | arXiv e-print repository

Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding

Authors: Jay Bhanushali, Manivannan Muniyandi, Praneeth Chakravarthula

Abstract: We present a cross-domain inference technique that learns from synthetic data to estimate depth and normals for in-the-wild omnidirectional 3D scenes encountered in real-world uncontrolled settings. To this end, we introduce UBotNet, an architecture that combines UNet and Bottleneck Transformer elements to predict consistent scene normals and depth. We also introduce the OmniHorizon synthetic dataset containing 24,335 omnidirectional images that represent a wide variety of outdoor environments, including buildings, streets, and diverse vegetation. This dataset is generated from expansive, lifelike virtual spaces and encompasses dynamic scene elements, such as changing lighting conditions, different times of day, pedestrians, and vehicles. Our experiments show that UBotNet achieves significantly improved accuracy in depth estimation and normal estimation compared to existing models. Lastly, we validate cross-domain synthetic-to-real depth and normal estimation on real outdoor images using UBotNet trained solely on our synthetic OmniHorizon dataset, demonstrating the potential of both the synthetic dataset and the proposed network for real-world scene understanding applications.

Submitted 7 June, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: Accepted to OmniCV 2024

arXiv:2209.08516

VisTaNet: Attention Guided Deep Fusion for Surface Roughness Classification

Authors: Prasanna Kumar Routray, Aditya Sanjiv Kanade, Jay Bhanushali, Manivannan Muniyandi

Abstract: Human texture perception is a weighted average of multi-sensory inputs: visual and tactile. While the visual sensing mechanism extracts global features, the tactile mechanism complements it by extracting local features. The lack of coupled visuotactile datasets in the literature is a challenge for studying multimodal fusion strategies analogous to human texture perception. This paper presents a visual dataset that augments an existing tactile dataset. We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies: summation, concatenation, max-pooling, and attention. Our model shows significant performance improvements (97.22%) in surface roughness classification accuracy over tactile-only (SVM, 92.60%) and visual-only (FENet-50, 85.01%) architectures. Among the several fusion techniques, the attention-guided architecture results in better classification accuracy. Our study shows that, analogous to human texture perception, the proposed model chooses a weighted combination of the two modalities (visual and tactile), thus resulting in higher surface roughness classification accuracy; it chooses to maximize the weightage of the tactile modality where the visual modality fails, and vice versa.

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2209.03750

Towards Multidimensional Textural Perception and Classification Through Whisker

Authors: Prasanna Kumar Routray, Aditya Sanjiv Kanade, Pauline Pounds, Manivannan Muniyandi

Abstract: Texture-based studies and designs have been in focus recently. Whisker-based multidimensional surface texture data is missing in the literature. This data is critical for robotics and machine perception algorithms in the classification and regression of textural surfaces. In this study, we present a novel sensor design to acquire multidimensional texture information. The surface texture's roughness and hardness were measured experimentally using sweeping and dabbing. Three machine learning models (SVM, RF, and MLP) showed excellent classification accuracy for the roughness and hardness of surface textures. We show that the combination of pressure and accelerometer data, collected from a standard machined specimen using the whisker sensor, improves classification accuracy. Further, we experimentally validate that the sensor can classify texture with roughness depths as low as $2.5\,\mu\mathrm{m}$ at an accuracy of $90\%$ or more and segregate materials based on their roughness and hardness. We present a novel metric to consider while designing a whisker sensor to guarantee the quality of texture data acquisition beforehand. The machine learning model performance was validated against the data collected from the laser sensor from the same set of surface textures. As part of our work, we are releasing two-dimensional texture data (roughness and hardness) to the research community.

Submitted 1 September, 2022; originally announced September 2022.

arXiv:2204.07840

A Robust and Scalable Attention Guided Deep Learning Framework for Movement Quality Assessment

Authors: Aditya Kanade, Mansi Sharma, Manivannan Muniyandi

Abstract: Physical rehabilitation programs frequently begin with a brief stay in the hospital and continue with home-based rehabilitation. Lack of feedback on exercise correctness is a significant issue in home-based rehabilitation. Automated movement quality assessment (MQA) using skeletal movement data (hereafter referred to as skeletal data) collected via depth imaging devices can assist with home-based rehabilitation by providing the necessary quantitative feedback. This paper aims to use recent advances in deep learning to address the problem of MQA. Movement quality score generation is an essential component of MQA. We propose three novel skeletal data augmentation schemes. We show that using the proposed augmentations for generating movement quality scores results in significant performance boosts over existing methods. Finally, we propose a novel transformer-based architecture for MQA. Four novel feature extractors are proposed and studied that allow the transformer network to operate on skeletal data. We show that adding the attention mechanism in the design of the proposed feature extractor allows the transformer network to pay attention to specific body parts that make a significant contribution towards executing a movement. We report an improvement in movement quality score prediction of 12% on the UI-PRMD dataset and 21% on the KIMORE dataset compared to existing methods.

Submitted 16 April, 2022; originally announced April 2022.

Showing 1–4 of 4 results for author: Muniyandi, M