-
Hilti SLAM Challenge 2023: Benchmarking Single + Multi-session SLAM across Sensor Constellations in Construction
Authors:
Ashish Devadas Nair,
Julien Kindle,
Plamen Levchev,
Davide Scaramuzza
Abstract:
Simultaneous Localization and Mapping (SLAM) systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM systems with high accuracy. However, further capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as indicators of robustness and ease of deployment in real-world applications. No publicly available dataset and benchmark combination considers these factors together. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design that makes a pre-surveyed point on the ground observable by an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position with mm-level accuracy. Results from the challenge show increased overall participation and single-session SLAM systems that are increasingly accurate and operate successfully across varying sensor suites, but relatively few participants performing multi-session SLAM.
Submitted 15 April, 2024;
originally announced April 2024.
-
Estimating the time-evolving refractivity of a turbulent medium using optical beam measurements: a data assimilation approach
Authors:
Anjali Nair,
Qin Li,
Samuel N. Stechmann
Abstract:
In applications such as free-space optical communication, a signal is often recovered after propagation through a turbulent medium. In this setting, it is common to assume that only limited information about the turbulent medium is known, such as a space- and time-averaged statistic (e.g., root-mean-square), without information about the state of its spatial variations. It would be helpful to gain more information by characterizing the state of the turbulent medium, including its spatial variations and their evolution in time. Here, we investigate the use of data assimilation techniques for this purpose. A computational setting is used with the paraxial wave equation, and the extended Kalman filter is used to conduct data assimilation using intensity measurements. To reduce computational cost, the evolution of the turbulent medium is modeled as a stochastic process. Following some past studies, the process has only a small number of Fourier wavelengths for spatial variations. The results show that the spatial and temporal variations of the medium are recovered accurately in many cases, although in some cases the recovery error is larger during certain time windows. Finally, we discuss the potential use of the spatial variation information for aiding the recovery of the transmitted signal or beam source.
Submitted 10 January, 2024;
originally announced January 2024.
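The abstract names the extended Kalman filter (EKF) but does not spell out its equations. As a minimal sketch only, assuming a single scalar state (standing in for one Fourier amplitude of the refractivity) that evolves as a linear AR(1) process and is observed through a nonlinear measurement, one EKF predict/update cycle looks like the Python below. The functions `ekf_step`, `h`, and `h_prime`, the exponential measurement model, and all numeric parameters are hypothetical and are not the authors' paraxial-wave setup.

```python
import math
from typing import Callable, Tuple


def ekf_step(
    x: float,
    p: float,
    y: float,
    a: float,
    q: float,
    r: float,
    h: Callable[[float], float],
    h_prime: Callable[[float], float],
) -> Tuple[float, float]:
    """One scalar EKF cycle (hypothetical sketch).

    x, p : prior mean and variance of the state
    y    : new measurement
    a, q : linear dynamics x_{k+1} = a * x_k + w, with Var(w) = q
    r    : measurement-noise variance, y = h(x) + v with Var(v) = r
    """
    # Predict: propagate mean and variance through the linear dynamics.
    x_pred = a * x
    p_pred = a * a * p + q
    # Update: linearize the measurement function around the prediction.
    H = h_prime(x_pred)
    s = H * p_pred * H + r          # innovation variance
    k = p_pred * H / s              # Kalman gain
    x_new = x_pred + k * (y - h(x_pred))
    p_new = (1.0 - k * H) * p_pred
    return x_new, p_new


# Illustrative nonlinear "intensity-like" measurement (an assumption).
def h(x: float) -> float:
    return math.exp(x)


def h_prime(x: float) -> float:
    return math.exp(x)


if __name__ == "__main__":
    x_true, x_est, p_est = 0.5, 0.0, 1.0
    for _ in range(50):
        x_true = 0.95 * x_true      # noise-free truth, for illustration only
        y = h(x_true)               # noise-free measurement
        x_est, p_est = ekf_step(x_est, p_est, y, a=0.95, q=1e-4, r=1e-2,
                                h=h, h_prime=h_prime)
    print(f"true={x_true:.4f} estimate={x_est:.4f} variance={p_est:.2e}")
```

The only EKF-specific step is the linearization `h_prime(x_pred)`; with a linear `h` the update reduces to the ordinary Kalman filter.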
-
Studying the Effects of Sex-related Differences on Brain Age Prediction using brain MR Imaging
Authors:
Mahsa Dibaji,
Neha Gianchandani,
Akhil Nair,
Mansi Singhal,
Roberto Souza,
Mariana Bento
Abstract:
When using machine learning models, one of the most crucial considerations is how bias and fairness affect model outcomes for diverse demographics. This becomes especially relevant in the context of machine learning for medical imaging applications, as these models are increasingly being used for diagnosis and treatment planning. In this paper, we study biases related to sex when developing a machine learning model based on brain magnetic resonance images (MRI). We investigate the effects of sex by performing brain age prediction under different experimental designs: models trained using only female subjects, only male subjects, and a balanced dataset. We also perform evaluation on multiple MRI datasets (Calgary-Campinas (CC359) and CamCAN) to assess the generalization capability of the proposed models. We found disparities in the performance of brain age prediction models when trained on distinct sex subgroups and datasets, in both final predictions and decision making (assessed using interpretability models). Our results demonstrated variations in model generalizability across sex-specific subgroups, suggesting potential biases in models trained on unbalanced datasets. This underlines the critical role of careful experimental design in generating fair and reliable outcomes.
Submitted 17 October, 2023;
originally announced October 2023.
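The three training designs described in the abstract (female-only, male-only, balanced) can be sketched as below. This is a hypothetical illustration: it assumes each subject record carries a `sex` field coded as `"F"` or `"M"`, and that all three training sets are drawn at the same size so that performance differences are not confounded by sample count. The abstract does not state how the sets were sized, and `make_training_sets` is an illustrative name.

```python
import random
from typing import Dict, List, Sequence

Subject = Dict[str, object]  # hypothetical record, e.g. {"id": 7, "sex": "F", ...}


def make_training_sets(
    subjects: Sequence[Subject], n_per_set: int, seed: int = 0
) -> Dict[str, List[Subject]]:
    """Draw equally sized female-only, male-only and 50/50 balanced sets."""
    if n_per_set % 2:
        raise ValueError("n_per_set must be even so the balanced set splits 50/50")
    females = [s for s in subjects if s["sex"] == "F"]
    males = [s for s in subjects if s["sex"] == "M"]
    if len(females) < n_per_set or len(males) < n_per_set:
        raise ValueError("not enough subjects of each sex for the requested size")
    rng = random.Random(seed)  # fixed seed keeps the splits reproducible
    half = n_per_set // 2
    return {
        "female_only": rng.sample(females, n_per_set),
        "male_only": rng.sample(males, n_per_set),
        "balanced": rng.sample(females, half) + rng.sample(males, half),
    }
```

Each resulting set would then train a separate brain age model, and all models would be evaluated on the same held-out data.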
-
Control of Vortex Dynamics using Invariants
Authors:
Kartik Krishna,
Aditya G. Nair,
Anand Krishnan,
Steven L. Brunton,
Eurika Kaiser
Abstract:
Vortex-dominated flows are ubiquitous in engineering, and the ability to efficiently manipulate the dynamics of these vortices has broad applications, from wake shaping to mixing enhancement. However, the strongly nonlinear behavior of vortex dynamics makes this a challenging task. In this work, we investigate the control of vortex dynamics by using a change of coordinates from the Biot-Savart equations into well-known invariants, such as the Hamiltonian and the linear and angular impulses, which are Koopman eigenfunctions. We then combine the resulting model with model predictive control to generate control laws that force the vortex system using "virtual cylinders". The invariant model is beneficial as it provides a linear, global description of the vortex dynamics through a recently developed Koopman control scheme for conserved quantities and invariants. The use of this model for control has not been well studied in the literature. In this paper, we seek to understand the effect of changing each invariant individually or multiple invariants simultaneously. We use the 4-vortex system as our primary test bed, as it is the simplest configuration that exhibits chaotic behavior. We show that by controlling to specific invariant quantities, we can modify the transition from chaotic to quasiperiodic states. Finally, we computationally demonstrate the effectiveness of invariant control on a toy example of tracer mixing in the 4-vortex system.
Submitted 7 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
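The invariants named in the abstract have standard closed forms for N point vortices with circulations Γ_i at (x_i, y_i) in the unbounded plane: the Kirchhoff Hamiltonian H = -(1/2π) Σ_{i<j} Γ_i Γ_j ln r_ij, the linear impulse (Q, P) = (Σ Γ_i x_i, Σ Γ_i y_i), and the angular impulse I = Σ Γ_i (x_i² + y_i²). As a hedged sketch (not the authors' Koopman or MPC code, and with no "virtual cylinder" forcing), the Python below evaluates these quantities and integrates the uncontrolled Biot-Savart dynamics with RK4, so one can check numerically that they stay constant. The function names, the equal-strength 4-vortex configuration, and the step size are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def velocities(pos: Sequence[Point], gammas: Sequence[float]) -> List[Point]:
    """Biot-Savart velocity induced on each point vortex by all the others."""
    vel = []
    for i, (xi, yi) in enumerate(pos):
        u = v = 0.0
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue  # a point vortex does not advect itself
            dx, dy = xi - xj, yi - yj
            r2 = dx * dx + dy * dy
            u -= gammas[j] * dy / (2.0 * math.pi * r2)
            v += gammas[j] * dx / (2.0 * math.pi * r2)
        vel.append((u, v))
    return vel


def rk4_step(pos: Sequence[Point], gammas: Sequence[float], dt: float) -> List[Point]:
    """Advance the uncontrolled dynamics by one classical Runge-Kutta step."""
    def shifted(k: Sequence[Point], c: float) -> List[Point]:
        return [(x + c * kx, y + c * ky) for (x, y), (kx, ky) in zip(pos, k)]

    k1 = velocities(pos, gammas)
    k2 = velocities(shifted(k1, dt / 2), gammas)
    k3 = velocities(shifted(k2, dt / 2), gammas)
    k4 = velocities(shifted(k3, dt), gammas)
    return [
        (x + dt / 6 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0]),
         y + dt / 6 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1]))
        for (x, y), a1, a2, a3, a4 in zip(pos, k1, k2, k3, k4)
    ]


def invariants(pos: Sequence[Point], gammas: Sequence[float]) -> Dict[str, float]:
    """Hamiltonian H, linear impulse (Q, P) and angular impulse I."""
    n = len(pos)
    H = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(pos[i], pos[j])
            H -= gammas[i] * gammas[j] * math.log(r) / (2.0 * math.pi)
    Q = sum(g * x for g, (x, _) in zip(gammas, pos))
    P = sum(g * y for g, (_, y) in zip(gammas, pos))
    I = sum(g * (x * x + y * y) for g, (x, y) in zip(gammas, pos))
    return {"H": H, "Q": Q, "P": P, "I": I}


if __name__ == "__main__":
    gammas = [1.0, 1.0, 1.0, 1.0]  # equal-strength 4-vortex system (assumption)
    pos = [(1.0, 0.0), (0.0, 1.2), (-1.0, 0.1), (0.2, -1.0)]
    before = invariants(pos, gammas)
    for _ in range(200):
        pos = rk4_step(pos, gammas, dt=0.01)
    after = invariants(pos, gammas)
    for name in before:
        print(f"{name}: {before[name]:+.8f} -> {after[name]:+.8f}")
```

Because these quantities are conserved by the uncontrolled flow, any change in them over time can be attributed to the applied forcing, which is what makes them natural coordinates for control.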
-
A 3D deep learning classifier and its explainability when assessing coronary artery disease
Authors:
Wing Keung Cheung,
Jeremy Kalindjian,
Robert Bell,
Arjun Nair,
Leon J. Menezes,
Riyaz Patel,
Simon Wan,
Kacy Chou,
Jiahang Chen,
Ryo Torii,
Rhodri H. Davies,
James C. Moon,
Daniel C. Alexander,
Joseph Jacob
Abstract:
Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In this study, we propose a 3D ResNet-50 deep learning model to directly classify normal subjects and CAD patients from computed tomography coronary angiography images. Our proposed method outperforms a 2D ResNet-50 model by 23.65%. Explainability is also provided using Grad-CAM. Furthermore, we link the 3D CAD classification to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation.
Submitted 29 July, 2023;
originally announced August 2023.
-
SPADE: Self-supervised Pretraining for Acoustic DisEntanglement
Authors:
John Harvill,
Jarred Barber,
Arun Nair,
Ramin Pishehvar
Abstract:
Self-supervised representation learning approaches have grown in popularity due to their ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer vision, and speech. Previous self-supervised work in the speech domain has disentangled multiple attributes of speech such as linguistic content, speaker identity, and rhythm. In this work, we introduce a self-supervised approach to disentangle room acoustics from speech and use the acoustic representation on the downstream task of device arbitration. Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce, indicating that our pretraining scheme learns to encode room acoustic information while remaining invariant to other attributes of the speech signal.
Submitted 2 February, 2023;
originally announced February 2023.
-
Evaluation of 3D GANs for Lung Tissue Modelling in Pulmonary CT
Authors:
Sam Ellis,
Octavio E. Martinez Manzanera,
Vasileios Baltatzis,
Ibrahim Nawaz,
Arjun Nair,
Loïc Le Folgoc,
Sujal Desai,
Ben Glocker,
Julia A. Schnabel
Abstract:
GANs can accurately model the distribution of complex, high-dimensional datasets, e.g. images. This makes high-quality GANs useful for unsupervised anomaly detection in medical imaging. However, differences in training datasets, such as output image dimensionality and the appearance of semantically meaningful features, mean that GAN models from the natural image domain may not work 'out-of-the-box' for medical imaging, necessitating re-implementation and re-evaluation. In this work, we adapt three GAN models to the task of modelling 3D healthy image patches for pulmonary CT and evaluate them. To the best of our knowledge, this is the first time that such an evaluation has been performed. The DCGAN, styleGAN and bigGAN architectures were investigated due to their ubiquity and high performance in natural image processing. We train different variants of these methods and assess their performance using the FID score. In addition, the quality of the generated images was evaluated by a human observer study, the ability of the networks to model 3D domain-specific features was investigated, and the structure of the GAN latent spaces was analysed. Results show that the 3D styleGAN produces realistic-looking images with meaningful 3D structure but suffers from mode collapse, which must be addressed during training to obtain sample diversity. Conversely, the 3D DCGAN models show a greater capacity for image variability, but at the cost of poor-quality images. The 3D bigGAN models provide an intermediate level of image quality but most accurately model the distribution of selected semantically meaningful features. The results suggest that further development is required to realise a 3D GAN with sufficient capacity for patch-based lung CT anomaly detection, and we offer recommendations for future areas of research, such as experimenting with other architectures and incorporating position encoding.
Submitted 17 August, 2022;
originally announced August 2022.
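The FID score used above is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images: ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The NumPy sketch below computes it from two feature matrices. Which network produces the features for 3D CT patches is not stated in the abstract, so feature extraction is deliberately left out, and the function names are illustrative.

```python
import numpy as np


def _sqrtm_psd(a: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}) for two Gaussians."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    m = s1_half @ sigma2 @ s1_half
    m = 0.5 * (m + m.T)  # remove round-off asymmetry before eigvalsh
    # Tr((S1 S2)^{1/2}) = Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the latter is symmetric PSD.
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID between two (n_samples, n_features) arrays of embedding vectors."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)
    sig_f = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_f, sig_f)
```

Lower is better; identical feature distributions give zero. Because FID compares only first and second moments, it can miss mode collapse that a human observer study or a latent-space analysis would reveal.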
-
Challenges and Opportunities in Multi-device Speech Processing
Authors:
Gregory Ciccarelli,
Jarred Barber,
Arun Nair,
Israel Cohen,
Tao Zhang
Abstract:
We review current solutions and technical challenges for automatic speech recognition, keyword spotting, device arbitration, speech enhancement, and source localization in multi-device home environments to provide context for the INTERSPEECH 2022 special session, "Challenges and opportunities for signal processing and machine learning for multiple smart devices". We also identify the datasets needed to support these research areas. Based on the review and our research experience in the multi-device domain, we conclude with an outlook on the future evolution
Submitted 27 June, 2022;
originally announced June 2022.
-
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise
Authors:
Tekin Gunasar,
Alexandra Rekesh,
Atul Nair,
Penelope King,
Anastasiya Markova,
Jiaqi Zhang,
Isabel Tate
Abstract:
Electromyography (EMG) signals can be used as training data for machine learning models that classify gestures. We seek to produce a model that classifies six different hand gestures from a limited number of samples and generalizes well to a wider audience, and we compare the effect of our feature extraction on model accuracy with more conventional methods, such as computing AR parameters on a sliding window across the channels of a signal. We rely on more elementary methods, such as the use of random bounds on a signal, and aim to show the power these methods can carry in an online EMG classification setting, as opposed to more complicated methods such as the Fourier transform. To augment our limited training data, we used a standard technique known as jitter, in which random noise is added to each observation channel-wise. Once all datasets were produced using these methods, we performed a grid search with Random Forest and XGBoost to create a high-accuracy model. For human-computer interface purposes, high-accuracy classification of EMG signals is of particular importance, and given the difficulty and cost of amassing biomedical data in high volume, it is valuable to have techniques that work with a small number of high-quality samples and rely on inexpensive feature extraction methods that can be carried out reliably in an online application.
Submitted 29 June, 2022;
originally announced June 2022.
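The jitter augmentation described above (random Gaussian noise added channel by channel, with a random variance per the title) can be sketched in plain Python as follows. The per-channel standard deviation drawn uniformly from `[0, max_sigma]`, the number of copies, and the names `jitter` and `augment` are assumptions for illustration; the abstract does not give the exact noise distribution.

```python
import random
from typing import List, Sequence, Tuple

Signal = List[List[float]]  # indexed as [channel][time sample]


def jitter(signal: Sequence[Sequence[float]], max_sigma: float,
           rng: random.Random) -> Signal:
    """Add zero-mean Gaussian noise channel by channel.

    Each channel gets its own standard deviation drawn uniformly from
    [0, max_sigma] (an assumed reading of "random variance").
    """
    noisy = []
    for channel in signal:
        sigma = rng.uniform(0.0, max_sigma)
        noisy.append([x + rng.gauss(0.0, sigma) for x in channel])
    return noisy


def augment(samples: Sequence[Sequence[Sequence[float]]], labels: Sequence[int],
            copies: int, max_sigma: float,
            seed: int = 0) -> Tuple[List[Signal], List[int]]:
    """Return the originals plus `copies` jittered versions of each sample."""
    rng = random.Random(seed)
    out_x: List[Signal] = [[list(ch) for ch in s] for s in samples]
    out_y: List[int] = list(labels)
    for s, y in zip(samples, labels):
        for _ in range(copies):
            out_x.append(jitter(s, max_sigma, rng))
            out_y.append(y)  # noise does not change the gesture label
    return out_x, out_y
```

The augmented set would then be passed to feature extraction and to the Random Forest / XGBoost grid search; the noise scale should stay small relative to the signal so that gesture identity is preserved.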
-
Enhancing Cancer Prediction in Challenging Screen-Detected Incident Lung Nodules Using Time-Series Deep Learning
Authors:
Shahab Aslani,
Pavan Alluri,
Eyjolfur Gudmundsson,
Edward Chandy,
John McCabe,
Anand Devaraj,
Carolyn Horst,
Sam M Janes,
Rahul Chakkara,
Arjun Nair,
Daniel C Alexander,
SUMMIT consortium,
Joseph Jacob
Abstract:
Lung cancer is the leading cause of cancer-related mortality worldwide. Lung cancer screening (LCS) using annual low-dose computed tomography (CT) scanning has been proven to significantly reduce lung cancer mortality by detecting cancerous lung nodules at an earlier stage. Risk stratification of malignancy in lung nodules can be improved using machine/deep learning algorithms. However, most existing algorithms: a) have primarily assessed single time-point CT data alone, thereby failing to utilize the inherent advantages contained within longitudinal imaging datasets; b) have not integrated pertinent clinical data that might inform risk prediction into computer models; c) have not assessed algorithm performance on the spectrum of nodules that are most challenging for radiologists to interpret and where assistance from analytic tools would be most beneficial.
Here we show the performance of our time-series deep learning model (DeepCAD-NLM-L), which integrates multi-modal information across three longitudinal data domains: nodule-specific, lung-specific, and clinical demographic data. We compared our time-series deep learning model to a) radiologist performance on CTs from the National Lung Screening Trial enriched with the most challenging nodules for diagnosis; b) a nodule management algorithm from a North London LCS study (SUMMIT). Our model demonstrated comparable and complementary performance to radiologists when interpreting challenging lung nodules and showed improved performance (AUC = 88%) against models utilizing single time-point data only. The results emphasise the importance of time-series, multi-modal analysis when interpreting malignancy risk in LCS.
Submitted 30 March, 2022;
originally announced March 2022.
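The AUC reported above is the area under the ROC curve, which equals the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. A small, dependency-free sketch of that pairwise (Mann-Whitney) computation is below; `roc_auc` is an illustrative name, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case scored above benign case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Under this reading, an AUC of 88% means the model ranks a malignant nodule above a benign one in about 88% of such pairs.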
-
Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation
Authors:
Haresh Karnan,
Anirudh Nair,
Xuesu Xiao,
Garrett Warnell,
Soeren Pirk,
Alexander Toshev,
Justin Hart,
Joydeep Biswas,
Peter Stone
Abstract:
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans. With the emergence of autonomously navigating mobile robots in human-populated environments (e.g., domestic service robots in homes and restaurants and food delivery robots on public sidewalks), incorporating socially compliant navigation behaviors on these robots becomes critical to ensuring safe and comfortable human-robot coexistence. To address this challenge, imitation learning is a promising framework, since it is easier for humans to demonstrate the task of social navigation than to formulate reward functions that accurately capture the complex multi-objective setting of social navigation. The application of imitation learning and inverse reinforcement learning to social navigation for mobile robots, however, is currently hindered by a lack of large-scale datasets that capture socially compliant robot navigation demonstrations in the wild. To fill this gap, we introduce the Socially CompliAnt Navigation Dataset (SCAND), a large-scale, first-person-view dataset of socially compliant navigation demonstrations. Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations comprising multi-modal data streams, including 3D lidar, joystick commands, odometry, and visual and inertial information, collected on two morphologically different mobile robots (a Boston Dynamics Spot and a Clearpath Jackal) by four different human demonstrators in both indoor and outdoor environments. We additionally perform preliminary analysis and validation through real-world robot experiments and show that navigation policies learned by imitation learning on SCAND generate socially compliant behaviors.
Submitted 8 June, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
In-filter Computing For Designing Ultra-light Acoustic Pattern Recognizers
Authors:
Abhishek Ramdas Nair,
Shantanu Chakrabartty,
Chetan Singh Thakur
Abstract:
We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for smart Internet-of-Things (IoT) devices. Unlike a conventional acoustic pattern recognizer, where feature extraction and classification are designed independently, the proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine (SVM). The result of this integration is a template-based SVM whose memory and computational footprint (training and inference) is light enough to be implemented on an FPGA-based IoT platform. While the proposed in-filter computing framework is general, in this paper we demonstrate the concept using a Cascade of Asymmetric Resonator with Inner Hair Cells (CAR-IHC) based acoustic feature extraction algorithm. The complete system has been optimized using time-multiplexing and parallel-pipeline techniques for a Xilinx Spartan 7 series Field Programmable Gate Array (FPGA). We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only ~1.5k Look-Up Tables (LUTs) and ~2.8k Flip-Flops (FFs), a significant improvement over other approaches.
Submitted 11 September, 2021;
originally announced September 2021.
-
Interspeech 2021 Deep Noise Suppression Challenge
Authors:
Chandan K A Reddy,
Harishchandra Dubey,
Kazuhito Koishida,
Arun Nair,
Vishak Gopal,
Ross Cutler,
Sebastian Braun,
Hannes Gamper,
Robert Aichner,
Sriram Srinivasan
Abstract:
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, which was also used to evaluate participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this version of the challenge, organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate fullband scenarios. The two tracks in this challenge will focus on real-time denoising for (i) wideband and (ii) fullband scenarios. We are also making available a reliable non-intrusive objective speech quality metric called DNSMOS for the participants to use during their development phase.
Submitted 4 April, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
Double Directional Channel Measurements for THz Communications in an Urban Environment
Authors:
Naveed A. Abbasi,
Arjun Hariharan,
Arun Moni Nair,
Ahmed S. Almaiman,
François B. Rottenberg,
Alan E. Willner,
Andreas F. Molisch
Abstract:
While mm-wave systems are a mainstay of 5G communications, the inexorable increase in data rate requirements and user densities will soon require the exploration of next-generation technologies. Among these, Terahertz (THz) band communication seems to be a promising direction due to the availability of large bandwidth in the electromagnetic spectrum in this frequency range and the ability to exploit its directional nature with directive antennas of small form factor. The first step in the analysis of any communication system is the analysis of the propagation channel, since it determines the fundamental limitations the system faces. While THz channels have been explored for indoor, short-distance communications, the channels for wireless access links in outdoor environments are largely unexplored. In this paper, we present, to our knowledge, the first set of double-directional outdoor propagation channel measurements for the THz band. Specifically, the measurements are done in the 141–148.5 GHz range, which is one of the frequency bands recently allocated for THz research by the Federal Communications Commission (FCC). We employ double-directional channel sounding using a frequency-domain sounding setup based on RF-over-Fiber (RFoF) extensions for measurements over 100 m distance in urban scenarios. An important result is the surprisingly large number of directions (i.e., direction-of-arrival and direction-of-departure pairs) that carry significant energy. More generally, our results suggest fundamental parameters that can be used in future THz band analysis and implementations.
Submitted 3 October, 2019;
originally announced October 2019.
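Two back-of-the-envelope quantities follow from the numbers in the abstract: the 141–148.5 GHz band spans 7.5 GHz, which gives a nominal delay resolution of about 1/B ≈ 133 ps (roughly 4 cm of path-length difference), and the Friis free-space path loss over 100 m near the band centre is roughly 116 dB. The sketch below computes both. It uses the textbook isotropic-antenna formula FSPL = 20·log10(4πdf/c), not the path loss measured in the paper, and the helper names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def band_summary(f_low_hz: float, f_high_hz: float) -> dict:
    """Bandwidth, centre frequency and nominal 1/B delay/path resolution."""
    bw = f_high_hz - f_low_hz
    delay_res = 1.0 / bw  # two paths closer than this in delay blur together
    return {
        "bandwidth_hz": bw,
        "center_hz": 0.5 * (f_low_hz + f_high_hz),
        "delay_resolution_s": delay_res,
        "path_resolution_m": C * delay_res,
    }


def free_space_path_loss_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space path loss between isotropic antennas, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


if __name__ == "__main__":
    s = band_summary(141e9, 148.5e9)
    print(s)
    print(f"FSPL at 100 m: {free_space_path_loss_db(100.0, s['center_hz']):.2f} dB")
```

The large free-space loss is one reason the directive antennas mentioned in the abstract are needed at these frequencies.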