-
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
Authors:
Manish Dhakal,
Arman Chhetri,
Aman Kumar Gupta,
Prabin Lamichhane,
Suraj Pandey,
Subarna Shakya
Abstract:
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. Most of the audio clips in the dataset have silent gaps at both ends, which are clipped during dataset preprocessing for a more uniform mapping of audio frames to their corresponding texts. Mel Frequency Cepstral Coefficients (MFCCs) are used as audio features to feed into the model. The model that pairs a Bidirectional LSTM with ResNet and a one-dimensional CNN produces the best results for this dataset out of all the models (neural networks with variations of LSTM, GRU, CNN, and ResNet) that have been trained so far. This novel model uses the Connectionist Temporal Classification (CTC) function for loss calculation during training and CTC beam search decoding for predicting characters as the most likely sequence of Nepali text. On the test dataset, a character error rate (CER) of 17.06 percent has been achieved. The source code is available at: https://github.com/manishdhakal/ASR-Nepali-using-CNN-BiLSTM-ResNet.
Submitted 25 June, 2024;
originally announced June 2024.
-
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
Authors:
Arnav Goel,
Medha Hira,
Anubha Gupta
Abstract:
The advent of modern deep learning techniques has given rise to advancements in the field of Speech Emotion Recognition (SER). However, most systems prevalent in the field fail to generalize to speakers not seen during training. This study focuses on handling challenges of multilingual SER, specifically on unseen speakers. We introduce CAMuLeNet, a novel architecture leveraging co-attention based fusion and multitask learning to address this problem. Additionally, we benchmark pretrained encoders of Whisper, HuBERT, Wav2Vec2.0, and WavLM using 10-fold leave-speaker-out cross-validation on five existing multilingual benchmark datasets (IEMOCAP, RAVDESS, CREMA-D, EmoDB, and CaFE) and release a novel dataset for SER on the Hindi language (BhavVani). CAMuLeNet shows an average improvement of approximately 8% over all benchmarks on unseen speakers determined by our cross-validation strategy.
Submitted 19 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Progress Towards Decoding Visual Imagery via fNIRS
Authors:
Michel Adamic,
Wellington Avelino,
Anna Brandenberger,
Bryan Chiang,
Hunter Davis,
Stephen Fay,
Andrew Gregory,
Aayush Gupta,
Raphael Hotter,
Grace Jiang,
Fiona Leng,
Stephen Polcyn,
Thomas Ribeiro,
Paul Scotti,
Michelle Wang,
Marley Xiong,
Jonathan Xu
Abstract:
We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system.
Submitted 22 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
Authors:
Arnav Goel,
Medha Hira,
Anubha Gupta
Abstract:
The field of prosody transfer in speech synthesis systems is rapidly advancing. This research is focused on evaluating learning methods for adapting pre-trained monolingual text-to-speech (TTS) models to multilingual conditions, i.e., Supervised Fine-Tuning (SFT) and Transfer Learning (TL). This comparison utilizes three distinct metrics: Mean Opinion Score (MOS), Recognition Accuracy (RA), and Mel Cepstral Distortion (MCD). Results demonstrate that, in comparison to SFT, TL leads to significantly enhanced performance, with an average MOS higher by 1.53 points, a 37.5% increase in RA, and approximately a 7.8-point improvement in MCD. These findings are instrumental in helping build TTS models for low-resource languages.
Submitted 18 June, 2024; v1 submitted 23 May, 2024;
originally announced June 2024.
-
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Authors:
Medha Hira,
Arnav Goel,
Anubha Gupta
Abstract:
This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation on benchmark datasets CVSS-T and IndicTTS. With an average mean opinion score of 3.75 out of 4, speech synthesized by CrossVoice closely rivals human speech on the benchmark, highlighting the efficacy of cascade-based systems and transfer learning in multilingual S2ST with prosody transfer.
Submitted 18 June, 2024; v1 submitted 23 May, 2024;
originally announced June 2024.
-
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Authors:
Gwanghyun Kim,
Alonso Martinez,
Yu-Chuan Su,
Brendan Jou,
José Lezama,
Agrim Gupta,
Lijun Yu,
Lu Jiang,
Aren Jansen,
Jacob Walker,
Krishna Somandepalli
Abstract:
Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input. Project page: avdit2024.github.io
Submitted 22 May, 2024;
originally announced May 2024.
-
Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian Optimization
Authors:
Jianzong Pi,
Samuel Filgueira da Silva,
Mehmet Fatih Ozkan,
Abhishek Gupta,
Marcello Canova
Abstract:
Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are limited by their lack of robustness, high computational costs, and susceptibility to local minima. In this study, Bayesian Optimization is used for tuning the dynamic parameters of an electrochemical equivalent circuit battery model (E-ECM) for a nickel-manganese-cobalt (NMC)-graphite cell. The performance of Bayesian Optimization is compared with baseline methods based on gradient-based and metaheuristic approaches. The robustness of the parameter optimization method is tested by performing verification using an experimental drive cycle. The results indicate that Bayesian Optimization outperforms Gradient Descent and PSO optimization techniques, reducing average testing loss by 28.8% and 5.8%, respectively. Moreover, Bayesian Optimization significantly reduces the variance in testing loss, by 95.8% and 72.7%, respectively.
Submitted 17 May, 2024;
originally announced May 2024.
-
Lumbar Spine Tumor Segmentation and Localization in T2 MRI Images Using AI
Authors:
Rikathi Pal,
Sudeshna Mondal,
Aditi Gupta,
Priya Saha,
Somoballi Ghoshal,
Amlan Chakrabarti,
Susmita Sur-Kolay
Abstract:
In medical imaging, segmentation and localization of spinal tumors in three-dimensional (3D) space pose significant computational challenges, primarily stemming from limited data availability. In response, this study introduces a novel data augmentation technique, aimed at automating spine tumor segmentation and localization through AI approaches. Leveraging a fusion of fuzzy c-means clustering and Random Forest algorithms, the proposed method achieves successful spine tumor segmentation based on predefined masks initially delineated by domain experts in medical imaging. Subsequently, a Convolutional Neural Network (CNN) architecture is employed for tumor classification. Moreover, 3D vertebral segmentation and labeling techniques are used to help pinpoint the exact location of the tumors in the lumbar spine. Results indicate a remarkable performance, with 99% accuracy for tumor segmentation, 98% accuracy for tumor classification, and 99% accuracy for tumor localization achieved with the proposed approach. These metrics surpass the efficacy of existing state-of-the-art techniques, as evidenced by superior Dice Score, Class Accuracy, and Intersection over Union (IOU) on class accuracy metrics. This innovative methodology holds promise for enhancing the diagnostic capabilities in detecting and characterizing spinal tumors, thereby facilitating more effective clinical decision-making.
Submitted 7 May, 2024;
originally announced May 2024.
-
ASID: Active Exploration for System Identification in Robotic Manipulation
Authors:
Marius Memmel,
Andrew Wagenmaker,
Chuning Zhu,
Patrick Yin,
Dieter Fox,
Abhishek Gupta
Abstract:
Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accurate simulators can circumvent these challenges and use a large amount of cheap simulation data to learn controllers that can effectively transfer to the real world. The challenge with such model-based techniques is the requirement for an extremely accurate simulation, requiring both the specification of appropriate simulation assets and physical parameters. This requires considerable human effort to design for every environment being considered. In this work, we propose a learning system that can leverage a small amount of real-world data to autonomously refine a simulation model and then plan an accurate control strategy that can be deployed in the real world. Our approach critically relies on utilizing an initial (possibly inaccurate) simulator to design effective exploration policies that, when deployed in the real world, collect high-quality data. We demonstrate the efficacy of this paradigm in identifying articulation, mass, and other physical parameters in several challenging robotic manipulation tasks, and illustrate that only a small amount of real-world data can allow for effective sim-to-real transfer. Project website at https://weirdlabuw.github.io/asid
Submitted 26 June, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Enhancing MRI-Based Classification of Alzheimer's Disease with Explainable 3D Hybrid Compact Convolutional Transformers
Authors:
Arindam Majee,
Avisek Gupta,
Sourav Raha,
Swagatam Das
Abstract:
Alzheimer's disease (AD), characterized by progressive cognitive decline and memory loss, presents a formidable global health challenge, underscoring the critical importance of early and precise diagnosis for timely interventions and enhanced patient outcomes. While MRI scans provide valuable insights into brain structures, traditional analysis methods often struggle to discern intricate 3D patterns crucial for AD identification. Addressing this challenge, we introduce an alternative end-to-end deep learning model, the 3D Hybrid Compact Convolutional Transformers (3D HCCT). By synergistically combining convolutional neural networks (CNNs) and vision transformers (ViTs), the 3D HCCT adeptly captures both local features and long-range relationships within 3D MRI scans. Extensive evaluations on a prominent AD benchmark dataset, ADNI, demonstrate the 3D HCCT's superior performance, surpassing state-of-the-art CNN and transformer-based methods in classification accuracy. Its robust generalization capability and interpretability mark a significant stride in AD classification from 3D MRI scans, promising more accurate and reliable diagnoses for improved patient care and superior clinical outcomes.
Submitted 24 March, 2024;
originally announced March 2024.
-
Densify & Conquer: Densified, smaller base-stations can conquer the increasing carbon footprint problem in nextG wireless
Authors:
Agrim Gupta,
Adel Heidari,
Jiaming **,
Dinesh Bharadia
Abstract:
Connectivity on-the-go has been one of the most impressive technological achievements of the 2010s. However, multiple studies show that this has come at the expense of an increased carbon footprint that rivals the entire aviation sector's carbon footprint. The two major contributors to this increased footprint are (a) smartphone batteries, which affect the embodied footprint, and (b) base-stations, which consume an ever-increasing amount of energy to provide the last-mile wireless connectivity to smartphones. The root cause of both turns out to be the same: communicating over the lossy last-mile wireless medium. We show in this paper, titled DensQuer, how base-station densification, which replaces a single larger base-station with multiple smaller ones, reduces the effect of the last-mile wireless medium and in effect conquers both of these sources of increased carbon footprint. Backed by an open-source ray-tracing computation framework (Sionna), we show how a strategic densification strategy can minimize the number of required smaller base-stations to practically achievable numbers, which leads to about 3x power savings in the base-station network. DensQuer also reduces the required deployment height of base-stations to as low as 15 m, which makes the smaller cells easily deployable on trees or street poles instead of requiring a dedicated tower. Further, by utilizing newly introduced hardware power rails in Google Pixel 7a and above phones, we show that this strategically densified network leads to a reduction in mobile transmit power of 10-15 dB, leading to about a 3x reduction in total cellular power consumption and about a 50% increase in smartphone battery life when the phone communicates data via the cellular network.
Submitted 20 March, 2024;
originally announced March 2024.
-
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Authors:
Raunaq Bhirangi,
Chenyu Wang,
Venkatesh Pattabiraman,
Carmel Majidi,
Abhinav Gupta,
Tess Hellebrekers,
Lerrel Pinto
Abstract:
Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.
Submitted 15 February, 2024;
originally announced February 2024.
-
Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm
Authors:
Amir Mohammad Naderi,
Jennifer G. Casey,
Mao-Hsiang Huang,
Rachelle Victorio,
David Y. Chiang,
Calum MacRae,
Hung Cao,
Vandana A. Gupta
Abstract:
Quantifying cardiovascular parameters like ejection fraction in zebrafish as a host of biological investigations has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend to overfit their training dataset. This means that applying the same framework to new data with different imaging setups and mutant types can severely decrease performance. We have developed a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) to quantify the cardiac function in zebrafish. In this work, we further applied data augmentation, Transfer Learning (TL), and Test Time Augmentation (TTA) to ZACAF to improve the quantification of cardiovascular function in zebrafish. This strategy can be integrated with the available frameworks to aid other researchers. We demonstrate that using TL, even with a constrained dataset, the model can be refined to accommodate a novel microscope setup, diverse mutant types, and various video recording protocols. Additionally, as users engage in successive rounds of TL, the model is anticipated to undergo substantial enhancements in both generalizability and accuracy. Finally, we applied this approach to assess the cardiovascular function in nrap mutant zebrafish, a model of cardiomyopathy.
Submitted 14 February, 2024;
originally announced February 2024.
-
Exploring the limits of decoder-only models trained on public speech recognition corpora
Authors:
Ankit Gupta,
George Saon,
Brian Kingsbury
Abstract:
The emergence of industrial-scale speech recognition (ASR) models such as Whisper and USM, trained on 1M hours of weakly labelled and 12M hours of audio-only proprietary data respectively, has led to a stronger need for large-scale public ASR corpora and competitive open-source pipelines. Unlike these models, large language models are typically based on Transformer decoders, and it remains unclear if decoder-only models trained on public data alone can deliver competitive performance. In this work, we investigate factors such as the choice of training datasets and modeling components necessary for obtaining the best performance using public English ASR corpora alone. Our Decoder-Only Transformer for ASR (DOTA) model comprehensively outperforms the encoder-decoder open-source replication of Whisper (OWSM) on nearly all English ASR benchmarks and outperforms Whisper large-v3 on 7 out of 15 test sets. We release our codebase and model checkpoints under a permissive license.
Submitted 31 January, 2024;
originally announced February 2024.
-
On the Target Detection Performance of a Molecular Communication Network with Multiple Mobile Nanomachines
Authors:
Nithin V. Sabu,
Abhishek K. Gupta
Abstract:
A network of nanomachines (NMs) can be used to build a target detection system for a variety of promising applications. They have the potential to detect toxic chemicals, infectious bacteria, and biomarkers of dangerous diseases such as cancer within the human body. Many diseases and health disorders can be detected early and efficiently treated in the future by utilizing these systems. To fully grasp the potential of these systems, mathematical analysis is required. This paper describes an analytical framework for modeling and analyzing the performance of target detection systems composed of multiple mobile nanomachines of varying sizes with passive/absorbing boundaries. We consider both direct contact detection, in which NMs must physically contact the target to detect it, and indirect sensing, in which NMs must detect the marker molecules emitted by the target. The detection performance of such systems is calculated for degradable and non-degradable targets, as well as mobile and stationary targets. The derived expressions provide various insights, such as the effect of NM density and target degradation on detection probability.
Submitted 9 January, 2024;
originally announced January 2024.
-
GreenScan: Towards large-scale terrestrial monitoring the health of urban trees using mobile sensing
Authors:
Akshit Gupta,
Simone Mora,
Fan Zhang,
Martine Rutten,
R. Venkatesha Prasad,
Carlo Ratti
Abstract:
Healthy urban greenery is a fundamental asset to mitigate climate change phenomena such as extreme heat and air pollution. However, urban trees are often affected by abiotic and biotic stressors that hamper their functionality and, when not managed in time, even their survival. While current greenery inspection techniques can help in taking effective measures, they often require a large amount of human labor, making frequent assessments infeasible at city-wide scales. In this paper, we present GreenScan, a ground-based sensing system designed to provide health assessments of urban trees at high spatio-temporal resolutions and at low cost. The system utilises thermal and multi-spectral imaging sensors fused using a custom computer vision model in order to estimate two tree health indexes. The evaluation of the system was performed through data collection experiments in Cambridge, USA. Overall, this work illustrates a novel approach for autonomous mobile ground-based tree health monitoring on city-wide scales at high temporal resolutions and at low cost.
Submitted 6 April, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Model-Free Change Point Detection for Mixing Processes
Authors:
Hao Chen,
Abhishek Gupta,
Yin Sun,
Ness Shroff
Abstract:
This paper considers the change point detection problem under dependent samples. In particular, we provide performance guarantees for the MMD-CUSUM test under exponentially $α$, $β$, and fast $φ$-mixing processes, which significantly expands its utility beyond the i.i.d. and Markovian cases used in previous studies. We obtain lower bounds for average-run-length (ARL) and upper bounds for average-detection-delay (ADD) in terms of the threshold parameter. We show that the MMD-CUSUM test enjoys the same level of performance as the i.i.d. case under fast $φ$-mixing processes. The MMD-CUSUM test also achieves strong performance under exponentially $α$/$β$-mixing processes, which are significantly more relaxed than existing results. The MMD-CUSUM test statistic adapts to different settings without modifications, rendering it a completely data-driven, dependence-agnostic change point detection scheme. Numerical simulations are provided at the end to evaluate our findings.
Submitted 1 May, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
CLiSA: A Hierarchical Hybrid Transformer Model using Orthogonal Cross Attention for Satellite Image Cloud Segmentation
Authors:
Subhajit Paul,
Ashutosh Gupta
Abstract:
Clouds in optical satellite images are a major concern since their presence hinders the ability to carry out accurate analysis as well as processing. The presence of clouds also affects the image tasking schedule and results in wastage of valuable storage space on ground as well as space-based systems. For these reasons, deriving accurate cloud masks from optical remote-sensing images is an important task. Traditional methods for cloud detection in satellite images, such as threshold-based approaches and spatial filtering, suffer from a lack of accuracy. In recent years, deep learning algorithms have emerged as a promising approach to solve image segmentation problems, as they allow pixel-level classification and semantic-level segmentation. In this paper, we introduce a deep-learning model based on a hybrid transformer architecture for effective cloud mask generation named CLiSA - Cloud segmentation via Lipschitz Stable Attention network. In this context, we propose a concept of orthogonal self-attention combined with a hierarchical cross attention model, and we validate its Lipschitz stability theoretically and empirically. We design the whole setup under an adversarial setting in the presence of the Lovász-Softmax loss. We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets including Landsat-8, Sentinel-2, and Cartosat-2s. Through a comparative study, we show that our model performs favorably against other state-of-the-art methods and also provides better generalization in precise cloud extraction from satellite multi-spectral (MX) images. We also showcase different ablation studies to endorse our choices corresponding to different architectural elements and objective functions.
Submitted 1 December, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
-
SIRAN: Sinkhorn Distance Regularized Adversarial Network for DEM Super-resolution using Discriminative Spatial Self-attention
Authors:
Subhajit Paul,
Ashutosh Gupta
Abstract:
Digital Elevation Model (DEM) is an essential aspect in the remote sensing domain to analyze and explore different applications related to surface elevation information. In this study, we intend to address the generation of high-resolution DEMs using high-resolution multi-spectral (MX) satellite imagery by incorporating adversarial learning. To promptly regulate this process, we utilize the notion of polarized self-attention of discriminator spatial maps as well as introduce a Densely connected Multi-Residual Block (DMRB) module to assist in efficient gradient flow. Further, we present an objective function related to optimizing the Sinkhorn distance with a traditional GAN to improve the stability of adversarial learning. In this regard, we provide both theoretical and empirical substantiation of better performance in terms of vanishing gradient issues and numerical convergence. We demonstrate both qualitative and quantitative outcomes with available state-of-the-art methods. Based on our experiments on DEM datasets of Shuttle Radar Topographic Mission (SRTM) and Cartosat-1, we show that the proposed model performs favorably against other learning-based state-of-the-art methods. We also generate and visualize several high-resolution DEMs covering terrains with diverse signatures to show the performance of our model.
Submitted 27 November, 2023;
originally announced November 2023.
-
Effective filtering approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization
Authors:
Zhou Fang,
Ankit Gupta,
Mustafa Khammash
Abstract:
Stochastic filtering is a vibrant area of research in both control theory and statistics, with broad applications in many scientific fields. Despite its extensive historical development, an effective method for joint parameter-state estimation in SDEs is still lacking. The state-of-the-art particle filtering methods suffer from either sample degeneracy or information loss, with both issues stemming from the dynamics of the particles generated to represent system parameters.
This paper provides a novel and effective approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. This strategy circumvents the need to generate particles representing system parameters, thereby mitigating their associated problems of sample degeneracy and information loss. Moreover, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. All these designs ensure the superior performance of our method. Finally, a numerical example is presented to illustrate that our method outperforms existing approaches by a large margin.
Submitted 1 November, 2023;
originally announced November 2023.
-
A Traffic Control Framework for Uncrewed Aircraft Systems
Authors:
Ananay Vikram Gupta,
Aaditya Prakash Kattekola,
Ansh Vikram Gupta,
Dacharla Venkata Abhiram,
Kamesh Namuduri,
Ravichandran Subramanian
Abstract:
The exponential growth of Advanced Air Mobility (AAM) services demands assurances of safety in the airspace. This research introduces a Traffic Control Framework (TCF) for developing digital flight rules for Uncrewed Aircraft Systems (UAS) flying in designated air corridors. The proposed TCF helps model, deploy, and test UAS control agents, regardless of their hardware configurations. This paper investigates the importance of digital flight rules in preventing collisions in the context of AAM. TCF is introduced as a platform for developing strategies for managing traffic towards enhanced autonomy in the airspace. It allows for assessment and evaluation of autonomous navigation, route planning, obstacle avoidance, and adaptive decision making for UAS. It also allows for the introduction and evaluation of advanced technologies such as Artificial Intelligence (AI) and Machine Learning (ML) in a simulation environment before deploying them in the real world. TCF can be used as a tool for comprehensive UAS traffic analysis, including KPI measurements. It offers flexibility for further testing and deployment, laying the foundation for improved airspace safety - a vital aspect of UAS technological advancement. Finally, this paper demonstrates the capabilities of the proposed TCF in managing UAS traffic at intersections and its impact on overall traffic flow in air corridors, noting the bottlenecks and the inverse relationship between safety and traffic volume.
Submitted 15 October, 2023;
originally announced October 2023.
-
Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers
Authors:
Aryaman Gupta,
Kaustav Chakraborty,
Somil Bansal
Abstract:
Autonomous systems, such as self-driving cars and drones, have made significant strides in recent years by leveraging visual inputs and machine learning for decision-making and control. Despite their impressive performance, these vision-based controllers can make erroneous predictions when faced with novel or out-of-distribution inputs. Such errors can cascade to catastrophic system failures and compromise system safety. In this work, we introduce a run-time anomaly monitor to detect and mitigate such closed-loop, system-level failures. Specifically, we leverage a reachability-based framework to stress-test the vision-based controller offline and mine its system-level failures. This data is then used to train a classifier that is leveraged online to flag inputs that might cause system breakdowns. The anomaly detector highlights issues that transcend individual modules and pertain to the safety of the overall system. We also design a fallback controller that robustly handles these detected anomalies to preserve system safety. We validate the proposed approach on an autonomous aircraft taxiing system that uses a vision-based controller for taxiing. Our results show the efficacy of the proposed approach in identifying and handling system-level anomalies, outperforming methods such as prediction error-based detection, and ensembling, thereby enhancing the overall safety and robustness of autonomous systems.
Submitted 8 April, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Data-Driven Computation of Robust Invariant Sets and Gain-Scheduled Controllers for Linear Parameter-Varying Systems
Authors:
Manas Mejari,
Ankit Gupta,
Dario Piga
Abstract:
We present a direct data-driven approach to synthesize robust control invariant (RCI) sets and their associated gain-scheduled feedback control laws for linear parameter-varying (LPV) systems subjected to bounded disturbances. A data-set consisting of a single state-input-scheduling trajectory is gathered from the system and is directly utilized to compute a polytopic RCI set and controllers by solving a semidefinite program. The proposed method does not require an intermediate LPV model identification step. Through a numerical example, we show that the proposed approach can generate RCI sets with a relatively small number of data samples when the data satisfies certain excitation conditions.
Submitted 3 November, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Deep Learning Architecture for Motor Imaged Words
Authors:
Vimal W,
Akshansh Gupta
Abstract:
The notion of a Brain-Computer Interface system is the acquisition of signals from the brain, processing them, and translating them into commands. The study concentrated on a specific sort of brain signal known as Motor Imagery EEG signals, which are activated in the brain without any external stimulus of the needed motor activities in relation to the signal. The signals are further processed using complicated signal processing methods such as wavelet-based denoising and an Independent Component Analysis (ICA) based dimensionality reduction approach. To extract the characteristics from the processed data, both a signal processing method, the Short-Time Fourier Transform (STFT), and a probabilistic approach such as Gramian Angular Field theory are used. Furthermore, the gathered feature signals are analyzed and converted into noteworthy commands by Deep Learning algorithms, which can be a mix of complicated Deep Learning algorithm families such as CNN and RNN. The weights of the model trained on a particular subject are further used for multiple subjects, which raises the accuracy of translating the Motor Imagery EEG signals into the relevant motor actions.
Submitted 8 August, 2023;
originally announced August 2023.
-
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition
Authors:
Anant Singh,
Akshat Gupta
Abstract:
Recent advancements in transformer-based speech representation models have greatly transformed speech processing. However, there has been limited research conducted on evaluating these models for speech emotion recognition (SER) across multiple languages and examining their internal representations. This article addresses these gaps by presenting a comprehensive benchmark for SER with eight speech representation models and six different languages. We conducted probing experiments to gain insights into the inner workings of these models for SER. We find that using features from a single optimal layer of a speech model reduces the error rate by 32% on average across seven datasets when compared to systems where features from all layers of speech models are used. We also achieve state-of-the-art results for German and Persian languages. Our probing results indicate that the middle layers of speech models capture the most important emotional information for speech emotion recognition.
Submitted 16 August, 2023;
originally announced August 2023.
-
The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions
Authors:
Jun Ma,
Ronald Xie,
Shamini Ayyadhury,
Cheng Ge,
Anubha Gupta,
Ritu Gupta,
Song Gu,
Yao Zhang,
Gihun Lee,
Joonkee Kim,
Wei Lou,
Haofeng Li,
Eric Upschulte,
Timo Dickscheid,
José Guilherme de Almeida,
Yixin Wang,
Lin Han,
Xin Yang,
Marco Labagnara,
Vojislav Gligorovski,
Maxime Scheder,
Sahand Jamal Rahi,
Carly Kempster,
Alice Pollitt,
Leon Espinosa,
et al. (15 additional authors not shown)
Abstract:
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.
Submitted 1 April, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder
Authors:
Nicha C. Dvornek,
Catherine Sullivan,
James S. Duncan,
Abha R. Gupta
Abstract:
The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide post hoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.
Submitted 8 August, 2023;
originally announced August 2023.
-
A hybrid approach for improving U-Net variants in medical image segmentation
Authors:
Aitik Gupta,
Dr. Joydip Dhar
Abstract:
Medical image segmentation is vital to the area of medical imaging because it enables professionals to more accurately examine and understand the information offered by different imaging modalities. The technique of splitting a medical image into various segments or regions of interest is known as medical image segmentation. The segmented images that are produced can be used for many different things, including diagnosis, surgery planning, and therapy evaluation.
In the initial phase of this research, the main focus was on reviewing existing deep-learning approaches, including MultiResUNet, Attention U-Net, the classical U-Net, and other variants. The attention feature vectors or maps dynamically add important weights to critical information, and most of these variants use them to increase accuracy, but the network parameter requirements are somewhat more stringent. They face certain problems such as overfitting, as their number of trainable parameters is very high, and so is their inference time.
Therefore, the aim of this research is to reduce the network parameter requirements using depthwise separable convolutions, while maintaining performance on some medical image segmentation tasks, such as skin lesion segmentation, using an attention system and residual connections.
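To illustrate why depthwise separable convolutions reduce parameter requirements, the following is a minimal sketch that compares the weight count of a standard convolution with that of its depthwise separable counterpart. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are hypothetical choices made for illustration, not values taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For these assumed sizes the separable layer needs roughly an eighth of the weights; the actual savings in any given network depend on its channel widths and kernel sizes.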
Submitted 31 July, 2023;
originally announced July 2023.
-
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
Authors:
H. J. Terry Suh,
Glen Chou,
Hongkai Dai,
Lujie Yang,
Abhishek Gupta,
Russ Tedrake
Abstract:
Gradient-based methods enable efficient search capabilities in high dimensions. However, in order to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we require a more careful consideration of how uncertainty estimation interplays with first-order methods that attempt to minimize it. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need to amortize this computation. Instead of learning the distance, however, we propose to learn its gradients directly as an oracle for first-order optimizers. We show these gradients can be efficiently learned with score-matching techniques by leveraging the equivalence between distance to data and data likelihood. Using this insight, we propose Score-Guided Planning (SGP), a planning algorithm for offline RL that utilizes score-matching to enable first-order planning in high-dimensional problems, where zeroth-order methods were unable to scale, and ensembles were unable to overcome local minima. Website: https://sites.google.com/view/score-guided-planning/home
Submitted 16 October, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
On the Coverage of Cognitive mmWave Networks with Directional Sensing and Communication
Authors:
Shuchi Tripathi,
Abhishek K. Gupta,
SaiDhiraj Amuru
Abstract:
Millimeter-waves' propagation characteristics create prospects for spatial and temporal spectrum sharing in a variety of contexts, including cognitive spectrum sharing (CSS). However, CSS along with omnidirectional sensing is not efficient at mmWave frequencies due to their directional nature of transmission, as this limits secondary networks' ability to access the spectrum. This inspired us to create an analytical approach using stochastic geometry to examine the implications of directional cognitive sensing in mmWave networks. We explore a scenario where multiple secondary transmitter-receiver pairs coexist with a primary transmitter-receiver pair, forming a cognitive network. The positions of the secondary transmitters are modelled using a homogeneous Poisson point process (PPP) with corresponding secondary receivers located around them. A threshold on directional transmission is imposed on each secondary transmitter in order to limit its interference at the primary receiver. We derive the medium-access-probability of a secondary user along with the fraction of the secondary transmitters active at a time-instant. To understand cognition's feasibility, we derive the coverage probabilities of primary and secondary links. We provide various design insights via numerical results. For example, we investigate the interference-threshold's optimal value while ensuring coverage for both links and its dependence on various parameters. We find that directionality is a key factor in improving both links' performance. Further, allowing location-aware secondary directionality can help achieve similar coverage for all secondary links.
Submitted 2 June, 2023;
originally announced June 2023.
-
Inter Subject Emotion Recognition Using Spatio-Temporal Features From EEG Signal
Authors:
Mohammad Asif,
Diya Srivastava,
Aditya Gupta,
Uma Shanker Tiwary
Abstract:
Inter-subject or subject-independent emotion recognition has been a challenging task in affective computing. This work presents an easy-to-implement emotion recognition model that classifies emotions from EEG signals subject independently. It is based on the famous EEGNet architecture, which is used in EEG-related BCIs. We used the Dataset on Emotion using Naturalistic Stimuli (DENS) dataset. The dataset contains the Emotional Events -- the precise information of the emotion timings that participants felt. The model is a combination of regular, depthwise and separable convolution layers of CNN to classify the emotions. The model has the capacity to learn the spatial features of the EEG channels and the temporal features of the variability of the EEG signals over time. The model is evaluated for the valence space ratings. The model achieved an accuracy of 73.04%.
Submitted 27 May, 2023;
originally announced May 2023.
-
Multiple-stopping time Sequential Detection for Energy Efficient Mining in Blockchain-Enabled IoT
Authors:
Anurag Gupta,
Vikram Krishnamurthy
Abstract:
What are the optimal times for an Internet of Things (IoT) device to act as a blockchain miner? The aim is to minimize the energy consumed by low-power IoT devices that log their data into a secure (tamper-proof) distributed ledger. We formulate a multiple stopping time Bayesian sequential detection problem to address energy-efficient blockchain mining for IoT devices. The objective is to identify $L$ optimal stops for mining, thereby maximizing the probability of successfully adding a block to the blockchain; we also present a model to optimize the number of stops (mining instants). The formulation is equivalent to a multiple stopping time POMDP. Since POMDPs are in general computationally intractable to solve, we show mathematically using submodularity arguments that the optimal mining policy has a useful structure: 1) it is monotone in belief space, and 2) it exhibits a threshold structure, which divides the belief space into two connected sets. Exploiting the structural results, we formulate a computationally-efficient linear mining policy for the blockchain-enabled IoT device. We present a policy gradient technique to optimize the parameters of the linear mining policy. Finally, we use synthetic and real Bitcoin datasets to study the performance of our proposed mining policy. We demonstrate the energy efficiency achieved by the optimal linear mining policy in contrast to other heuristic strategies.
Submitted 17 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Experimental Validation of Safe MPC for Autonomous Driving in Uncertain Environments
Authors:
Ivo Batkovic,
Ankit Gupta,
Mario Zanon,
Paolo Falcone
Abstract:
The full deployment of autonomous driving systems on a worldwide scale requires that the self-driving vehicle be operated in a provably safe manner, i.e., the vehicle must be able to avoid collisions in any possible traffic situation. In this paper, we propose a framework based on Model Predictive Control (MPC) that endows the self-driving vehicle with the necessary safety guarantees. In particular, our framework ensures constraint satisfaction at all times, while tracking the reference trajectory as closely as obstacles allow, resulting in safe and comfortable driving behavior. To discuss the performance and real-time capability of our framework, we first provide an illustrative simulation example, and then we demonstrate the effectiveness of our framework in experiments with a real test vehicle.
Submitted 5 May, 2023;
originally announced May 2023.
-
Brain Tumor Segmentation from MRI Images using Deep Learning Techniques
Authors:
Ayan Gupta,
Mayank Dixit,
Vipul Kumar Mishra,
Attulya Singh,
Atul Dayal
Abstract:
A brain tumor, whether benign or malignant, can potentially be life threatening and requires painstaking efforts in order to identify the type, origin and location, let alone cure one. Manual segmentation by medical specialists can be time-consuming, which calls for the involvement of technology to hasten the process with high accuracy. For the purpose of medical image segmentation, we inspected and identified the capable deep learning model, which shows consistent results on the dataset used for brain tumor segmentation. This study uses a public MRI imaging dataset containing 3064 T1-weighted images from 233 patients with three variants of brain tumor, viz. meningioma, glioma, and pituitary tumor. The dataset files were converted and preprocessed before applying the methodology, which employs implementation and training of some well-known image segmentation deep learning models like U-Net & Attention U-Net with various backbones, Deep Residual U-Net, ResUnet++ and Recurrent Residual U-Net, with varying parameters acquired from our review of the literature related to human brain tumor classification and segmentation. The experimental findings showed that among all the applied approaches, the recurrent residual U-Net which uses the Adam optimizer reaches a Mean Intersection Over Union of 0.8665 and outperforms other compared state-of-the-art deep learning models. The visual findings also show the remarkable results of the brain tumor segmentation from MRI scans and demonstrate how useful the algorithm will be for physicians to extract the brain cancers automatically from MRI scans and serve humanity.
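For readers unfamiliar with the Mean Intersection Over Union metric quoted above, the following is a minimal sketch of one common way to compute it. The abstract does not state how the authors averaged the metric, so the per-class averaging, the flat-list mask representation, and the function name `mean_iou` are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that appear in either mask.

    pred and target are equal-length flat sequences of integer class ids
    (an assumed representation; real masks are usually 2-D arrays).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 4-pixel example: background = 0, tumor = 1.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(mean_iou(pred, target, num_classes=2))
```

In the toy example the background class has IoU 1/2 and the tumor class has IoU 2/3, so the mean is 7/12.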
Submitted 29 April, 2023;
originally announced May 2023.
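As a point of reference for the Mean Intersection over Union figure reported in the brain tumor segmentation abstract above, the following is a minimal sketch of how the metric can be computed for binary masks. The function names `iou` and `mean_iou` are illustrative, and averaging per image (rather than per class) is an assumption; the abstract does not state which convention the authors used.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = tumor pixel, 0 = background


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: List[Mask], targets: List[Mask]) -> float:
    """Average the per-image IoU over predicted/reference mask pairs."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[0, 1, 1], [0, 1, 0]]
    target = [[0, 1, 0], [0, 1, 1]]
    # Two pixels overlap and four pixels are covered by either mask.
    print(iou(pred, target))
```

A score of 1.0 means the predicted mask matches the reference exactly, and 0.0 means the two masks do not overlap at all.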
-
Direct Data-Driven Computation of Polytopic Robust Control Invariant Sets and State-Feedback Controllers
Authors:
Manas Mejari,
Ankit Gupta
Abstract:
This paper presents a direct data-driven approach for computing robust control invariant (RCI) sets and their associated state-feedback control laws for linear time-invariant systems affected by bounded disturbances. The proposed method uses a single state-input trajectory generated from the system to compute a polytopic RCI set with a desired complexity and an invariance-inducing feedback controller, without the need to identify a model of the system. The problem is formulated in terms of a set of sufficient linear matrix inequality conditions that are then combined in a semi-definite program to maximize the volume of the RCI set while respecting the state and input constraints. We demonstrate through a numerical case study that the proposed data-driven approach can generate RCI sets that are of comparable size to those obtained by a model-based method in which exact knowledge of the system matrices is assumed.
Submitted 2 October, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Authors:
Akash Gupta,
Rohun Tripathi,
Wondong Jang
Abstract:
Lack of audio-video synchronization is a common problem during television broadcasts and video conferencing, leading to an unsatisfactory viewing experience. A widely accepted paradigm is to create an error detection mechanism that identifies the cases when audio is leading or lagging. We propose ModEFormer, which independently extracts audio and video embeddings using modality-specific transformers. Unlike other transformer-based approaches, ModEFormer preserves the modality of the input streams, which allows us to use a larger batch size with more negative audio samples for contrastive learning. Further, we propose a trade-off between the number of negative samples and the number of unique samples in a batch to significantly exceed the performance of previous methods. Experimental results show that ModEFormer achieves state-of-the-art performance, 94.5% for LRS2 and 90.9% for LRS3. Finally, we demonstrate how ModEFormer can be used for offset detection for test clips.
Submitted 20 March, 2023;
originally announced March 2023.
-
Challenges and Opportunities for Beyond-5G Wireless Security
Authors:
Eric Ruzomberka,
David J. Love,
Christopher G. Brinton,
Arpit Gupta,
Chih-Chun Wang,
H. Vincent Poor
Abstract:
The demand for broadband wireless access is driving research and standardization of 5G and beyond-5G wireless systems. In this paper, we aim to identify emerging security challenges for these wireless systems and pose multiple research areas to address these challenges.
Submitted 1 March, 2023;
originally announced March 2023.
-
Diagonal State Space Augmented Transformers for Speech Recognition
Authors:
George Saon,
Ankit Gupta,
Xiaodong Cui
Abstract:
We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant of linear RNNs obtained by discretizing a linear dynamical system with a diagonal state transition matrix. DSS layers project the input sequence onto a space of orthogonal polynomials where the choice of basis functions, metric and support is controlled by the eigenvalues of the transition matrix. We compare neural transducers with either conformer or our proposed DSS-augmented transformer (DSSformer) encoders on three public corpora: Switchboard English conversational telephone speech 300 hours, Switchboard+Fisher 2000 hours, and a spoken archive of Holocaust survivor testimonials called MALACH 176 hours. On Switchboard 300/2000 hours, we reach a single model performance of 8.9%/6.7% WER on the combined test set of the Hub5 2000 evaluation, respectively, and on MALACH we improve the WER by 7% relative over the previous best published result. In addition, we present empirical evidence suggesting that DSS layers learn damped Fourier basis functions where the attenuation coefficients are layer specific whereas the frequency coefficients converge to almost identical linearly-spaced values across all layers.
Submitted 27 February, 2023;
originally announced February 2023.
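To make the idea of "discretizing a linear dynamical system with a diagonal state transition matrix" from the DSS abstract above more concrete, here is a small sketch using a generic zero-order-hold discretization. This is an illustration under stated assumptions, not the parameterization used in the paper: the function names, the zero-order-hold choice, and taking the real part of the output are all assumptions. With an eigenvalue `lam = -a + 1j * w`, each mode contributes `exp(-a * step * k)` times a sinusoid to the kernel, which is the damped Fourier shape the abstract refers to.

```python
import cmath
from typing import List, Tuple


def discretize(lam: complex, b: complex, step: float) -> Tuple[complex, complex]:
    """Zero-order-hold discretization of one diagonal mode x' = lam * x + b * u."""
    a_bar = cmath.exp(lam * step)
    b_bar = (a_bar - 1) / lam * b  # requires lam != 0
    return a_bar, b_bar


def dss_kernel(lams: List[complex], b: List[complex], c: List[complex],
               step: float, length: int) -> List[float]:
    """Kernel K[k] = Re(sum_i c_i * a_bar_i**k * b_bar_i) for k = 0..length-1."""
    modes = [discretize(lam, bi, step) for lam, bi in zip(lams, b)]
    kernel = []
    for k in range(length):
        total = sum(ci * a_bar ** k * b_bar for ci, (a_bar, b_bar) in zip(c, modes))
        kernel.append(total.real)
    return kernel


def causal_conv(kernel: List[float], u: List[float]) -> List[float]:
    """y[t] = sum_{j=0..t} kernel[j] * u[t - j]."""
    return [sum(kernel[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


def recurrent_output(lams: List[complex], b: List[complex], c: List[complex],
                     step: float, u: List[float]) -> List[float]:
    """Same model run as a linear RNN: x_t = a_bar * x_{t-1} + b_bar * u_t."""
    modes = [discretize(lam, bi, step) for lam, bi in zip(lams, b)]
    state = [0j] * len(modes)
    outputs = []
    for u_t in u:
        state = [a_bar * x + b_bar * u_t for x, (a_bar, b_bar) in zip(state, modes)]
        outputs.append(sum(ci * x for ci, x in zip(c, state)).real)
    return outputs


if __name__ == "__main__":
    lams = [complex(-0.5, 2.0), complex(-0.1, 5.0)]  # hypothetical eigenvalues
    b = [1 + 0j, 1 + 0j]
    c = [0.5 + 0.2j, -0.3 + 1j]
    u = [1.0, 0.0, -2.0, 0.5, 3.0]
    kernel = dss_kernel(lams, b, c, 0.1, len(u))
    print(causal_conv(kernel, u))
    print(recurrent_output(lams, b, c, 0.1, u))
```

The two printed sequences agree up to floating-point error, which illustrates why such a layer can be viewed either as a linear RNN or as a long convolution.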
-
Pendulum Actuated Spherical Robot: Dynamic Modeling & Analysis for Wobble & Precession
Authors:
Animesh Singhal,
Sahil Modi,
Abhishek Gupta,
Leena Vachhani,
Omkar A. Ghag
Abstract:
A spherical robot has many practical advantages because all of its electronics are protected within a hull and it can be carried easily by any Unmanned Aerial Vehicle (UAV). However, its use is limited by the difficulty of finding mounts for sensors. A pendulum-actuated spherical robot provides space for mounting sensors at the yoke. We study the non-linear dynamics of a pendulum-actuated spherical robot to analyze the dynamics of the internal assembly (yoke) used for mounting sensors. For such robots, we provide a coupled dynamic model that accounts for the relationship between forward and sideways motion. We further demonstrate the effects of wobbling and precession captured by our model when the bot is controlled to execute a turning maneuver while moving with a moderate forward velocity, a practical situation encountered by spherical robots moving in an indoor setting. A simulation setup based on the developed model provides visualization of the spherical robot motion.
Submitted 14 January, 2023;
originally announced January 2023.
-
GreenMO: Virtualized User-proportionate MIMO
Authors:
Agrim Gupta,
Sajjad Nassirpour,
Manideep Dunna,
Eamon Patamasing,
Alireza Vahid,
Dinesh Bharadia
Abstract:
With the turn of the new decade, wireless communications face a major challenge in connecting many more new users and devices while being energy efficient and minimizing their carbon footprint. However, the current approaches to address the growing number of users and spectrum demands, like traditional fully digital architectures for Massive MIMO, demand exorbitant energy consumption. The reason is that MIMO traditionally requires a separate RF chain per antenna, so the power consumption scales with the number of antennas instead of the number of users, and hence becomes energy inefficient. Instead, GreenMO creates a new massive MIMO architecture which is able to use many more antennas while keeping power consumption to user-proportionate numbers. To achieve this, GreenMO introduces, for the first time, the concept of virtualization of the RF chain hardware. Instead of laying the RF chains physically to each antenna, GreenMO creates these RF chains virtually in the digital domain. This also enables GreenMO to be the first flexible massive MIMO architecture. Since GreenMO's virtual RF chains are created on the fly digitally, it can tune the number of these virtual chains according to the user load, and hence always flexibly consume user-proportionate power. Thus, GreenMO paves the way for green and flexible massive MIMO. We prototype GreenMO on a PCB with eight antennas and evaluate it with a WARPv3 SDR platform in an office environment. The results demonstrate that GreenMO is 3x more power-efficient than traditional Massive MIMO and 4x more spectrum-efficient than traditional OFDMA systems, while multiplexing 4 users, and can save up to 40% power in modern 5G NR base stations.
Submitted 29 November, 2022;
originally announced November 2022.
-
Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search
Authors:
Zihan Wang,
Qi Meng,
HaiFeng Lan,
XinRui Zhang,
KeHao Guo,
Akshat Gupta
Abstract:
Speech emotion recognition (SER) classifies audio into emotion categories such as Happy, Angry, Fear, Disgust and Neutral. While SER is a common application for popular languages, it continues to be a problem for low-resourced languages, i.e., languages with no pretrained speech-to-text recognition models. This paper first proposes a language-specific model that extracts emotional information from multiple pre-trained speech models, and then designs a multi-domain model that simultaneously performs SER for various languages. Our multi-domain model employs a multi-gating mechanism to generate a unique weighted feature combination for each language, and also searches for a specific neural network structure for each language through a neural architecture search module. In addition, we introduce a contrastive auxiliary loss to build more separable representations for audio data. Our experiments show that our model raises the state-of-the-art accuracy by 3% for German and 14.3% for French.
Submitted 15 November, 2022; v1 submitted 31 October, 2022;
originally announced November 2022.
-
Data Converter Design Space Exploration for IoT Applications: An Overview of Challenges and Future Directions
Authors:
Buddhi Prakash Sharma,
Anu Gupta,
Chandra Shekhar
Abstract:
Human lives are improving with the widespread use of cutting-edge digital technology like the Internet of Things (IoT). Recently, the pandemic has shown the demand for more digitally advanced IoT-based devices. International Data Corporation (IDC) forecasts that by 2025, there will be approximately 42 billion of these devices in use, capable of producing around 80 ZB (zettabytes) of data. Data acquisition, processing, communication, and visualization are therefore necessary from a functional standpoint, which makes sensors and data converters the key components of IoT-based applications. The efficiency of such applications is measured in terms of the latency, power, and resolution of the data converters, which motivates designers to make them efficient. Sensors capture and convert physical features from their chosen environment into detectable quantities. The data converter provides meaningful information and connects the real analog world to the digital component of the devices. The received data is interpreted and analyzed with the digital processing circuitry. Ultimately, it is used as information by a network of internet-connected smart devices. IoT technologies are adaptable to nearly any technology that can provide information about its operational activity and environmental conditions. However, challenges arise with power consumption, as the complete IoT framework is battery operated and replacing a battery is a daunting task. The goal of this chapter is therefore to unveil the requirements for designing energy-efficient data converters for IoT applications.
Submitted 3 November, 2022;
originally announced November 2022.
-
A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids
Authors:
Abhijeet Bishnu,
Ankit Gupta,
Mandar Gogate,
Kia Dashtipour,
Ahsan Adeel,
Amir Hussain,
Mathini Sellathurai,
Tharmalingam Ratnarajah
Abstract:
In this paper, we design a first-of-its-kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) complying with the high data rate and low latency requirements of future multimodal hearing assistive technology. The innovative design needs to meet multiple challenging constraints including up/down link communications, delay of transmission and signal processing, and real-time AV SE model processing. The transceiver includes device detection, frame detection, frequency offset estimation, and channel estimation capabilities. We develop both uplink (hearing aid to the cloud) and downlink (cloud to hearing aid) frame structures based on the data rate and latency requirements. Due to the varying nature of uplink information (audio and lip-reading), the uplink channel supports a multiple-data-rate frame structure, while the downlink channel has a fixed-data-rate frame structure. In addition, we evaluate the latency of different PHY layer blocks of the transceiver for the developed frame structures using LabVIEW NXG. This can be used with software-defined radio (such as a Universal Software Radio Peripheral) for real-time demonstration scenarios.
Submitted 24 October, 2022;
originally announced October 2022.
-
What can we learn about a generated image corrupting its latent representation?
Authors:
Agnieszka Tomczak,
Aarushi Gupta,
Slobodan Ilic,
Nassir Navab,
Shadi Albarqouni
Abstract:
Generative adversarial networks (GANs) offer an effective solution to the image-to-image translation problem, thereby allowing for new possibilities in medical imaging. They can translate images from one imaging modality to another at a low cost. For unpaired datasets, they rely mostly on cycle loss. Despite its effectiveness in learning the underlying data distribution, it can lead to a discrepancy between input and output data. The purpose of this work is to investigate the hypothesis that we can predict image quality based on its latent representation in the GAN's bottleneck. We achieve this by corrupting the latent representation with noise and generating multiple outputs. The degree of difference between them is interpreted as the strength of the representation: the more robust the latent representation, the fewer changes in the output image the corruption causes. Our results demonstrate that our proposed method has the ability to i) predict uncertain parts of synthesized images, and ii) identify samples that may not be reliable for downstream tasks, e.g., a liver segmentation task.
Submitted 12 October, 2022;
originally announced October 2022.
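The procedure described in the abstract above (corrupt the latent representation with noise, generate multiple outputs, and measure how much they differ) can be illustrated with a toy sketch. Everything here is hypothetical: `toy_decoder` stands in for a real GAN decoder, the Gaussian noise model and the per-pixel variance measure are assumptions, and the paper may define the degree of difference differently.

```python
import random
from typing import Callable, List


def output_variability(decode: Callable[[List[float]], List[float]],
                       latent: List[float], noise_std: float,
                       num_samples: int, seed: int = 0) -> List[float]:
    """Per-pixel variance of decoder outputs under Gaussian latent corruption."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_samples):
        # Corrupt every latent coordinate with independent Gaussian noise.
        noisy = [z + rng.gauss(0.0, noise_std) for z in latent]
        outputs.append(decode(noisy))
    variances = []
    for pixel in range(len(outputs[0])):
        values = [out[pixel] for out in outputs]
        mean = sum(values) / len(values)
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return variances


def toy_decoder(z: List[float]) -> List[float]:
    """Hypothetical decoder: pixel 0 ignores the latent, pixel 1 depends on it
    weakly, and pixel 2 depends on it strongly."""
    return [1.0, 0.1 * z[0], 5.0 * z[1]]


if __name__ == "__main__":
    variances = output_variability(toy_decoder, [0.3, -0.2], 0.5, 200)
    # Higher variance marks output pixels that are less robust to corruption.
    print(variances)
```

In this toy setting the third pixel shows the largest variance, so it would be flagged as the least reliable part of the generated output.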
-
Source detection via multi-label classification
Authors:
Jayakrishnan Vijayamohanan,
Arjun Gupta,
Oameed Noakoasteen,
Sotirios Goudos,
Christos Christodoulou
Abstract:
Radio source detection through conventional algorithms has been unreliable when trying to solve for a large number of sources in the presence of low SINR and few snapshots. We address this by reformulating source detection as a multi-class classification problem solved using deep learning frameworks. Incoming waveforms are sampled using a centrosymmetric linear array with omni-directional elements, and the normalized upper triangle of the autocorrelation matrix is extracted as the input feature to a modified convolutional neural network with uni-dimensional filters, trained to detect the sources in the presence of both uncorrelated and correlated signals. Two detection algorithms are introduced and referred to as CNNDetector and RadioNet, and subsequently benchmarked against the conventional source detection algorithms. By including forward-backward spatial smoothing as a preprocessing step, RadioNet can also resolve the number of uncorrelated sources in the presence of correlated paths. Finally, the algorithms are stress tested under challenging operational conditions and extensive evaluations are presented showing the efficacy and contributions of the introduced predictive models.
Submitted 1 February, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
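The input feature mentioned in the source detection abstract above, the normalized upper triangle of the autocorrelation matrix, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the matrix is estimated as a sample average over snapshots, real and imaginary parts are stacked into one real vector, and normalization divides by the largest entry magnitude. The abstract does not specify these details.

```python
from typing import List


def sample_autocorrelation(snapshots: List[List[complex]]) -> List[List[complex]]:
    """Estimate R = (1/T) * sum_t x_t x_t^H from T array snapshots x_t."""
    num = len(snapshots)
    size = len(snapshots[0])
    r = [[0j] * size for _ in range(size)]
    for x in snapshots:
        for i in range(size):
            for j in range(size):
                r[i][j] += x[i] * x[j].conjugate() / num
    return r


def upper_triangle_feature(r: List[List[complex]]) -> List[float]:
    """Flatten the upper triangle (including the diagonal) into a real vector.

    The layout [real parts..., imaginary parts...] and the scaling by the
    largest entry magnitude are assumptions made for this sketch.
    """
    size = len(r)
    entries = [r[i][j] for i in range(size) for j in range(i, size)]
    scale = max(abs(z) for z in entries) or 1.0  # avoid dividing by zero
    return [z.real / scale for z in entries] + [z.imag / scale for z in entries]


if __name__ == "__main__":
    # Two identical snapshots from a hypothetical 3-element array.
    snaps = [[1 + 0j, 1j, -1 + 0j], [1 + 0j, 1j, -1 + 0j]]
    r = sample_autocorrelation(snaps)
    print(upper_triangle_feature(r))
```

Because the autocorrelation matrix is Hermitian, the lower triangle carries no extra information, so an array with M elements yields M(M+1)/2 complex entries, or M(M+1) real values in this layout.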
-
Robustness to Modeling Errors in Risk-Sensitive Markov Decision Problems with Markov Risk Measures
Authors:
Shiping Shao,
Abhishek Gupta,
William B. Haskell
Abstract:
We consider risk-sensitive Markov decision processes (MDPs), where the MDP model is influenced by a parameter which takes values in a compact metric space. We identify sufficient conditions under which small perturbations in the model parameters lead to small changes in the optimal value function and optimal policy. We further establish the robustness of the risk-sensitive optimal policies to modeling errors. Implications of the results for data-driven decision-making, decision-making with preference uncertainty, and systems with changing noise distributions are discussed.
Submitted 26 September, 2022;
originally announced September 2022.
-
WiForceSticker: Batteryless, Thin Sticker-like Flexible Force Sensor
Authors:
Agrim Gupta,
Daegue Park,
Shayaun Bashar,
Cedric Girerd,
Tania Morimoto,
Dinesh Bharadia
Abstract:
Any two objects in contact with each other exert a force that could be simply due to gravity or mechanical contact, such as a robotic arm gripping an object or even the contact between two bones at our knee joints. The ability to naturally measure and monitor these contact forces enables a plethora of applications, from warehouse management (detecting faulty packages based on weight) to robotics (making a robotic arm's grip as sensitive as human skin) and healthcare (knee implants). It is challenging to design a ubiquitous force sensor that can be used naturally for all these applications. First, the sensor should be small enough to fit in narrow spaces. Next, we don't want to lay cumbersome cables to read the force values from the sensors. Finally, we need a battery-free design to support in-vivo applications. We develop WiForceSticker, a wireless, battery-free, sticker-like force sensor that can be ubiquitously deployed on any surface, such as warehouse packages, robotic arms, and knee joints. First, WiForceSticker introduces a tiny 4 mm × 2 mm × 0.4 mm capacitive sensor equipped with a 10 mm × 10 mm antenna designed on a flexible PCB substrate. Second, it introduces a new mechanism to transduce the force information onto ambient RF radiation by interfacing the sensors with COTS RFID systems, so that a remotely located reader can read it wirelessly without requiring any battery or active components at the force sensor. The sensor can detect forces in the range of 0-6 N with a sensing accuracy of <0.5 N across multiple testing environments, as evaluated with over 10,000 presses at varying force levels. We also showcase two application case studies with our sensors: weighing warehouse packages and sensing forces applied by bone joints.
Submitted 19 September, 2022;
originally announced September 2022.
-
Preemptive Scheduling of EV Charging for Providing Demand Response Services
Authors:
Shiping Shao,
Farshad Harirchi,
Devang Dave,
Abhishek Gupta
Abstract:
We develop a new algorithm for scheduling the charging process of a large number of electric vehicles (EVs) over a finite horizon. We assume that EVs arrive at the charging stations with different charge levels and different flexibility windows. We also assume that the arrival process has a known distribution and that the charging process of EVs can be preempted. We pose the scheduling problem as a dynamic program with constraints. We show that the resulting formulation leads to a monotone dynamic program with Lipschitz continuous value functions that are robust against perturbation of system parameters. We propose a simulation-based fitted value iteration algorithm to determine the value function approximately, and derive the sample complexity for computing the approximately optimal solution.
Submitted 30 November, 2022; v1 submitted 20 August, 2022;
originally announced August 2022.
-
Human-to-Robot Imitation in the Wild
Authors:
Shikhar Bahl,
Abhinav Gupta,
Deepak Pathak
Abstract:
We approach the problem of learning by watching humans in the wild. While traditional approaches in Imitation and Reinforcement Learning are promising for learning in the real world, they are either sample inefficient or constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator, using it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves using interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos, and an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and a talk are available at https://human2robot.github.io
Submitted 19 July, 2022;
originally announced July 2022.
-
Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits
Authors:
Yuntian Deng,
Xingyu Zhou,
Arnob Ghosh,
Abhishek Gupta,
Ness B. Shroff
Abstract:
To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objective of maximizing the received signal strength under an interference constraint. In particular, we employ the non-stationary kernelized bandit to leverage the correlation among beams and model the complex beamforming and multipath channel functions. Furthermore, to mitigate interference to other user equipment, we leverage the primal-dual method to design a constrained UCB-type kernelized bandit algorithm. Our theoretical analysis indicates that the proposed algorithm can adaptively adjust the beam in time-varying environments, such that both the cumulative regret of the received signal and the constraint violations have sublinear bounds with respect to time. This result is of independent interest for applications such as adaptive pricing and news ranking. In addition, the algorithm treats the channel as a black-box function and does not require any prior knowledge for dynamic channel modeling, and thus is applicable in a variety of scenarios. We further show that if information about the channel variation is known, the algorithm has better theoretical guarantees and performance. Finally, we conduct simulations to highlight the effectiveness of the proposed algorithm.
Submitted 2 July, 2022;
originally announced July 2022.