-
Resilience of the Electric Grid through Trustable IoT-Coordinated Assets
Authors:
Vineet J. Nair,
Venkatesh Venkataramanan,
Priyank Srivastava,
Partha S. Sarker,
Anurag Srivastava,
Laurentiu D. Marinovici,
Jun Zha,
Christopher Irwin,
Prateek Mittal,
John Williams,
H. Vincent Poor,
Anuradha M. Annaswamy
Abstract:
The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) that include renewable generation, flexible loads, and storage provides extraordinary opportunities for improvements in efficiency and sustainability. However, they can introduce new vulnerabilities in the form of cyberattacks, which can cause significant challenges in ensuring grid resilience. We propose a framework in this paper for achieving grid resilience through suitably coordinated assets including a network of Internet of Things (IoT) devices. A local electricity market is proposed to identify trustable assets and carry out this coordination. Situational Awareness (SA) of locally available DERs with the ability to inject power or reduce consumption is enabled by the market, together with a monitoring procedure for their trustability and commitment. With this SA, we show that a variety of cyberattacks can be mitigated using local trustable resources without stressing the bulk grid. The demonstrations are carried out using a variety of platforms, including a high-fidelity co-simulation platform, real-time hardware-in-the-loop validation, and a utility-friendly simulator.
Submitted 21 June, 2024;
originally announced June 2024.
-
Capture Point Control in Thruster-Assisted Bipedal Locomotion
Authors:
Shreyansh Pitroda,
Aditya Bondada,
Kaushik Venkatesh Krishnamurthy,
Adarsh Salagame,
Chenghao Wang,
Taoran Liu,
Bibek Gupta,
Eric Sihite,
Reza Nemovi,
Alireza Ramezani,
Morteza Gharib
Abstract:
Despite major advancements in control designs that are robust to unplanned disturbances, bipedal robots are still susceptible to falling over and struggle to negotiate rough terrains. By utilizing thrusters in our bipedal robot, we can perform additional posture manipulation and expand the modes of locomotion to enhance the robot's stability and ability to negotiate rough and difficult-to-navigate terrains. In this paper, we present our efforts in designing a controller based on capture point control for our thruster-assisted walking model named Harpy and explore its control design possibilities. While capture point control based on centroidal models for bipedal systems has been extensively studied, the incorporation of external forces that can influence the dynamics of linear inverted pendulum models, often used in capture point-based works, has not been explored before. The inclusion of these external forces can lead to interesting interpretations of locomotion, such as virtual buoyancy studied in aquatic-legged locomotion. This paper outlines the dynamical model of our robot, the capture point method we use to assist the upper body stabilization, and the simulation work done to show the controller's feasibility.
Submitted 20 June, 2024;
originally announced June 2024.
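For readers unfamiliar with the capture point mentioned in the abstract above, the following is a minimal sketch of the textbook formula for a linear inverted pendulum (LIP), not the controller from the paper. The `thrust` and `mass` parameters are hypothetical: they model a constant upward force at the center of mass as a reduction of effective gravity, which is an assumption made here purely for illustration.

```python
import math


def capture_point(x, x_dot, z0, g=9.81, thrust=0.0, mass=1.0):
    """Capture point of a linear inverted pendulum (LIP).

    x, x_dot : horizontal center-of-mass position (m) and velocity (m/s)
    z0       : constant center-of-mass height (m)
    thrust   : hypothetical constant upward thruster force (N)
    mass     : robot mass (kg), only relevant when thrust is nonzero
    """
    # Assumption: a constant vertical thrust lowers the effective gravity.
    g_eff = g - thrust / mass
    if g_eff <= 0:
        raise ValueError("thrust offsets gravity; the LIP model breaks down")
    omega = math.sqrt(g_eff / z0)  # natural frequency of the LIP
    return x + x_dot / omega


# Same state with and without the hypothetical thrust term.
print(capture_point(0.0, 0.5, z0=0.5))
print(capture_point(0.0, 0.5, z0=0.5, thrust=20.0, mass=5.0))
```

Under this assumption, an upward thrust lowers the natural frequency and moves the capture point farther from the center of mass for the same velocity.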
-
Thruster-Assisted Incline Walking
Authors:
Kaushik Venkatesh Krishnamurthy,
Chenghao Wang,
Shreyansh Pitroda,
Adarsh Salagame,
Eric Sihite,
Reza Nemovi,
Alireza Ramezani,
Morteza Gharib
Abstract:
In this study, our aim is to evaluate the effectiveness of thruster-assisted steep slope walking for the Husky Carbon, a quadrupedal robot equipped with custom-designed actuators and multiple electric ducted fans, through simulation prior to conducting experimental trials. Thruster-assisted steep slope walking draws inspiration from wing-assisted incline running (WAIR) observed in birds, and intriguingly incorporates posture manipulation and thrust vectoring, a locomotion technique not previously explored in the animal kingdom. Our approach involves developing a reduced-order model of the Husky robot, followed by the application of an optimization-based controller utilizing collocation methods and dynamics interpolation to determine control actions. Through simulation testing, we demonstrate the feasibility of hardware implementation of our controller.
Submitted 18 June, 2024;
originally announced June 2024.
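The abstract above refers to collocation methods without naming a scheme. As a hedged illustration only, the sketch below computes the defect constraints of trapezoidal collocation, one common variant; the paper may use a different one. The `double_integrator` dynamics are a hypothetical stand-in for the robot's reduced-order model.

```python
def trapezoid_defects(xs, us, f, h):
    """Trapezoidal collocation defects for each interval k:

        x[k+1] - x[k] - h/2 * (f(x[k], u[k]) + f(x[k+1], u[k+1]))

    A trajectory satisfies the discretized dynamics when all defects are zero.
    """
    defects = []
    for k in range(len(xs) - 1):
        fk = f(xs[k], us[k])
        fk1 = f(xs[k + 1], us[k + 1])
        defects.append([
            x1 - x0 - 0.5 * h * (a + b)
            for x0, x1, a, b in zip(xs[k], xs[k + 1], fk, fk1)
        ])
    return defects


def double_integrator(x, u):
    """Hypothetical stand-in dynamics: state [position, velocity], input u."""
    _, vel = x
    return [vel, u]


# Constant unit acceleration from rest: position = t^2 / 2, velocity = t.
h = 0.5
times = [0.0, 0.5, 1.0]
xs = [[0.5 * t * t, t] for t in times]
us = [1.0, 1.0, 1.0]
print(trapezoid_defects(xs, us, double_integrator, h))
```

In an optimization-based controller, defects like these would be imposed as equality constraints while a solver searches over the states and inputs at the knot points.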
-
Qubit-efficient Variational Quantum Algorithms for Image Segmentation
Authors:
Supreeth Mysore Venkatesh,
Antonio Macaluso,
Marlon Nuske,
Matthias Klusch,
Andreas Dengel
Abstract:
Quantum computing is expected to transform a range of computational tasks beyond the reach of classical algorithms. In this work, we examine the application of variational quantum algorithms (VQAs) for unsupervised image segmentation to partition images into separate semantic regions. Specifically, we formulate the task as a graph cut optimization problem and employ two established qubit-efficient VQAs, which we refer to as Parametric Gate Encoding (PGE) and Ancilla Basis Encoding (ABE), to find the optimal segmentation mask. In addition, we propose Adaptive Cost Encoding (ACE), a new approach that leverages the same circuit architecture as ABE but adopts a problem-dependent cost function. We benchmark PGE, ABE and ACE on synthetically generated images, focusing on quality and trainability. ACE shows consistently faster convergence in training the parameterized quantum circuits in comparison to PGE and ABE. Furthermore, we provide a theoretical analysis of the scalability of these approaches against the Quantum Approximate Optimization Algorithm (QAOA), showing a significant cutback in the quantum resources, especially in the number of qubits, which depends logarithmically on the number of pixels. The results validate the strengths of ACE, while concurrently highlighting its inherent limitations and challenges. This paves the way for further research in quantum-enhanced computer vision.
Submitted 23 May, 2024;
originally announced May 2024.
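To make the logarithmic qubit scaling claimed in the abstract above concrete, here is a rough sketch comparing a one-qubit-per-pixel encoding (the usual choice for QAOA on graph cut problems) with a count of ceil(log2(n)). Both function names are hypothetical, and the exact counts for PGE, ABE and ACE, including any ancilla qubits, are not stated in the abstract and are not reproduced here.

```python
def qubits_one_per_pixel(n_pixels):
    """One qubit per graph node (pixel), as in a standard QAOA graph cut."""
    return n_pixels


def qubits_log_scaling(n_pixels):
    """ceil(log2(n_pixels)) computed with integer arithmetic, at least 1.

    Illustrative only: the actual encodings may need extra qubits.
    """
    return max(1, (n_pixels - 1).bit_length())


for side in (4, 32, 256):
    n = side * side
    print(side, n, qubits_one_per_pixel(n), qubits_log_scaling(n))
```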
-
Narrow-Path, Dynamic Walking Using Integrated Posture Manipulation and Thrust Vectoring
Authors:
Kaushik Venkatesh Krishnamurthy,
Chenghao Wang,
Shreyansh Pitroda,
Adarsh Salagame,
Eric Sihite,
Reza Nemovi,
Alireza Ramezani,
Morteza Gharib
Abstract:
This research concentrates on enhancing the navigational capabilities of Northeastern University's Husky, a multi-modal quadrupedal robot that can integrate posture manipulation and thrust vectoring to traverse narrow pathways, for example by walking over pipes and slacklining. The Husky is outfitted with thrusters designed to stabilize its body during dynamic walking over these narrow paths. The project involves modeling the robot using the HROM (Husky Reduced Order Model) and developing an optimal control framework. This framework is based on polynomial approximation of the HROM and a collocation approach to derive optimal thruster commands necessary for achieving dynamic walking on narrow paths. The effectiveness of the modeling and control design approach is validated through simulations conducted using MATLAB.
Submitted 9 May, 2024;
originally announced May 2024.
-
On Structural Non-commutativity in Affine Feedback of SISO Nonlinear Systems
Authors:
Venkatesh G. S.
Abstract:
The affine feedback connection of SISO nonlinear systems modeled by Chen–Fliess series is shown to be a group action on the plant which is isomorphic to the semi-direct product of shuffle and additive group of non-commutative formal power series. The additive and multiplicative feedback loops in an affine feedback connection are thus proven to be structurally non-commutative. A flip in the order of these loops results in a net additive feedback loop.
Submitted 26 March, 2024;
originally announced March 2024.
-
PnP Restoration with Domain Adaptation for SANS
Authors:
Shirin Shoushtari,
Edward P. Chandler,
Jialiang Zhang,
Manjula Senanayake,
Sai Venkatesh **ali,
Marcus Foston,
Ulugbek S. Kamilov
Abstract:
Small Angle Neutron Scattering (SANS) is a non-destructive technique utilized to probe the nano- to mesoscale structure of materials by analyzing the scattering pattern of neutrons. Accelerating SANS acquisition for in-situ analysis is essential, but it often reduces the signal-to-noise ratio (SNR), highlighting the need for methods to enhance SNR even with short acquisition times. While deep learning (DL) can be used for enhancing SNR of low quality SANS, the amount of experimental data available for training is usually severely limited. We address this issue by proposing a Plug-and-play Restoration for SANS (PR-SANS) that uses domain-adapted priors. The prior in PR-SANS is initially trained on a set of generic images and subsequently fine-tuned using a limited amount of experimental SANS data. We present a theoretical convergence analysis of PR-SANS by focusing on the error resulting from using inexact domain-adapted priors instead of the ideal ones. We demonstrate with experimentally collected SANS data that PR-SANS can recover high-SNR 2D SANS detector images from low-SNR detector images, effectively increasing the SNR. This advancement enables a reduction in acquisition times by a factor of 12 while maintaining the original signal quality.
Submitted 15 March, 2024;
originally announced March 2024.
-
Spintronic Implementation of UNet for Image Segmentation
Authors:
Venkatesh Vadde,
Bhaskaran Muralidharan,
Abhishek Sharma
Abstract:
Image segmentation plays a crucial role in computer vision applications like self-driving cars, satellite imagery analysis, and medical diagnosis. Implementing these complex deep neural networks on conventional hardware is highly inefficient. In this work, we propose a hardware implementation of UNet for segmentation tasks, using spintronic devices. Our approach involves designing hardware for convolution, deconvolution, ReLU, and max pooling layers of the UNet architecture. We demonstrate the synaptic behavior of the domain wall MTJ, and design convolution and deconvolution layers using the domain wall-based crossbar array. We utilize the orthogonal current injected MTJ with its continuous resistance change and showcase the ReLU and max pooling functions. We employ a hybrid simulation setup by coupling micromagnetic simulation, non-equilibrium Green's function, Landau-Lifshitz-Gilbert-Slonczewski equations, and circuit simulation with Python programming to incorporate the diverse physics of spin-transport, magnetization dynamics, and CMOS elements in our proposed designs. We evaluate our UNet design on the CamVid dataset and achieve segmentation accuracies that are comparable to software implementation. During training, our design consumes 43.59 pJ of energy for synaptic weight updates.
Submitted 5 March, 2024;
originally announced March 2024.
-
Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet
Authors:
Satvik Venkatesh,
Arthur Benilov,
Philip Coleman,
Frederic Roskam
Abstract:
There have been significant advances in deep learning for music demixing in recent years. However, there has been little attention given to how these neural networks can be adapted for real-time low-latency applications, which could be helpful for hearing aids, remixing audio streams and live shows. In this paper, we investigate the various challenges involved in adapting current demixing models in the literature for this use case. Subsequently, inspired by the Hybrid Demucs architecture, we propose the Hybrid Spectrogram Time-domain Audio Separation Network (HS-TasNet), which utilises the advantages of spectral and waveform domains. For a latency of 23 ms, the HS-TasNet obtains an overall signal-to-distortion ratio (SDR) of 4.65 on the MusDB test set, which increases to 5.55 with additional training data. These results demonstrate the potential of efficient demixing for real-time low-latency music applications.
Submitted 27 February, 2024;
originally announced February 2024.
-
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Authors:
Raunaq Bhirangi,
Chenyu Wang,
Venkatesh Pattabiraman,
Carmel Majidi,
Abhinav Gupta,
Tess Hellebrekers,
Lerrel Pinto
Abstract:
Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.
Submitted 15 February, 2024;
originally announced February 2024.
-
Zak-OTFS and LDPC Codes
Authors:
Beyza Dabak,
Venkatesh Khammammetti,
Saif Khan Mohammed,
Robert Calderbank
Abstract:
Orthogonal Time Frequency Space (OTFS) is a framework for communications and active sensing that processes signals in the delay-Doppler (DD) domain. It is informed by 6G propagation environments, where Doppler spreads measured in kHz make it more and more difficult to estimate channels, and the standard model-dependent approach to wireless communication is starting to break down. We consider Zak-OTFS, where the inverse Zak transform converts information symbols mounted on DD domain pulses to the time domain for transmission. Zak-OTFS modulation is parameterized by a delay period $τ_{p}$ and a Doppler period $ν_{p}$, where the product $τ_{p}ν_{p}=1$. When the channel spread is less than the delay period, and the Doppler spread is less than the Doppler period, the Zak-OTFS input-output relation can be predicted from the response to a single pilot symbol. The highly reliable channel estimates concentrate around the pilot location, and we configure low-density parity-check (LDPC) codes that take advantage of this prior information about reliability. It is advantageous to allocate information symbols to more reliable bins in the DD domain. We report simulation results for a Veh-A channel model where it is not possible to resolve all the paths, showing that LDPC coding extends the range of Doppler spreads for which reliable model-free communication is possible. We show that LDPC coding reduces sensitivity to the choice of transmit filter, making bandwidth expansion less necessary. Finally, we compare BER performance of Zak-OTFS to that of a multicarrier approximation (MC-OTFS), showing that LDPC coding amplifies the gains previously reported for uncoded transmission.
Submitted 14 February, 2024;
originally announced February 2024.
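The predictability condition stated in the abstract above (delay spread below the delay period, Doppler spread below the Doppler period, with the two periods multiplying to 1) can be checked with a few lines of arithmetic. The sketch below is illustrative; the function names and the numeric values are hypothetical and are not taken from the paper.

```python
def doppler_period(delay_period_s):
    """Doppler period (Hz) implied by tau_p * nu_p = 1."""
    return 1.0 / delay_period_s


def is_predictable(delay_spread_s, doppler_spread_hz, delay_period_s):
    """True when both channel spreads fit inside their respective periods."""
    nu_p = doppler_period(delay_period_s)
    return delay_spread_s < delay_period_s and doppler_spread_hz < nu_p


tau_p = 50e-6  # hypothetical delay period: 50 microseconds
print(doppler_period(tau_p))               # Doppler period in Hz
print(is_predictable(5e-6, 1e3, tau_p))    # small spreads
print(is_predictable(5e-6, 30e3, tau_p))   # Doppler spread too large
```

Because the product of the two periods is fixed, lengthening the delay period to tolerate a larger delay spread shrinks the Doppler spread that can be tolerated.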
-
Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control
Authors:
Ghada Zamzmi,
Kesavan Venkatesh,
Brandon Nelson,
Smriti Prathapan,
Paul H. Yi,
Berkman Sahiner,
Jana G. Delfino
Abstract:
Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety.
Method: We propose an ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection.
Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest x-ray (CXR) from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at monitoring data streams and identifying the time a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in OOD input percentage from 0-1% to 3-5% within two days, maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure.
Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications.
Submitted 12 February, 2024;
originally announced February 2024.
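As a hedged illustration of the SPC idea described in the abstract above, the sketch below builds a simple Shewhart-style control chart: limits are set at the baseline mean plus or minus k standard deviations, and the first point outside them is flagged. The abstract does not say which chart type or parameters the authors use, so this is an assumption, and the daily OOD percentages are made-up values loosely echoing the 0-1% baseline and 3-5% drift mentioned there.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Lower limit, center line and upper limit from in-control data."""
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    return center - k * sigma, center, center + k * sigma


def first_out_of_control(stream, lower, upper):
    """Index of the first observation outside the limits, or None."""
    for i, value in enumerate(stream):
        if value < lower or value > upper:
            return i
    return None


# Hypothetical daily OOD percentages, not data from the paper.
baseline = [0.5, 1.0, 0.0, 1.0, 0.5, 0.0, 1.0, 0.5]
monitored = [1.0, 0.5, 4.0, 3.0]

lower, center, upper = control_limits(baseline)
print(first_out_of_control(monitored, lower, upper))
```

A chart with memory, such as CUSUM or EWMA, would typically react faster to small sustained shifts; the plain 3-sigma rule is used here only because it is the simplest to read.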
-
Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion
Authors:
**han Wang,
Long Chen,
Aparna Khare,
Anirudh Raju,
Pranav Dheram,
Di He,
Minhua Wu,
Andreas Stolcke,
Venkatesh Ravichandran
Abstract:
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements. Our approach demonstrates the potential of combined LLMs and acoustic models for a more natural and conversational interaction between humans and speech-enabled AI agents.
Submitted 26 January, 2024;
originally announced January 2024.
-
Two-pass Endpoint Detection for Speech Recognition
Authors:
Anirudh Raju,
Aparna Khare,
Di He,
Ilya Sklyar,
Long Chen,
Sam Alptekin,
Viet Anh Trinh,
Zhe Zhang,
Colin Vaz,
Venkatesh Ravichandran,
Roland Maas,
Ariya Rastrow
Abstract:
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands. The endpoint detector has to trade off accuracy against latency, since waiting longer reduces the cases of users being cut off early. We propose a novel two-pass solution for endpointing, where the utterance endpoint detected from a first-pass endpointer is verified by a second-pass model termed EP Arbitrator. Our method improves the trade-off between early cut-offs and latency over a baseline endpointer, as tested on datasets including voice-assistant transactional queries, conversational speech, and the public SLURP corpus. We demonstrate that our method shows improvements regardless of the first-pass EP model used.
Submitted 16 January, 2024;
originally announced January 2024.
-
An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection
Authors:
Abdul Aziz,
Nhat Pham,
Neel Vora,
Cody Reynolds,
Jaime Lehnen,
Pooja Venkatesh,
Zhuoran Yao,
Jay Harvey,
Tam Vu,
Kan Ding,
Phuc Nguyen
Abstract:
Epilepsy is one of the most common neurological diseases globally, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scalp-based EEG test, despite being the gold standard for diagnosing epilepsy, is costly, necessitates hospitalization, demands skilled professionals for operation, and is discomforting for users. In this paper, we propose EarSD, a novel lightweight, unobtrusive, and socially acceptable ear-worn system to detect epileptic seizure onsets by measuring the physiological signals from behind the user's ears. EarSD includes an integrated custom-built sensing, computing, and communication PCB to collect and amplify the signals of interest, remove the noises caused by motion artifacts and environmental impacts, and stream the data wirelessly to the computer or mobile phone nearby, where data are uploaded to the host computer for further processing. We conducted both in-lab and in-hospital experiments with epileptic seizure patients who were hospitalized for seizure studies. The preliminary results confirm that EarSD can detect seizures with up to 95.3 percent accuracy by just using classical machine learning algorithms.
Submitted 1 January, 2024;
originally announced January 2024.
-
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Authors:
Anirudh S. Sundar,
Chao-Han Huck Yang,
David M. Chan,
Shalini Ghosh,
Venkatesh Ravichandran,
Phani Sankar Nidadavolu
Abstract:
Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer from attention matrices of models rooted in high resource modalities, text and images, to those in resource-constrained domains, speech and audio, employing a zero-shot paradigm. MAM reduces the relative Word Error Rate (WER) of an Automatic Speech Recognition (ASR) model by up to 6.70%, and relative classification error of an Audio Event Classification (AEC) model by 10.63%. In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
Submitted 9 February, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
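The improvements in the abstract above are quoted as relative reductions, which are easy to confuse with absolute percentage-point changes. The following short sketch shows the arithmetic; the baseline and improved error rates are hypothetical values chosen only so that the result is close to the quoted 6.70%, and they are not figures from the paper.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical WERs: 10.00% before and 9.33% after merging.
print(round(relative_reduction(10.0, 9.33), 2))
```

In this made-up case the absolute change is only 0.67 percentage points, while the relative reduction is about 6.7%.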
-
Quadrupedal Locomotion Control On Inclined Surfaces Using Collocation Method
Authors:
Adarsh Salagame,
Maria Gianello,
Chenghao Wang,
Kaushik Venkatesh,
Shreyansh Pitroda,
Rohit Rajput,
Eric Sihite,
Miriam Leeser,
Alireza Ramezani
Abstract:
Inspired by Chukars' wing-assisted incline running (WAIR), in this work, we employ a high-fidelity model of our Husky Carbon quadrupedal-legged robot to walk over steep slopes of up to 45 degrees. Chukars use the aerodynamic forces generated by their flapping wings to manipulate ground contact forces and traverse steep slopes and even overhangs. By exploiting the thrusters on Husky, we employed a collocation approach to rapidly resolve the joint and thruster actions. Our approach uses a polynomial approximation of the reduced-order dynamics of Husky, called HROM, to quickly and efficiently find optimal control actions that permit high-slope walking without violating friction cone conditions.
Submitted 13 December, 2023;
originally announced December 2023.
-
A Review of Machine Learning Methods Applied to Video Analysis Systems
Authors:
Marios S. Pattichis,
Venkatesh Jatla,
Alvaro E. Ullao Cerna
Abstract:
The paper provides a survey of the development of machine-learning techniques for video analysis. The survey provides a summary of the most popular deep learning methods used for human activity recognition. We discuss how popular architectures perform on standard datasets and highlight the differences from real-life datasets dominated by multiple activities performed by multiple participants over long periods. For real-life datasets, we describe the use of low-parameter models (with 200X or 1,000X fewer parameters) that are trained to detect a single activity after the relevant objects have been successfully detected. Our survey then turns to a summary of machine learning methods that are specifically developed for working with a small number of labeled video samples. Our goal here is to describe modern techniques that are specifically designed so as to minimize the amount of ground truth that is needed for training and testing video analysis systems. We provide summaries of the development of self-supervised learning, semi-supervised learning, active learning, and zero-shot learning for applications in video analysis. For each method, we provide representative examples.
Submitted 8 December, 2023;
originally announced December 2023.
-
How Strong a Kick Should be to Topple Northeastern's Tumbling Robot?
Authors:
Adarsh Salagame,
Neha Bhattachan,
Andre Caetano,
Ian McCarthy,
Henry Noyes,
Brandon Petersen,
Alexander Qiu,
Matthew Schroeter,
Nolan Smithwick,
Konrad Sroka,
Jason Widjaja,
Yash Bohra,
Kaushik Venkatesh,
Kruthika Gangaraju,
Paul Ghanem,
Ioannis Mandralis,
Eric Sihite,
Arash Kalantari,
Alireza Ramezani
Abstract:
Rough terrain locomotion has remained one of the most challenging mobility questions. In 2022, NASA's Innovative Advanced Concepts (NIAC) Program invited US academic institutions to participate in NASA's Breakthrough, Innovative & Game-changing (BIG) Idea competition by proposing novel mobility systems that can negotiate extremely rough terrain, such as bumpy lunar craters. In this competition, Northeastern University won NASA's top Artemis Award by proposing an articulated robot tumbler called COBRA (Crater Observing Bio-inspired Rolling Articulator). This report briefly explains the underlying principles that made COBRA successful in competing with other concepts ranging from cable-driven to multi-legged designs from six other participating US institutions.
Submitted 24 November, 2023;
originally announced November 2023.
-
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
Authors:
Helin Wang,
Venkatesh Ravichandran,
Milind Rao,
Becky Lammers,
Myra Sydnor,
Nicholas Maragakis,
Ankur A. Butala,
Jayne Zhang,
Lora Clawson,
Victoria Chovaz,
Laureano Moro-Velazquez
Abstract:
Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments. Recent advancements in Text-to-Speech (TTS) synthesis-based augmentation for fairer SLU have struggled to accurately capture the unique vocal characteristics of atypical speakers, largely due to insufficient data. To address this issue, we present a novel data augmentation method for atypical speakers by finetuning a TTS model, called Aty-TTS. Aty-TTS models speaker and atypical characteristics via knowledge transfer from a voice conversion model. Then, we use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we have collected a new atypical speech dataset containing intent annotation. Both objective and subjective assessments validate that Aty-TTS is capable of generating high-quality atypical speech. Furthermore, it serves as an effective data augmentation strategy, contributing to fairer SLU systems that can better accommodate individuals with atypical speech patterns.
Submitted 16 November, 2023;
originally announced November 2023.
-
Stochastic Control with Distributionally Robust Constraints for Cyber-Physical Systems Vulnerable to Attacks
Authors:
Nishanth Venkatesh,
Aditya Dave,
Ioannis Faros,
Andreas A. Malikopoulos
Abstract:
In this paper, we investigate the control of a cyber-physical system (CPS) while accounting for its vulnerability to external attacks. We formulate a constrained stochastic problem with a robust constraint to ensure robust operation against potential attacks. We seek to minimize the expected cost subject to a constraint limiting the worst-case expected damage an attacker can impose on the CPS. We present a dynamic programming decomposition to compute the optimal control strategy in this robust-constrained formulation and prove its recursive feasibility. We also illustrate the utility of our results by applying them to a numerical simulation.
Submitted 6 November, 2023;
originally announced November 2023.
-
Heart Rate Detection Using an Event Camera
Authors:
Aniket Jagtap,
RamaKrishna Venkatesh Saripalli,
Joe Lemley,
Waseem Shariff,
Alan F. Smeaton
Abstract:
Event cameras, also known as neuromorphic cameras, are an emerging technology that offers advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harness the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flow of blood in the wrist region. We investigate whether an event camera could be used for continuous noninvasive monitoring of heart rate (HR). Event camera video data from 25 participants, comprising varying age groups and skin colours, was collected and analysed. Ground-truth HR measurements obtained using conventional methods were used to evaluate the accuracy of automatic detection of HR from event camera data. Our experimental results and comparison to the performance of other non-contact HR measurement methods demonstrate the feasibility of using event cameras for pulse detection. We also acknowledge the challenges and limitations of our method, such as light-induced flickering and the sub-conscious but naturally-occurring tremors of an individual during data capture.
Submitted 21 September, 2023;
originally announced September 2023.
-
Safeguarding Learning-based Control for Smart Energy Systems with Sampling Specifications
Authors:
Chih-Hong Cheng,
Venkatesh Prasad Venkataramanan,
Pragya Kirti Gupta,
Yun-Fei Hsu,
Simon Burton
Abstract:
We study challenges in using reinforcement learning to control energy systems, where apart from performance requirements, one has additional safety requirements such as avoiding blackouts. We detail how these safety requirements in real-time temporal logic can be strengthened via discretization into linear temporal logic (LTL), such that the satisfaction of the LTL formulae implies the satisfaction of the original safety requirements. The discretization enables advanced engineering methods such as synthesizing shields for safe reinforcement learning as well as formal verification, where for statistical model checking, the probabilistic guarantee acquired by LTL model checking forms a lower bound for the satisfaction of the original real-time safety requirements.
Submitted 11 August, 2023;
originally announced August 2023.
-
Hovering Control of Flapping Wings in Tandem with Multi-Rotors
Authors:
Aniket Dhole,
Bibek Gupta,
Adarsh Salagame,
Xuejian Niu,
Yizhe Xu,
Kaushik Venkatesh,
Paul Ghanem,
Ioannis Mandralis,
Eric Sihite,
Alireza Ramezani
Abstract:
This work briefly covers our efforts to stabilize the flight dynamics of Northeastern's tailless bat-inspired micro aerial vehicle, Aerobat. Flapping robots are not new. A plethora of examples is mainly dominated by insect-style design paradigms that are passively stable. However, Aerobat, in addition to being tailless, possesses morphing wings that add to the inherent complexity of flight control. The robot can dynamically adjust its wing platform configurations during gait cycles, increasing its efficiency and agility. We employ a guard design with manifold small thrusters to stabilize Aerobat's position and orientation in hovering, a flapping system in tandem with a multi-rotor. For flight control purposes, we take an approach based on assuming the guard cannot observe Aerobat's states. Then, we propose an observer to estimate the unknown states of the guard, which are then used for closed-loop hovering control of the Guard-Aerobat platform.
Submitted 31 July, 2023;
originally announced August 2023.
-
Spatio-Temporal Deep Learning-Assisted Reduced Security-Constrained Unit Commitment
Authors:
Arun Venkatesh Ramesh,
Xingpeng Li
Abstract:
Security-constrained unit commitment (SCUC) is a computationally complex process utilized in power system day-ahead scheduling and market clearing. SCUC is run daily and requires state-of-the-art algorithms to speed up the process. The constraints and data associated with SCUC are both geographically and temporally correlated to ensure the reliability of the solution, which further increases the complexity. In this paper, an advanced machine learning (ML) model is used to study the patterns in power system historical data, which inherently considers both spatial and temporal (ST) correlations in constraints. The ST-correlated ML model is trained to understand spatial correlation by considering graph neural networks (GNN), whereas temporal sequences are studied using long short-term memory (LSTM) networks. The proposed approach is validated on several test systems, namely the IEEE 24-Bus system, IEEE 73-Bus system, IEEE 118-Bus system, and synthetic South Carolina (SC) 500-Bus system. Moreover, B-θ and power transfer distribution factor (PTDF) based SCUC formulations were considered in this research. Simulation results demonstrate that the ST approach can effectively predict the generator commitment schedule and classify critical and non-critical lines in the system, which are utilized for model reduction of SCUC to obtain computational enhancement without loss in solution quality.
Submitted 2 June, 2023;
originally announced June 2023.
-
Modelling Quasi-Orthographic Captures for Surface Imaging
Authors:
Maniratnam Mandal,
Venkatesh K. Subramanian
Abstract:
Surveillance and surveying are two important applications of empirical research. A major part of terrain modelling is supported by photographic surveys which are used for capturing expansive natural surfaces using a wide range of sensors -- visual, infrared, ultrasonic, radio, etc. A natural surface is non-smooth, unpredictable and fast-varying, and it is difficult to capture all features and reconstruct them accurately. An orthographic image of a surface provides a detailed holistic view capturing its relevant features. In a perfect orthographic reconstruction, images must be captured normal to each point on the surface, which is practically impossible. In this paper, a detailed analysis of the constraints on imaging distance is also provided. A novel method is formulated to determine an approximate orthographic region on a surface surrounding the point of focus, and some methods for approximating the orthographic boundary for faster computation are also proposed. The approximation methods have been compared in terms of computational efficiency and accuracy.
Submitted 14 May, 2023;
originally announced May 2023.
-
Maximum Likelihood based Phase-Retrieval using Fresnel Propagation Forward Models with Optional Constraints
Authors:
K. Aditya Mohan,
Jean-Baptiste Forien,
Venkatesh Sridhar,
Jefferson A. Cuadra,
Dilworth Parkinson
Abstract:
X-ray phase-contrast tomography (XPCT) is widely used for high contrast 3D imaging using either synchrotron or laboratory microfocus X-ray sources. XPCT enables an order of magnitude improvement in image contrast of the reconstructed material interfaces with low X-ray absorption contrast. The dominant approaches to 3D reconstruction using XPCT rely on the use of phase-retrieval algorithms that make one or more limiting approximations for the experimental configuration and material properties. Since many experimental scenarios violate such approximations, the resulting reconstructions contain blur, artifacts, or other quantitative inaccuracies. Our solution to this problem is to formulate new iterative non-linear phase-retrieval (NLPR) algorithms that avoid such limiting approximations. Compared to the widely used state-of-the-art approaches, we show that our proposed algorithms result in sharp and quantitatively accurate reconstruction with reduced artifacts. Unlike existing NLPR algorithms, our approaches avoid the laborious manual tuning of regularization hyper-parameters while still achieving the stated goals. As an alternative to regularization, we propose explicit constraints on the material properties to constrain the solution space and solve the phase-retrieval problem. These constraints are easily user-configurable since they follow directly from the imaged object's dimensions and material properties.
Submitted 2 October, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
-
Hardware-Impaired Rician-Faded Cell-Free Massive MIMO Systems With Channel Aging
Authors:
Venkatesh Tentu,
Dheeraj N Amudala,
Anish Chattopadhyay,
Rohit Budhiraja
Abstract:
We study the impact of channel aging on the uplink of a cell-free (CF) massive multiple-input multiple-output (mMIMO) system by considering i) spatially-correlated Rician-faded channels; ii) hardware impairments at the access points (APs) and user equipments (UEs); and iii) two-layer large-scale fading decoding (LSFD). We first derive a closed-form spectral efficiency (SE) expression for this system, and later propose two novel optimization techniques to optimize the non-convex SE metric by exploiting the minorization-maximization (MM) method. The first one requires a numerical optimization solver and has a high computational complexity. The second one, with closed-form transmit power updates, has a trivial computational complexity. We numerically show that i) the two-layer LSFD scheme effectively mitigates the interference due to channel aging for both low- and high-velocity UEs; and ii) increasing the number of AP antennas does not mitigate the SE deterioration due to channel aging. We numerically characterize the optimal pilot length required to maximize the SE for various UE speeds. We also numerically show that the proposed closed-form MM optimization yields the same SE as the first technique, which requires a numerical solver, at a much lower time complexity.
Submitted 18 April, 2023;
originally announced April 2023.
-
Connected and Automated Vehicles in Mixed-Traffic: Learning Human Driver Behavior for Effective On-Ramp Merging
Authors:
Nishanth Venkatesh,
Viet-Anh Le,
Aditya Dave,
Andreas A. Malikopoulos
Abstract:
Highway merging scenarios featuring mixed traffic conditions pose significant modeling and control challenges for connected and automated vehicles (CAVs) interacting with incoming on-ramp human-driven vehicles (HDVs). In this paper, we present an approach to learn an approximate information state model of CAV-HDV interactions for a CAV to maneuver safely during highway merging. In our approach, the CAV learns the behavior of an incoming HDV using approximate information states before generating a control strategy to facilitate merging. First, we validate the efficacy of this framework on real-world data by using it to predict the behavior of an HDV in mixed traffic situations extracted from the Next-Generation Simulation repository. Then, we generate simulation data for HDV-CAV interactions in a highway merging scenario using a standard inverse reinforcement learning approach. Without assuming prior knowledge of the generating model, we show that our approximate information state model learns to predict the future trajectory of the HDV using only observations. Subsequently, we generate safe control policies for a CAV while merging with HDVs, demonstrating a spectrum of driving behaviors, from aggressive to conservative. We demonstrate the effectiveness of the proposed approach by performing numerical simulations.
Submitted 1 April, 2023;
originally announced April 2023.
-
Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon
Authors:
Aditya Dave,
Ioannis Faros,
Nishanth Venkatesh,
Andreas A. Malikopoulos
Abstract:
Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is defining an approximate information state that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy and illustrate the effectiveness of our approach in partially observed decision-making problems with a numerical example.
Submitted 31 March, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Few-Shot Domain Adaptation for Low Light RAW Image Enhancement
Authors:
K. Ram Prabhakar,
Vishal Vinod,
Nihar Ranjan Sahoo,
R. Venkatesh Babu
Abstract:
Enhancing practical low light raw images is a difficult task due to severe noise and color distortions from short exposure time and limited illumination. Despite the success of existing Convolutional Neural Network (CNN) based methods, their performance is not adaptable to different camera domains. In addition, such methods also require large datasets with short-exposure and corresponding long-exposure ground truth raw images for each camera domain, which is tedious to compile. To address this issue, we present a novel few-shot domain adaptation method to utilize the existing source camera labeled data with few labeled samples from the target camera to improve the target domain's enhancement quality in extreme low-light imaging. Our experiments show that only ten or fewer labeled samples from the target camera domain are sufficient to achieve similar or better enhancement performance than training a model with a large labeled target camera dataset. To support research in this direction, we also present a new low-light raw image dataset captured with a Nikon camera, comprising short-exposure and their corresponding long-exposure ground truth images.
Submitted 27 March, 2023;
originally announced March 2023.
-
Cross-utterance ASR Rescoring with Graph-based Label Propagation
Authors:
Srinath Tankasala,
Long Chen,
Andreas Stolcke,
Anirudh Raju,
Qianli Deng,
Chander Chandak,
Aparna Khare,
Roland Maas,
Venkatesh Ravichandran
Abstract:
We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity. In contrast to conventional neural language model (LM) based ASR rescoring/reranking models, our approach focuses on acoustic information and conducts the rescoring collaboratively among utterances, instead of individually. Experiments on the VCTK dataset demonstrate that our approach consistently improves ASR performance, as well as fairness across speaker groups with different accents. Our approach provides a low-cost solution for mitigating the majoritarian bias of ASR systems, without the need to train new domain- or accent-specific models.
Submitted 27 March, 2023;
originally announced March 2023.
-
Adaptive Endpointing with Deep Contextual Multi-armed Bandits
Authors:
Do June Min,
Andreas Stolcke,
Anirudh Raju,
Colin Vaz,
Di He,
Venkatesh Ravichandran,
Viet Anh Trinh
Abstract:
Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search. Our method does not require ground truth labels; it learns online from reward signals alone. Specifically, we propose a deep contextual multi-armed bandit-based approach, which combines the representational power of neural networks with the action exploration behavior of Thompson modeling algorithms. We compare our approach to several baselines, and show that our deep bandit models also succeed in reducing early cutoff errors while maintaining low latency.
Submitted 23 March, 2023;
originally announced March 2023.
-
X-ray Spectral Estimation using Dictionary Learning
Authors:
Wenrui Li,
Venkatesh Sridhar,
K. Aditya Mohan,
Saransh Singh,
Jean-Baptiste Forien,
Xin Liu,
Gregery T. Buzzard,
Charles A. Bouman
Abstract:
As computational tools for X-ray computed tomography (CT) become more quantitatively accurate, knowledge of the source-detector spectral response is critical for quantitative system-independent reconstruction and material characterization capabilities. Directly measuring the spectral response of a CT system is hard, which motivates spectral estimation using transmission data obtained from a collection of known homogeneous objects. However, the associated inverse problem is ill-conditioned, making accurate estimation of the spectrum challenging, particularly in the absence of a close initial guess. In this paper, we describe a dictionary-based spectral estimation method that yields accurate results without the need for any initial estimate of the spectral response. Our method utilizes a MAP estimation framework that combines a physics-based forward model along with an $L_0$ sparsity constraint and a simplex constraint on the dictionary coefficients. Our method uses a greedy support selection method and a new pair-wise iterated coordinate descent method to compute the above estimate. We demonstrate that our dictionary-based method outperforms a state-of-the-art method as shown in a cross-validation experiment on four real datasets collected at beamline 8.3.2 of the Advanced Light Source (ALS).
Submitted 26 February, 2023;
originally announced February 2023.
-
LSFD for Rician-Faded Cell-Free mMIMO Systems With Channel Aging and Hardware Impairments
Authors:
Anish Chattopadhyay,
Venkatesh Tentu,
Dheeraj Naidu Amudala,
Rohit Budhiraja
Abstract:
We study the impact of channel aging on the uplink of a cell-free massive multiple-input multiple-output system with hardware impairments. We consider a dynamic analog-to-digital converter architecture at the access points (APs), and low-resolution digital-to-analog converters at the user equipments (UEs). We derive a closed-form spectral efficiency expression by considering i) practical spatially-correlated Rician channels; ii) hardware impairments at the APs and the UEs; iii) channel aging; and iv) large-scale fading decoding (LSFD). We show that LSFD can effectively mitigate the detrimental effects of i) channel aging for both low and high UE velocities; and ii) inter-user interference for low-velocity UEs but not for high-velocity UEs.
Submitted 21 February, 2023;
originally announced February 2023.
-
Approximate Information States for Worst-Case Control and Learning in Uncertain Systems
Authors:
Aditya Dave,
Nishanth Venkatesh,
Andreas A. Malikopoulos
Abstract:
In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state, and introduce conditions to identify an uncertain variable that can be used to compute an optimal strategy through a dynamic program (DP). Next, we relax these conditions and define approximate information states that can be learned from output data without knowledge of system dynamics. We use approximate information states to formulate a DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.
Submitted 5 April, 2024; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Hardware-Aware Pilot Decontamination Precoding for Multi-cell mMIMO Systems With Rician Fading
Authors:
Harshit Kesarwani,
Dheeraj Naidu Amudala,
Venkatesh Tentu,
Rohit Budhiraja
Abstract:
We consider a hardware-impaired multi-cell Rician faded massive multi-input multi-output (mMIMO) system with two-layer pilot decontamination precoding, also known as large-scale fading precoding (LSFP). Each BS is equipped with a flexible dynamic analog-to-digital converter (ADC)/digital-to-analog converter (DAC) architecture and the user equipments (UEs) have low-resolution ADCs. Further, both BS and UEs have hardware-impaired radio frequency chains. The dynamic ADC/DAC architecture allows us to vary the resolution of the ADC/DAC connected to each BS antenna, and suitably choose them to maximize the spectral efficiency (SE). We propose a distortion-aware minimum mean squared error (DA-MMSE) precoder and investigate its usage with two-layer LSFP and conventional single-layer precoding (SLP) for hardware-impaired mMIMO systems. We discuss the use cases of LSFP and SLP with DA-MMSE and distortion-unaware MMSE (DU-MMSE) precoders, which will provide critical insights to the system designer regarding their usage in practical systems.
Submitted 29 November, 2022;
originally announced November 2022.
-
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Authors:
Xin Zhang,
Iván Vallés-Pérez,
Andreas Stolcke,
Chengzhu Yu,
Jasha Droppo,
Olabanji Shonibare,
Roberto Barra-Chicote,
Venkatesh Ravichandran
Abstract:
Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing automatic speech recognition (ASR) interfaces perform poorly on utterances with stutter, mainly due to lack of matched training data. Synthesis of speech with stutter thus presents an opportunity to improve ASR for this type of speech. We describe Stutter-TTS, an end-to-end neural text-to-speech model capable of synthesizing diverse types of stuttering utterances. We develop a simple, yet effective prosody-control strategy whereby additional tokens are introduced into source text during training to represent specific stuttering characteristics. By choosing the position of the stutter tokens, Stutter-TTS allows word-level control of where stuttering occurs in the synthesized utterance. We are able to synthesize stutter events with high accuracy (F1-scores between 0.63 and 0.84, depending on stutter type). By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.
Submitted 4 November, 2022;
originally announced November 2022.
-
Towards Transformer-based Homogenization of Satellite Imagery for Landsat-8 and Sentinel-2
Authors:
Venkatesh Thirugnana Sambandham,
Konstantin Kirchheim,
Sayan Mukhopadhaya,
Frank Ortmeier
Abstract:
Landsat-8 (NASA) and Sentinel-2 (ESA) are two prominent multi-spectral imaging satellite projects that provide publicly available data. The multi-spectral imaging sensors of the satellites capture images of the earth's surface in the visible and infrared region of the electromagnetic spectrum. Since the majority of the earth's surface is constantly covered with clouds, which are not transparent at these wavelengths, many images do not provide much information. To increase the temporal availability of cloud-free images of a certain area, one can combine the observations from multiple sources. However, the sensors of satellites might differ in their properties, making the images incompatible. This work provides a first glance at the possibility of using a transformer-based model to reduce the spectral and spatial differences between observations from both satellite projects. We compare the results to a model based on a fully convolutional UNet architecture. Somewhat surprisingly, we find that, while deep models outperform classical approaches, the UNet significantly outperforms the transformer in our experiments.
Submitted 14 October, 2022;
originally announced October 2022.
-
Privacy-Preserving Deep Learning Model for Covid-19 Disease Detection
Authors:
Vijay Srinivas Tida Sai Venkatesh Chilukoti,
Sonya Hsu,
Xiali Hei
Abstract:
Recent studies demonstrated that X-ray radiography showed higher accuracy than Polymerase Chain Reaction (PCR) testing for COVID-19 detection. Therefore, applying deep learning models to X-rays and radiography images increases the speed and accuracy of determining COVID-19 cases. However, due to Health Insurance Portability and Accountability Act (HIPAA) compliance, hospitals were unwilling to share patient data because of privacy concerns. To maintain privacy, we propose differentially private deep learning models to secure the patients' private information. The dataset from the Kaggle website is used to evaluate the designed model for COVID-19 detection. The EfficientNet model version was selected according to its highest test accuracy. Differential privacy constraints were then injected into the best-obtained model to evaluate performance. The accuracy is noted by varying the trainable layers, privacy loss, and limiting information from each sample. We obtained 84% accuracy with a privacy loss of 10 during the fine-tuning process.
Submitted 9 October, 2022; v1 submitted 7 September, 2022;
originally announced September 2022.
-
Improving GANs for Long-Tailed Data through Group Spectral Regularization
Authors:
Harsh Rangwani,
Naman Jaswani,
Tejan Karmali,
Varun Jampani,
R. Venkatesh Babu
Abstract:
Deep long-tailed learning aims to train useful deep networks on practical, real-world imbalanced distributions, wherein most labels of the tail classes are associated with a few samples. There has been a large body of work to train discriminative models for visual recognition on long-tailed distributions. In contrast, we aim to train conditional Generative Adversarial Networks, a class of image generation models, on long-tailed distributions. We find that, similar to recognition, state-of-the-art methods for image generation also suffer from performance degradation on tail classes. The performance degradation is mainly due to class-specific mode collapse for tail classes, which we observe to be correlated with the spectral explosion of the conditioning parameter matrix. We propose a novel group Spectral Regularizer (gSR) that prevents the spectral explosion, alleviating mode collapse, which results in diverse and plausible image generation even for tail classes. We find that gSR effectively combines with existing augmentation and regularization techniques, leading to state-of-the-art image generation performance on long-tailed data. Extensive experiments demonstrate the efficacy of our regularizer on long-tailed datasets with different degrees of imbalance.
Submitted 21 August, 2022;
originally announced August 2022.
-
Feasibility Layer Aided Machine Learning Approach for Day-Ahead Operations
Authors:
Arun Venkatesh Ramesh,
Xingpeng Li
Abstract:
Day-ahead operations involve a complex and computationally intensive optimization process to determine the generator commitment schedule and dispatch. The optimization process is a mixed-integer linear program (MILP), also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and require state-of-the-art algorithms to speed up the process. Existing patterns in historical information can be leveraged for model reduction of SCUC, which can provide significant time savings. In this paper, machine learning (ML) based classification approaches, namely logistic regression, neural networks, random forest and K-nearest neighbor, were studied for model reduction of SCUC. The ML was then aided with a feasibility layer (FL) and post-process technique to ensure high-quality solutions. The proposed approach is validated on several test systems, namely the IEEE 24-Bus system, IEEE-73 Bus system, IEEE 118-Bus system, 500-Bus system, and Polish 2383-Bus system. Moreover, model reduction of a stochastic SCUC (SSCUC) was demonstrated utilizing a modified IEEE 24-Bus system with renewable generation. Simulation results demonstrate a high training accuracy in identifying the commitment schedule, while the FL and post-process ensure that ML predictions do not lead to infeasible solutions, with minimal loss in solution quality.
Submitted 13 August, 2022;
originally announced August 2022.
-
A Letter on Progress Made on Husky Carbon: A Legged-Aerial, Multi-modal Platform
Authors:
Adarsh Salagame,
Shoghair Manjikian,
Chenghao Wang,
Kaushik Venkatesh Krishnamurthy,
Shreyansh Pitroda,
Bibek Gupta,
Tobias Jacob,
Benjamin Mottis,
Eric Sihite,
Milad Ramezani,
Alireza Ramezani
Abstract:
Animals, such as birds, widely use multi-modal locomotion by combining legged and aerial mobility with dominant inertial effects. The robotic biomimicry of this multi-modal locomotion feat can yield ultra-flexible systems in terms of their ability to negotiate their task spaces. The main objective of this paper is to discuss the challenges in achieving multi-modal locomotion, and to report our progress in developing our quadrupedal robot capable of multi-modal locomotion (legged and aerial locomotion), the Husky Carbon. We report the mechanical and electrical components utilized in our robot, in addition to the simulation and experimentation done to achieve our goal in developing a versatile multi-modal robotic platform.
Submitted 25 July, 2022;
originally announced July 2022.
-
Automatic Segmentation of Coronal Holes in Solar Images and Solar Prediction Map Classification
Authors:
Venkatesh Jatla
Abstract:
Solar image analysis relies on the detection of coronal holes for predicting disruptions to earth's magnetic field. The coronal holes act as sources of solar wind that can reach the earth. Thus, coronal holes are used in physical models for predicting the evolution of solar wind and its potential for interfering with the earth's magnetic field. Due to inherent uncertainties in the physical models, there is a need for a classification system that can be used to select the physical models that best match the observed coronal holes.
The physical model classification problem is decomposed into three subproblems. First, the thesis develops a method for coronal hole segmentation. Second, the thesis develops methods for matching coronal holes from different maps. Third, based on the matching results, the thesis develops a physical map classification system.
A level-set segmentation method is used for detecting coronal holes that are observed in extreme ultra-violet images (EUVI) and magnetic field images. For validating the segmentation approach, two independent manual segmentations were combined to produce 46 consensus maps. Overall, the level-set segmentation approach produces significant improvements over current approaches.
Physical map classification is based on coronal hole matching between the physical maps and (i) the consensus maps (semi-automated), or (ii) the segmented maps (fully-automated). Based on the matching results, the system uses area differences, shortest distances between matched clusters, and the number and areas of new and missing coronal hole clusters to classify each map. The results indicate that the automated segmentation and classification system performs better than individual humans.
Submitted 20 July, 2022;
originally announced July 2022.
-
Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification
Authors:
Long Chen,
Yixiong Meng,
Venkatesh Ravichandran,
Andreas Stolcke
Abstract:
Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances. Conventional speaker recognition systems generalize from a large random sample of speakers, causing the recognition to underperform for households drawn from specific cohorts or otherwise exhibiting high confusability. In this work, we propose a graph-based semi-supervised learning approach to improve household-level SID accuracy and robustness with locally adapted graph normalization and multi-signal fusion with multi-view graphs. Unlike other work on household SID, fairness, and signal fusion, this work focuses on speaker label inference (scoring) and provides a simple solution to realize household-specific adaptation and multi-signal fusion without tuning the embeddings or training a fusion network. Experiments on the VoxCeleb dataset demonstrate that our approach consistently improves the performance across households with different customer cohorts and degrees of confusability.
Submitted 8 July, 2022;
originally announced July 2022.
-
CPES-QSM: A Quantitative Method Towards the Secure Operation of Cyber-Physical Energy Systems
Authors:
Juan Ospina,
Venkatesh Venkataramanan,
Charalambos Konstantinou
Abstract:
Power systems are evolving into cyber-physical energy systems (CPES) due to the integration of modern communication and Internet-of-Things (IoT) devices. CPES security evaluation is challenging since the physical and cyber layers are often not considered holistically. Existing literature focuses on only optimizing the operation of either the physical or cyber layer while ignoring the interactions between them. This paper proposes a metric, the Cyber-Physical Energy System Quantitative Security Metric (CPES-QSM), that quantifies the interaction between the cyber and physical layers across three domains: electrical, cyber-risk, and network topology. A method for incorporating the proposed cyber-metric into operational decisions is also proposed by formulating a cyber-constrained AC optimal power flow (C-ACOPF) that considers the status of all the CPES layers. The cyber-constrained ACOPF considers the vulnerabilities of physical and cyber networks by incorporating factors such as voltage stability, contingencies, graph-theory, and IoT cyber risks, while using a multi-criteria decision-making technique. Simulation studies are conducted using standard IEEE test systems to evaluate the effectiveness of the proposed metric and the C-ACOPF formulation.
Submitted 26 September, 2022; v1 submitted 7 June, 2022;
originally announced June 2022.
-
An Improved Adaptive Smo for Speed Estimation of Sensorless Dsfoc Induction Motor Drives and Stability Analysis using Lyapunov Theorem at Low Frequencies
Authors:
Appalabathula Venkatesh
Abstract:
In this paper, an improved Adaptive Sliding Mode Observer (ASMO) is proposed for sensorless DSFOC induction motor drives, and their stability is analyzed. The ASMO is used to estimate the rotor speed, rotor resistance, flux, stator and rotor currents, and the developed electromagnetic torques. To improve the robustness and accuracy of the adaptive SMO during very low frequency operation, the sliding mode flux observer (SMFO) uses independent gains as the correction terms. The gains of the current and rotor flux SMOs are designed using Lyapunov stability theory to guarantee the stability and fast convergence of the estimated variables. The paper concentrates on Simulink blocks, and their graphs are analyzed with the help of a mathematical approach. The results are also compared with those of basic conventional controllers, and they show that the proposed ASMO method achieves excellent transient and steady-state speed estimation by the adaptive estimators, particularly at low frequencies.
Submitted 18 May, 2022;
originally announced May 2022.
-
A High Capacity Preamble Sequence for Random Access in Beyond 5G Networks: Design and Analysis
Authors:
Sagar Pawar,
Lokesh Bommisetty,
T. G. Venkatesh
Abstract:
The widely used Zadoff-Chu sequence (ZC sequence) for random access preamble in 5G has limitations in terms of the total number of preambles generated, forcing the reuse of preambles. Hence, the probability of collision of preambles of UEs increases, resulting in the failure of the random access procedure. To truly qualify beyond 5G networks as green technology, the preamble capacity should be increased without sacrificing energy efficiency. In this paper, we propose a new candidate preamble sequence called $mALL$ sequence using the concept of cover sequences to achieve higher preamble capacity without degrading the power efficiency, hence minimizing the device's carbon footprint. We compare the performance of the $mALL$ sequence with the Zadoff-Chu sequence and other sequences in the literature, such as $mZC$ and $aZC$ sequences. We evaluate the performance of the preamble sequences in terms of periodic correlation, detection probability and the effect of diversity combining. Also, this paper explores the Peak to Average Power Ratio (PAPR) and Cubic Metric (CM) for these sequences, as these are essential parameters to evaluate energy efficiency. We show that the preamble capacity of the proposed $mALL$ sequence is $10^{4}$ times higher than that of the legacy ZC sequence without any deterioration in the detection performance.
Submitted 10 April, 2022;
originally announced April 2022.
-
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Authors:
Venkatesh S. Kadandale,
Juan F. Montesinos,
Gloria Haro
Abstract:
In this paper, we address the problem of lip-voice synchronisation in videos containing a human face and voice. Our approach is based on determining if the lip motion and the voice in a video are synchronised or not, depending on their audio-visual correspondence score. We propose an audio-visual cross-modal transformer-based model that outperforms several baseline models in the audio-visual synchronisation task on the standard lip-reading speech benchmark dataset LRS2. While the existing methods focus mainly on lip synchronisation in speech videos, we also consider the special case of the singing voice. The singing voice is a more challenging use case for synchronisation due to sustained vowel sounds. We also investigate the relevance of lip synchronisation models trained on speech datasets in the context of singing voice. Finally, we use the frozen visual features learned by our lip synchronisation model in the singing voice separation task to outperform a baseline audio-visual model which was trained end-to-end. The demos, source code, and the pre-trained models are available at https://ipcv.github.io/VocaLiST/
Submitted 30 June, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Authors:
Juan F. Montesinos,
Venkatesh S. Kadandale,
Gloria Haro
Abstract:
This paper presents an audio-visual approach for voice separation which produces state-of-the-art results at a low latency in two scenarios: speech and singing voice. The model is based on a two-stage network. Motion cues are obtained with a lightweight graph convolutional network that processes face landmarks. Then, both audio and motion features are fed to an audio-visual transformer which produces a fairly good estimation of the isolated target source. In a second stage, the predominant voice is enhanced with an audio-only network. We present different ablation studies and comparisons to state-of-the-art methods. Finally, we explore the transferability of models trained for speech separation in the task of singing voice separation. The demos, code, and weights are available at https://ipcv.github.io/VoViT/
Submitted 19 July, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.