-
Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
Authors:
Tuo Zhang,
Tiantian Feng,
Yibin Ni,
Mengqin Cao,
Ruying Liu,
Katharine Butler,
Yanjun Weng,
Mi Zhang,
Shrikanth S. Narayanan,
Salman Avestimehr
Abstract:
Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for…
▽ More
Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explanations for the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of VLMs that can better understand and interpret culturally specific content, promoting greater inclusiveness beyond English-based corpora.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Authors:
Tiantian Feng,
Xuan Shi,
Rahul Gupta,
Shrikanth S. Narayanan
Abstract:
Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced intent, emotion, sentiment, and content understanding from speech and language (text) content conveyed in speech. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech…
▽ More
Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced intent, emotion, sentiment, and content understanding from speech and language (text) content conveyed in speech. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech data for training ASU due to concerns such as privacy. To approach this setting of enabling ASU when speech (audio) modality is missing, we propose TI-ASU, using a pre-trained text-to-speech model to impute the missing speech. We report extensive experiments evaluating TI-ASU on various missing scales, both multi- and single-modality settings, and the use of LLMs. Our findings show that TI-ASU yields substantial benefits to improve ASU in scenarios where even up to 95% of training speech is missing. Moreover, we show that TI-ASU is adaptive to dropout training, improving model robustness in addressing missing speech during inference.
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Optimum Beamforming and Grating Lobe Mitigation for Intelligent Reflecting Surfaces
Authors:
Sai Sanjay Narayanan,
Uday K Khankhoje,
Radha Krishna Ganti
Abstract:
Ensuring adequate wireless coverage in upcoming communication technologies such as 6G is expected to be challenging. This is because user demands of higher datarate require an increase in carrier frequencies, which in turn reduce the diffraction effects (and hence coverage) in complex multipath environments. Intelligent reflecting surfaces have been proposed as a way of restoring coverage by adapt…
▽ More
Ensuring adequate wireless coverage in upcoming communication technologies such as 6G is expected to be challenging. This is because user demands of higher datarate require an increase in carrier frequencies, which in turn reduce the diffraction effects (and hence coverage) in complex multipath environments. Intelligent reflecting surfaces have been proposed as a way of restoring coverage by adaptively reflecting incoming electromagnetic waves in desired directions. This is accomplished by judiciously adding extra phases at different points on the surface. In practice, these extra phases are only available in discrete quantities due to hardware constraints. Computing these extra phases is computationally challenging when they can only be picked from a discrete distribution, and existing approaches for solving this problem were either heuristic or based on evolutionary algorithms. We solve this problem by proposing fast algorithms with provably optimal solutions. Our algorithms have linear complexity, and are presented with rigorous proofs for their optimality. We show that the proposed algorithms exhibit better performance. We analyze situations when unwanted grating lobes arise in the radiation pattern, and discuss mitigation strategies, such as the use of triangular lattices and prephasing techniques, to eliminate them. We also demonstrate how our algorithms can leverage these techniques to deliver optimum beamforming solutions.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Synthesizing Controller for Safe Navigation using Control Density Function
Authors:
Joseph Moyalan,
Sriram S. K. S Narayanan,
Andrew Zheng,
Umesh Vaidya
Abstract:
We consider the problem of navigating a nonlinear dynamical system from some initial set to some target set while avoiding collision with an unsafe set. We extend the concept of density function to control density function (CDF) for solving navigation problems with safety constraints. The occupancy-based interpretation of the measure associated with the density function is instrumental in imposing…
▽ More
We consider the problem of navigating a nonlinear dynamical system from some initial set to some target set while avoiding collision with an unsafe set. We extend the concept of density function to control density function (CDF) for solving navigation problems with safety constraints. The occupancy-based interpretation of the measure associated with the density function is instrumental in imposing the safety constraints. The navigation problem with safety constraints is formulated as a quadratic program (QP) using CDF. The existing approach using the control barrier function (CBF) also formulates the navigation problem with safety constraints as QP. One of the main advantages of the proposed QP using CDF compared to QP formulated using CBF is that both the convergence/stability and safety can be combined and imposed using the CDF. Simulation results involving the Duffing oscillator and safe navigation of Dubin car models are provided to verify the main findings of the paper.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
Authors:
Alice Baird,
Rachel Manzelli,
Panagiotis Tzirakis,
Chris Gagne,
Haoqi Li,
Sadie Allen,
Sander Dieleman,
Brian Kulis,
Shrikanth S. Narayanan,
Alan Cowen
Abstract:
The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; wi…
▽ More
The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; with audio being a time-dependent modality, high-quality data collection is time-consuming and costly, making it challenging for academic groups to apply their often state-of-the-art strategies to a larger, more generalizable dataset. In this short white paper, to encourage researchers with limited access to large-datasets, the organizers first outline several open-source datasets that are available to the community, and for the duration of the workshop are making several propriety datasets available. Namely, three vocal datasets, Hume-Prosody, Hume-VocalBurst, an acted emotional speech dataset Modulate-Sonata, and an in-game streamer dataset Modulate-Stream. We outline the current baselines on these datasets but encourage researchers from across audio to utilize them outside of the initial baseline tasks.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Safe Motion Planning for Quadruped Robots Using Density Functions
Authors:
Sriram S. K. S Narayanan,
Andrew Zheng,
Umesh Vaidya
Abstract:
This paper presents a motion planning algorithm for quadruped locomotion based on density functions. We decompose the locomotion problem into a high-level density planner and a model predictive controller (MPC). Due to density functions having a physical interpretation through the notion of occupancy, it is intuitive to represent the environment with safety constraints. Hence, there is an ease of…
▽ More
This paper presents a motion planning algorithm for quadruped locomotion based on density functions. We decompose the locomotion problem into a high-level density planner and a model predictive controller (MPC). Due to density functions having a physical interpretation through the notion of occupancy, it is intuitive to represent the environment with safety constraints. Hence, there is an ease of use to constructing the planning problem with density. The proposed method uses a simplified model of the robot into an integrator system, where the high-level plan is in a feedback form formulated through an analytically constructed density function. We then use the MPC to optimize the reference trajectory, in which a low-level PID controller is used to obtain the torque level control. The overall framework is implemented in simulation, demonstrating our feedback density planner for legged locomotion. The implementation of work is available at \url{https://github.com/AndrewZheng-1011/legged_planner}
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things
Authors:
Samiul Alam,
Tuo Zhang,
Tiantian Feng,
Hui Shen,
Zhichao Cao,
Dong Zhao,
JeongGil Ko,
Kiran Somasundaram,
Shrikanth S. Narayanan,
Salman Avestimehr,
Mi Zhang
Abstract:
There is a significant relevance of federated learning (FL) in the realm of Artificial Intelligence of Things (AIoT). However, most existing FL works do not use datasets collected from authentic IoT devices and thus do not capture unique modalities and inherent challenges of IoT data. To fill this critical gap, in this work, we introduce FedAIoT, an FL benchmark for AIoT. FedAIoT includes eight da…
▽ More
There is a significant relevance of federated learning (FL) in the realm of Artificial Intelligence of Things (AIoT). However, most existing FL works do not use datasets collected from authentic IoT devices and thus do not capture unique modalities and inherent challenges of IoT data. To fill this critical gap, in this work, we introduce FedAIoT, an FL benchmark for AIoT. FedAIoT includes eight datasets collected from a wide range of IoT devices. These datasets cover unique IoT modalities and target representative applications of AIoT. FedAIoT also includes a unified end-to-end FL framework for AIoT that simplifies benchmarking the performance of the datasets. Our benchmark results shed light on the opportunities and challenges of FL for AIoT. We hope FedAIoT could serve as an invaluable resource to foster advancements in the important field of FL for AIoT. The repository of FedAIoT is maintained at https://github.com/AIoT-MLSys-Lab/FedAIoT.
△ Less
Submitted 19 June, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Path-Integral Formula for Computing Koopman Eigenfunctions
Authors:
Shankar A. Deka,
Sriram S. K. S. Narayanan,
Umesh Vaidya
Abstract:
The paper is about the computation of the principal spectrum of the Koopman operator (i.e., eigenvalues and eigenfunctions). The principal eigenfunctions of the Koopman operator are the ones with the corresponding eigenvalues equal to the eigenvalues of the linearization of the nonlinear system at an equilibrium point. The main contribution of this paper is to provide a novel approach for computin…
▽ More
The paper is about the computation of the principal spectrum of the Koopman operator (i.e., eigenvalues and eigenfunctions). The principal eigenfunctions of the Koopman operator are the ones with the corresponding eigenvalues equal to the eigenvalues of the linearization of the nonlinear system at an equilibrium point. The main contribution of this paper is to provide a novel approach for computing the principal eigenfunctions using a path-integral formula. Furthermore, we provide conditions based on the stability property of the dynamical system and the eigenvalues of the linearization towards computing the principal eigenfunction using the path-integral formula. Further, we provide a Deep Neural Network framework that utilizes our proposed path-integral approach for eigenfunction computation in high-dimension systems. Finally, we present simulation results for the computation of principal eigenfunction and demonstrate their application for determining the stable and unstable manifolds and constructing the Lyapunov function.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Safe Navigation using Density Functions
Authors:
Andrew Zheng,
Sriram S. K. S. Narayanan,
Umesh Vaidya
Abstract:
This paper presents a novel approach for safe control synthesis using the dual formulation of the navigation problem. The main contribution of this paper is in the analytical construction of density functions for almost everywhere navigation with safety constraints. In contrast to the existing approaches, where density functions are used for the analysis of navigation problems, we use density func…
▽ More
This paper presents a novel approach for safe control synthesis using the dual formulation of the navigation problem. The main contribution of this paper is in the analytical construction of density functions for almost everywhere navigation with safety constraints. In contrast to the existing approaches, where density functions are used for the analysis of navigation problems, we use density functions for the synthesis of safe controllers. We provide convergence proof using the proposed density functions for navigation with safety. Further, we use these density functions to design feedback controllers capable of navigating in cluttered environments and high-dimensional configuration spaces. The proposed analytical construction of density functions overcomes the problem associated with navigation functions, which are known to exist but challenging to construct, and potential functions, which suffer from local minima. Application of the developed framework is demonstrated on simple integrator dynamics and fully actuated robotic systems.
△ Less
Submitted 16 October, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
Authors:
Tuo Zhang,
Tiantian Feng,
Samiul Alam,
Dimitrios Dimitriadis,
Sunwoo Lee,
Mi Zhang,
Shrikanth S. Narayanan,
Salman Avestimehr
Abstract:
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently out…
▽ More
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis across various data modalities, we discover that the downstream model generated by synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Also, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or synthetic data. The code is available at https://github.com/AvestimehrResearchGroup/GPT-FL.
△ Less
Submitted 17 June, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
SE(3) Koopman-MPC: Data-driven Learning and Control of Quadrotor UAVs
Authors:
Sriram S. K. S. Narayanan,
Duvan Tellez-Castro,
Sarang Sutavani,
Umesh Vaidya
Abstract:
In this paper, we propose a novel data-driven approach for learning and control of quadrotor UAVs based on the Koopman operator and extended dynamic mode decomposition (EDMD). Building observables for EDMD based on conventional methods like Euler angles (to represent orientation) is known to involve singularities. To address this issue, we employ a set of physics-informed observables based on the…
▽ More
In this paper, we propose a novel data-driven approach for learning and control of quadrotor UAVs based on the Koopman operator and extended dynamic mode decomposition (EDMD). Building observables for EDMD based on conventional methods like Euler angles (to represent orientation) is known to involve singularities. To address this issue, we employ a set of physics-informed observables based on the underlying topology of the nonlinear system. We use rotation matrices to directly represent the orientation dynamics and obtain a lifted linear representation of the nonlinear quadrotor dynamics in the SE(3) manifold. This EDMD model leads to accurate prediction and can be generalized to several validation sets. Further, we design a linear model predictive controller (MPC) based on the proposed EDMD model to track agile reference trajectories. Simulation results show that the proposed MPC controller can run as fast as 100 Hz and is able to track arbitrary reference trajectories with good accuracy. Implementation details can be found in \url{https://github.com/sriram-2502/KoopmanMPC_Quadrotor}.
△ Less
Submitted 16 October, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Off-Road Navigation of Legged Robots Using Linear Transfer Operators
Authors:
Joseph Moyalan,
Andrew Zheng,
Sriram S. K. S Narayanan,
Umesh Vaidya
Abstract:
This paper presents the implementation of off-road navigation on legged robots using convex optimization through linear transfer operators. Given a traversability measure that captures the off-road environment, we lift the navigation problem into the density space using the Perron-Frobenius (P-F) operator. This allows the problem formulation to be represented as a convex optimization. Due to the o…
▽ More
This paper presents the implementation of off-road navigation on legged robots using convex optimization through linear transfer operators. Given a traversability measure that captures the off-road environment, we lift the navigation problem into the density space using the Perron-Frobenius (P-F) operator. This allows the problem formulation to be represented as a convex optimization. Due to the operator acting on an infinite-dimensional density space, we use data collected from the terrain to get a finite-dimension approximation of the convex optimization. Results of the optimal trajectory for off-road navigation are compared with a standard iterative planner, where we show how our convex optimization generates a more traversable path for the legged robot compared to the suboptimal iterative planner.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Optimal Control for Quadruped Locomotion using LTV MPC
Authors:
Andrew Zheng,
Sriram S. K. S Narayanan
Abstract:
This paper presents a state-of-the-art optimal controller for quadruped locomotion. The robot dynamics is represented using a single rigid body (SRB) model. A linear time-varying model predictive controller (LTV MPC) is proposed by using linearization schemes. Simulation results show that the LTV MPC can execute various gaits, such as trot and crawl, and is capable of tracking desired reference tr…
▽ More
This paper presents a state-of-the-art optimal controller for quadruped locomotion. The robot dynamics is represented using a single rigid body (SRB) model. A linear time-varying model predictive controller (LTV MPC) is proposed by using linearization schemes. Simulation results show that the LTV MPC can execute various gaits, such as trot and crawl, and is capable of tracking desired reference trajectories even under unknown external disturbances. The LTV MPC is implemented as a quadratic program using qpOASES through the CasADi interface at 50 Hz. The proposed MPC can reach up to 1 m/s top speed with an acceleration of 0.5 m/s2 executing a trot gait. The implementation is available at https:// github.com/AndrewZheng-1011/Quad_ConvexMPC
△ Less
Submitted 16 October, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
FedAudio: A Federated Learning Benchmark for Audio Tasks
Authors:
Tuo Zhang,
Tiantian Feng,
Samiul Alam,
Sunwoo Lee,
Mi Zhang,
Shrikanth S. Narayanan,
Salman Avestimehr
Abstract:
Federated learning (FL) has gained substantial attention in recent years due to the data privacy concerns related to the pervasiveness of consumer devices that continuously collect data from users. While a number of FL benchmarks have been developed to facilitate FL research, none of them include audio data and audio-related tasks. In this paper, we fill this critical gap by introducing a new FL b…
▽ More
Federated learning (FL) has gained substantial attention in recent years due to the data privacy concerns related to the pervasiveness of consumer devices that continuously collect data from users. While a number of FL benchmarks have been developed to facilitate FL research, none of them include audio data and audio-related tasks. In this paper, we fill this critical gap by introducing a new FL benchmark for audio tasks which we refer to as FedAudio. FedAudio includes four representative and commonly used audio datasets from three important audio tasks that are well aligned with FL use cases. In particular, a unique contribution of FedAudio is the introduction of data noises and label errors to the datasets to emulate challenges when deploying FL systems in real-world settings. FedAudio also includes the benchmark results of the datasets and a PyTorch library with the objective of facilitating researchers to fairly compare their algorithms. We hope FedAudio could act as a catalyst to inspire new FL research for audio tasks and thus benefit the acoustic and speech research community. The datasets and benchmark results can be accessed at https://github.com/zhang-tuo-pdf/FedAudio.
△ Less
Submitted 8 February, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings
Authors:
Tiantian Feng,
Hanieh Hashemi,
Rajat Hebbar,
Murali Annavaram,
Shrikanth S. Narayanan
Abstract:
Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive dem…
▽ More
Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive demographic traits such as gender, age and language background. Consequently, it is desirable for SER systems to have the ability to classify emotion constructs while preventing unintended/improper inferences of sensitive and demographic information. Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing their local data. This training approach appears secure and can improve privacy for SER. However, recent works have demonstrated that FL approaches are still vulnerable to various privacy attacks like reconstruction attacks and membership inference attacks. Although most of these have focused on computer vision applications, such information leakages exist in the SER systems trained using the FL technique. To assess the information leakage of SER systems trained using FL, we propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters, corresponding to the FedSGD and the FedAvg training algorithms, respectively. As a use case, we empirically evaluate our approach for predicting the client's gender information using three SER benchmark datasets: IEMOCAP, CREMA-D, and MSP-Improv. We show that the attribute inference attack is achievable for SER systems trained using FL. We further identify that most information leakage possibly comes from the first layer in the SER model.
△ Less
Submitted 22 December, 2022; v1 submitted 26 December, 2021;
originally announced December 2021.
-
A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Authors:
Yongwan Lim,
Asterios Toutios,
Yannick Bliesener,
Ye Tian,
Sajan Goud Lingala,
Colin Vaz,
Tanner Sorensen,
Miran Oh,
Sarah Harper,
Weiyi Chen,
Yoonjeong Lee,
Johannes Töger,
Mairym Lloréns Montesserin,
Caitlin Smith,
Bianca Godinez,
Louis Goldstein,
Dani Byrd,
Krishna S. Nayak,
Shrikanth S. Narayanan
Abstract:
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators…
▽ More
Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is however limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway sha** during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to-date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically-relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject.
△ Less
Submitted 15 February, 2021;
originally announced February 2021.
-
Attention-gated convolutional neural networks for off-resonance correction of spiral real-time MRI
Authors:
Yongwan Lim,
Shrikanth S. Narayanan,
Krishna S. Nayak
Abstract:
Spiral acquisitions are preferred in real-time MRI because of their efficiency, which has made it possible to capture vocal tract dynamics during natural speech. A fundamental limitation of spirals is blurring and signal loss due to off-resonance, which degrades image quality at air-tissue boundaries. Here, we present a new CNN-based off-resonance correction method that incorporates an attention-g…
▽ More
Spiral acquisitions are preferred in real-time MRI because of their efficiency, which has made it possible to capture vocal tract dynamics during natural speech. A fundamental limitation of spirals is blurring and signal loss due to off-resonance, which degrades image quality at air-tissue boundaries. Here, we present a new CNN-based off-resonance correction method that incorporates an attention-gate mechanism. This leverages spatial and channel relationships of filtered outputs and improves the expressiveness of the networks. We demonstrate improved performance with the attention-gate, on 1.5 Tesla spiral speech RT-MRI, compared to existing off-resonance correction methods.
△ Less
Submitted 14 February, 2021;
originally announced February 2021.
-
Deblurring for Spiral Real-Time MRI Using Convolutional Neural Networks
Authors:
Yongwan Lim,
Shrikanth S Narayanan,
Krishna S Nayak
Abstract:
Spiral acquisitions are preferred in real-time MRI because of their time efficiency. A fundamental limitation of spirals is image blurring due to off-resonance, which degrades image quality significantly at air-tissue boundaries. Here, we demonstrate a simple CNN-based deblurring method for spiral real-time MRI of human speech production. We show the CNN-based deblurring is capable of restoring bl…
▽ More
Spiral acquisitions are preferred in real-time MRI because of their time efficiency. A fundamental limitation of spirals is image blurring due to off-resonance, which degrades image quality significantly at air-tissue boundaries. Here, we demonstrate a simple CNN-based deblurring method for spiral real-time MRI of human speech production. We show the CNN-based deblurring is capable of restoring blurred vocal tract tissue boundaries, without a need for exam-specific field maps. Deblurring performance is superior to a current auto-calibrated method, and slightly inferior to ideal reconstruction with perfect knowledge of the field maps.
△ Less
Submitted 29 May, 2020; v1 submitted 26 January, 2020;
originally announced January 2020.
-
Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition
Authors:
Che-Wei Huang,
Shrikanth S. Narayanan
Abstract:
Regularization is crucial to the success of many practical deep learning models, in particular in a more often than not scenario where there are only a few to a moderate number of accessible training samples. In addition to weight decay, data augmentation and dropout, regularization based on multi-branch architectures, such as Shake-Shake regularization, has been proven successful in many applicat…
▽ More
Regularization is crucial to the success of many practical deep learning models, in particular in a more often than not scenario where there are only a few to a moderate number of accessible training samples. In addition to weight decay, data augmentation and dropout, regularization based on multi-branch architectures, such as Shake-Shake regularization, has been proven successful in many applications and attracted more and more attention. However, beyond model-based representation augmentation, it is unclear how Shake-Shake regularization helps to provide further improvement on classification tasks, let alone the baffling interaction between batch normalization and shaking. In this work, we present our investigation on Shake-Shake regularization, drawing connections to the vicinal risk minimization principle and discriminative feature learning in verification tasks. Furthermore, we identify a strong resemblance between batch normalized residual blocks and batch normalized recurrent neural networks, where both of them share a similar convergence behavior, which could be mitigated by a proper initialization of batch normalization. Based on the findings, our experiments on speech emotion recognition demonstrate simultaneously an improvement on the classification accuracy and a reduction on the generalization gap both with statistical significance.
△ Less
Submitted 5 August, 2018; v1 submitted 2 August, 2018;
originally announced August 2018.
-
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Authors:
Che-Wei Huang,
Shrikanth. S. Narayanan
Abstract:
Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variations, for learning from speech. However, studies have suggested to favor a certain type of convolutional operations when building a deep…
▽ More
Deep convolutional neural networks are being actively investigated in a wide range of speech and audio processing applications including speech recognition, audio event detection and computational paralinguistics, owing to their ability to reduce factors of variations, for learning from speech. However, studies have suggested to favor a certain type of convolutional operations when building a deep convolutional neural network for speech applications although there has been promising results using different types of convolutional operations. In this work, we study four types of convolutional operations on different input features for speech emotion recognition under noisy and clean conditions in order to derive a comprehensive understanding. Since affective behavioral information has been shown to reflect temporally varying of mental state and convolutional operation are applied locally in time, all deep neural networks share a deep recurrent sub-network architecture for further temporal modeling. We present detailed quantitative module-wise performance analysis to gain insights into information flows within the proposed architectures. In particular, we demonstrate the interplay of affective information and the other irrelevant information during the progression from one module to another. Finally we show that all of our deep neural networks provide state-of-the-art performance on the eNTERFACE'05 corpus.
△ Less
Submitted 13 January, 2018; v1 submitted 7 June, 2017;
originally announced June 2017.
-
Selection of Arginine-Rich Anti-Gold Antibodies Engineered for Plasmonic Colloid Self-Assembly
Authors:
Purvi Jain,
Anandakumar Soshee,
S Shankara Narayanan,
Jadab Sharma,
Christian Girard,
Erik Dujardin,
Clément Nizak
Abstract:
Antibodies are affinity proteins with a wide spectrum of applications in analytical and therapeutic biology. Proteins showing specific recognition for a chosen molecular target can be isolated and their encoding sequence identified in vitro from a large and diverse library by phage display selection. In this work, we show that this standard biochemical technique rapidly yields a collection of anti…
▽ More
Antibodies are affinity proteins with a wide spectrum of applications in analytical and therapeutic biology. Proteins showing specific recognition for a chosen molecular target can be isolated and their encoding sequence identified in vitro from a large and diverse library by phage display selection. In this work, we show that this standard biochemical technique rapidly yields a collection of antibody protein binders for an inorganic target of major technological importance: crystalline metallic gold surfaces. 21 distinct anti-gold antibody proteins emerged from a large random library of antibodies and were sequenced. The systematic statistical analysis of all the protein sequences reveals a strong occurrence of arginine in anti-gold antibodies, which corroborates recent molecular dynamics predictions on the crucial role of arginine in protein/gold interactions. Once tethered to small gold nanoparticles using histidine tag chemistry, the selected antibodies could drive the self-assembly of the colloids onto the surface of single crystalline gold platelets as a first step towards programmable protein-driven construction of complex plasmonic architectures. Electrodynamic simulations based on the Green Dyadic Method suggest that the antibody-driven assembly demonstrated here could be exploited to significantly modify the plasmonic modal properties of the gold platelets. Our work shows that molecular biology tools can be used to design the interaction between fully folded proteins and inorganic surfaces with potential applications in the bottom-up construction of plasmonic hybrid nanomaterials.
△ Less
Submitted 13 May, 2014;
originally announced May 2014.
-
Generalized Ambiguity Decomposition for Understanding Ensemble Diversity
Authors:
Kartik Audhkhasi,
Abhinav Sethy,
Bhuvana Ramabhadran,
Shrikanth S. Narayanan
Abstract:
Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and hav…
▽ More
Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and have linked it with ensemble performance in very restricted settings. We present a generalized ambiguity decomposition (GAD) theorem as a broad framework for answering these questions. The GAD theorem applies to a generic convex ensemble of experts for any arbitrary twice-differentiable loss function. It shows that the ensemble performance approximately decomposes into a difference of the average expert performance and the diversity of the ensemble. It thus provides a theoretical explanation for the empirically-observed benefit of fusing outputs from diverse classifiers and regressors. It also provides a loss function-dependent, ensemble-dependent, and data-dependent definition of diversity. We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition. We finally present experiments on standard pattern recognition data sets which indicate the accuracy of the decomposition for real-world classification and regression problems.
△ Less
Submitted 28 December, 2013;
originally announced December 2013.