Search | arXiv e-print repository

Multimodal Modeling For Spoken Language Identification

Authors: Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

Abstract: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however, in the case of video data there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification. Our study reveals that metadata such as video title, description and geographic location provide substantial information to identify the spoken language of the multimedia recording. We conduct experiments using two diverse public datasets of YouTube videos, and obtain state-of-the-art results on the language identification task. We additionally conduct an ablation study that describes the distinct contribution of each modality for language recognition.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2307.10982

MASR: Multi-label Aware Speech Representation

Authors: Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth

Abstract: In recent years, speech representation learning has been framed primarily as a self-supervised learning (SSL) task that uses the raw audio signal alone, ignoring the side information that is often available for a given speech recording. In this paper, we propose MASR, a Multi-label Aware Speech Representation learning framework, which addresses this limitation. MASR enables the inclusion of multiple external knowledge sources to make better use of metadata. The external knowledge sources are incorporated in the form of sample-level pairwise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.

Submitted 25 September, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted at ASRU 2023

arXiv:2306.04374

Label Aware Speech Representation Learning For Language Identification

Authors: Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar

Abstract: Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using raw data. In this paper, we propose a novel framework that combines self-supervised representation learning with language label information for the pre-training task. This framework, termed Label Aware Speech Representation (LASR) learning, uses a triplet-based objective function to incorporate language labels along with the self-supervised loss function. The speech representations are further fine-tuned for the downstream task. The language recognition experiments are performed on two public datasets: FLEURS and Dhwani. In these experiments, we illustrate that the proposed LASR framework improves over the state-of-the-art systems on language identification. We also report an analysis of the robustness of the LASR approach to noisy/missing labels as well as its application to multilingual speech recognition tasks.

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted at Interspeech 2023
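The LASR abstract above mentions a triplet-based objective that incorporates language labels alongside the self-supervised loss. The sketch below shows a standard margin-based triplet loss on embedding vectors, assuming Euclidean distance, a margin of 1.0, and a hypothetical weight `alpha` for combining it with an SSL loss. The paper's exact distance, margin and weighting are not given here, so treat this as an illustration of the general technique rather than LASR itself.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the same-language positive is closer to the
    anchor than the different-language negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def combined_loss(ssl_loss: float, triplet: float, alpha: float = 0.5) -> float:
    """Hypothetical weighted sum of a self-supervised loss and the label-aware term."""
    return ssl_loss + alpha * triplet


# Anchor and positive share a language label; the negative has a different one.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]))  # 0.0: constraint already satisfied
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # 1.0: negative is too close
```

The hinge form means well-separated triplets contribute nothing, so training effort concentrates on pairs of languages that the embedding still confuses.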

arXiv:2104.00235

doi 10.21437/Interspeech.2021-1339

Multilingual and code-switching ASR challenges for low resource Indian languages

Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

Abstract: Recently, there has been increasing interest in multilingual automatic speech recognition (ASR), where a speech recognition system caters to multiple low-resource languages by taking advantage of small amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple languages are freely interchanged within a single sentence or between sentences. The success of low-resource multilingual and code-switching ASR often depends on the variety of languages in terms of their acoustic and linguistic characteristics, the amount of data available, and how carefully these factors are considered in building the ASR system. In this challenge, we focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali. For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages, including two code-switched language pairs, Hindi-English and Bengali-English. We also provide a baseline recipe for both tasks, with a WER of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.

Submitted 31 March, 2021; originally announced April 2021.

Comments: 6 pages
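The challenge baselines above are reported as word error rate (WER). As a reminder of how that figure is defined, here is a minimal sketch that computes WER as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It assumes whitespace tokenization and no text normalization; the challenge's official scoring may differ in those details.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(f"{100 * word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2f}%")  # 16.67%
```

Percentages such as 30.73% correspond to 100 × WER; because insertions are counted, WER can exceed 100% for very noisy hypotheses.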

arXiv:2010.07897

Spatial Registration Evaluation of [18F]-MK6240 PET

Authors: James Zou, Aubrey Johnson, Jeanelle France, Srinidhi Bharadwaj, Zeljko Tomljanovic, Yaakov Stern, Adam M. Brickman, Devangere P. Devanand, Jose A. Luchsinger, William C. Kreisl, Frank A. Provenzano

Abstract: Image registration is an important preprocessing step in neuroimaging which allows for the matching of anatomical and functional information between modalities and subjects. This can be challenging if there are gross differences in image geometry or in signal intensity, such as in the case of some molecular PET radioligands, where control subjects display relatively little signal compared with noise within intracranial regions and may have off-target binding that can be confused with other regions and that may vary between subjects. The use of intermediary images or volumes has been shown to aid registration in such cases. To account for this phenomenon within our own longitudinal aging cohort, we generated a population-specific MRI and PET template from a broad distribution of 30 amyloid-negative subjects. We then registered the PET image of each of these subjects, as well as a holdout set of thirty 'template-naive' subjects, to their corresponding MRI images, using the template image as an intermediate and three different sets of registration parameters and procedures. To evaluate the performance of both conventional registration and our method, we compared these to the registration of the attenuation CT (acquired at the time of PET acquisition) to MRI as the reference. We then used our template to directly derive SUVR values without the use of MRI. We found that conventional registration was comparable to an existing CT-based standard, and there was no significant difference in errors collectively amongst all methods tested.
In addition, there were no significant differences between existing and MR-less tau PET quantification methods. We conclude that a template-based method is a feasible alternative to, or salvage for, direct registration and MR-less quantification, and may be preferred in cases where there is doubt about the similarity between two image modalities.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 19 pages, 8 Figures, 4 Tables
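The PET abstract above derives SUVR (standardized uptake value ratio) values directly in template space. SUVR is the mean tracer uptake in a target region divided by the mean uptake in a reference region. The sketch below assumes the PET image has already been registered to the template and flattened into a list of voxel values, and that boolean target and reference masks exist on the same voxel grid; the masks and the choice of reference region are hypothetical, not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def regional_mean(image: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean voxel value inside a boolean mask defined on the same voxel grid."""
    if len(image) != len(mask):
        raise ValueError("image and mask must cover the same voxels")
    values = [v for v, inside in zip(image, mask) if inside]
    if not values:
        raise ValueError("mask selects no voxels")
    return mean(values)


def suvr(image: Sequence[float],
         target_mask: Sequence[bool],
         reference_mask: Sequence[bool]) -> float:
    """SUVR = mean uptake in the target region / mean uptake in the reference region."""
    return regional_mean(image, target_mask) / regional_mean(image, reference_mask)


# Toy 4-voxel image: the first two voxels form the target region, the last two the reference.
voxels = [2.0, 4.0, 1.0, 1.0]
print(suvr(voxels, [True, True, False, False], [False, False, True, True]))  # 3.0
```

Because the masks live in template space, an MR-less pipeline only needs the PET-to-template registration to place them; no subject MRI enters this calculation.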

arXiv:2008.04768

Constrained Active Classification Using Partially Observable Markov Decision Processes

Authors: Bo Wu, Niklas Lauffer, Mohamadreza Ahmadi, Suda Bharadwaj, Zhe Xu, Ufuk Topcu

Abstract: In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system and observe its reactions so that the attribute of interest is classified efficiently with high confidence. We present a decision-theoretic framework based on partially observable Markov decision processes (POMDPs). The proposed framework relies on assigning a classification belief (a probability distribution) to the attributes of interest. Given an initial belief, a confidence level over which a classification decision can be made, a cost bound, safe belief sets, and a finite time horizon, we compute POMDP strategies leading to classification decisions. We present three different algorithms to compute such strategies. The first algorithm computes the optimal strategy exactly by value iteration. To overcome the computational complexity of computing the exact solutions, we propose a second algorithm based on adaptive sampling and a third based on a Monte Carlo tree search to approximate the optimal probability of reaching a classification decision. We illustrate the proposed methodology using examples from medical diagnosis, security surveillance, and wildlife classification.

Submitted 4 January, 2023; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1810.00097
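The abstract above assigns a classification belief to the candidate models and issues a decision once a confidence level is reached. A minimal sketch of that bookkeeping follows, assuming a Bayesian update over a finite set of candidate MDP models after each observed transition (state, action, next state) and a simple threshold test. The model names and transition tables are hypothetical, and the sketch does not compute the POMDP strategy itself (value iteration, adaptive sampling or Monte Carlo tree search).

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: each candidate MDP model maps an observed transition
# (state, action, next_state) to its transition probability.
Transition = Tuple[str, str, str]
Model = Dict[Transition, float]


def update_belief(belief: Dict[str, float],
                  models: Dict[str, Model],
                  transition: Transition) -> Dict[str, float]:
    """Bayes' rule: b'(m) is proportional to b(m) * P_m(next_state | state, action)."""
    unnormalized = {m: belief[m] * models[m].get(transition, 0.0) for m in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("transition is impossible under every candidate model")
    return {m: p / total for m, p in unnormalized.items()}


def decide(belief: Dict[str, float], confidence: float) -> Optional[str]:
    """Return the most likely model once its belief reaches `confidence`, else None."""
    best = max(belief, key=lambda m: belief[m])
    return best if belief[best] >= confidence else None


models = {
    "healthy": {("s0", "test", "pos"): 0.1, ("s0", "test", "neg"): 0.9},
    "sick": {("s0", "test", "pos"): 0.9, ("s0", "test", "neg"): 0.1},
}
belief = {"healthy": 0.5, "sick": 0.5}
for _ in range(2):
    belief = update_belief(belief, models, ("s0", "test", "pos"))
    print(decide(belief, confidence=0.95))  # prints None, then sick
```

After one positive observation the belief in "sick" is 0.9, below the 0.95 threshold, so the strategy would keep acting; a second observation pushes it to about 0.988 and a decision is made.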

arXiv:2008.00164

Byzantine-Resilient Distributed Hypothesis Testing With Time-Varying Network Topology

Authors: Bo Wu, Steven Carr, Suda Bharadwaj, Zhe Xu, Ufuk Topcu

Abstract: We study the problem of distributed hypothesis testing over a network of mobile agents with limited communication and sensing ranges to infer the true hypothesis collaboratively. In particular, we consider a scenario where there is an unknown subset of compromised agents that may deliberately share altered information to undermine the team objective. We propose two distributed algorithms where each agent maintains and updates two sets of beliefs (i.e., probability distributions over the hypotheses), namely local and actual beliefs (LB and AB respectively for brevity). In both algorithms, at every time step, each agent shares its AB with other agents within its communication range and makes a local observation to update its LB. Then both algorithms can use the shared information to update ABs under certain conditions. One requires receiving a certain number of shared ABs at each time instant; the other accumulates shared ABs over time and updates after the number of shared ABs exceeds a prescribed threshold. Otherwise, both algorithms rely on the agent's current LB and AB to update the new AB. We prove under mild assumptions that the AB for every non-compromised agent converges almost surely to the true hypothesis, without requiring connectivity in the underlying time-varying network topology. Using a simulation of a team of unmanned aerial vehicles aiming to classify adversarial agents among themselves, we illustrate and compare the proposed algorithms.
Finally, we show experimentally that the second algorithm consistently outperforms the first algorithm in terms of the speed of convergence.

Submitted 17 July, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

arXiv:2006.15109

Person Re-identification by analyzing Dynamic Variations in Gait Sequences

Authors: Sandesh Bharadwaj, Kunal Chanda

Abstract: Gait recognition is a biometric technology that identifies individuals in a video sequence by analysing their style of walking or limb movement. However, this identification is generally sensitive to appearance changes, and conventional feature descriptors such as the Gait Energy Image (GEI) lose some of the dynamic information in the gait sequence. The Active Energy Image (AEI) focuses more on dynamic motion changes than GEI and is better suited to dealing with appearance changes. We propose a new approach that recognizes people by analysing dynamic motion variations, without using a database of predicted changes. In the proposed method, the active energy image is calculated by averaging the difference frames of the silhouette sequence and is divided into multiple segments. Affine moment invariants are computed as gait features for each segment. Next, matching weights are calculated based on the similarity between the extracted features and those in the database. Finally, the subject is identified by the weighted combination of similarities across all segments. The CASIA-B Gait Database is used as the principal dataset for the experimental analysis.

Submitted 26 June, 2020; originally announced June 2020.

Comments: Presented at ETCCS 2020, accepted for publication in Springer LNEE Proceedings
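The gait abstract above computes the active energy image (AEI) by averaging the difference frames of the silhouette sequence and then dividing it into segments. The sketch below assumes binary silhouettes of equal size stored as nested lists, absolute pixel-wise differences between consecutive frames, and horizontal bands of rows as the segments. The paper's exact segmentation may differ, and the affine-moment-invariant features and weighted matching are not shown.

```python
from typing import List

Image = List[List[float]]


def active_energy_image(frames: List[List[List[int]]]) -> Image:
    """Average of |B_t - B_{t-1}| over consecutive binary silhouettes B_t."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a difference frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                acc[r][c] += abs(curr[r][c] - prev[r][c])
    n_diffs = len(frames) - 1
    return [[v / n_diffs for v in row] for row in acc]


def split_into_bands(image: Image, num_bands: int) -> List[Image]:
    """Split the image into `num_bands` horizontal bands; the last band takes any leftover rows."""
    if not 1 <= num_bands <= len(image):
        raise ValueError("num_bands must be between 1 and the number of rows")
    size = len(image) // num_bands
    bands = [image[i * size:(i + 1) * size] for i in range(num_bands - 1)]
    bands.append(image[(num_bands - 1) * size:])
    return bands


frames = [
    [[0, 1], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 1]],
]
aei = active_energy_image(frames)
print(aei)                       # [[0.5, 0.5], [0.0, 0.5]]
print(split_into_bands(aei, 2))  # [[[0.5, 0.5]], [[0.0, 0.5]]]
```

Pixels that stay constant across the sequence (for example, static clothing regions) average to zero, which is why AEI emphasizes motion over appearance.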

arXiv:1910.10380

Online Synthesis for Runtime Enforcement of Safety in Multi-Agent Systems

Authors: Dhananjay Raju, Suda Bharadwaj, Ufuk Topcu

Abstract: A shield is attached to a system to guarantee safety by correcting the system's behavior at runtime. Existing methods that employ design-time synthesis of shields do not scale to multi-agent systems. Moreover, such shields are typically implemented in a centralized manner, requiring global information on the state of all agents in the system. We address these limitations through a new approach where the shields are synthesized at runtime and do not require global information. There is a shield onboard every agent, which can only modify the behavior of the corresponding agent. In this approach, which is fundamentally decentralized, the shield on every agent has two components: a pathfinder that corrects the behavior of the agent and an ordering mechanism that dynamically modifies the priority of the agent. The current priority determines if the shield uses the pathfinder to modify the behavior of the agent. We derive an upper bound on the maximum deviation for any agent from its original behavior. We prove that the worst-case synthesis time is quadratic in the number of agents at runtime as opposed to exponential at design-time for existing methods. We test the performance of the decentralized, runtime shield synthesis approach on a collision-avoidance problem. For 50 agents in a 50x50 grid, the synthesis at runtime requires a few seconds per agent whenever a potential collision is detected. In contrast, the centralized design-time synthesis of shields for a similar setting is intractable beyond 4 agents in a 5x5 grid.

Submitted 27 February, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

arXiv:1810.00097

Cost-Bounded Active Classification Using Partially Observable Markov Decision Processes

Authors: Bo Wu, Mohamadreza Ahmadi, Suda Bharadwaj, Ufuk Topcu

Abstract: Active classification, i.e., the sequential decision-making process aimed at data acquisition for classification purposes, arises naturally in many applications, including medical diagnosis, intrusion detection, and object tracking. In this work, we study the problem of actively classifying dynamical systems with a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system, and observe its reactions so that the true model is determined efficiently with high confidence. To this end, we present a decision-theoretic framework based on partially observable Markov decision processes (POMDPs). The proposed framework relies on assigning a classification belief (a probability distribution) to each candidate MDP model. Given an initial belief, some misclassification probabilities, a cost bound, and a finite time horizon, we design POMDP strategies leading to classification decisions. We present two different approaches to find such strategies. The first approach computes the optimal strategy "exactly" using value iteration. To overcome the computational complexity of finding exact solutions, the second approach is based on adaptive sampling to approximate the optimal probability of reaching a classification decision. We illustrate the proposed methodology using two examples from medical diagnosis and intruder detection.

Submitted 28 September, 2018; originally announced October 2018.

arXiv:1809.06480

Transfer Entropy in MDPs with Temporal Logic Specifications

Authors: Suda Bharadwaj, Mohamadreza Ahmadi, Takashi Tanaka, Ufuk Topcu

Abstract: Emerging applications in autonomy require control techniques that take into account uncertain environments and communication and sensing constraints while satisfying high-level mission specifications. Motivated by this need, we consider a class of Markov decision processes (MDPs), along with a transfer entropy cost function. In this context, we study high-level mission specifications as co-safe linear temporal logic (LTL) formulae. We provide a method to synthesize a policy that minimizes the weighted sum of the transfer entropy and the probability of failure to satisfy the specification. We derive a set of coupled non-linear equations that an optimal policy must satisfy. We then use a modified Arimoto-Blahut algorithm to solve the non-linear equations. Finally, we demonstrate the proposed method on a navigation and path planning scenario of a Mars rover.

Submitted 17 September, 2018; originally announced September 2018.

Comments: 8 pages, 6 figures, Preprint accepted at the 57th IEEE Conference on Decision and Control, Miami Beach, FL, USA, December 17-19, 2018
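The transfer-entropy abstract above solves its optimality equations with a modified Arimoto-Blahut algorithm, but the modification is not described there. For orientation only, the sketch below shows the classic Blahut-Arimoto iteration for the capacity of a discrete memoryless channel, which has the same alternating, multiplicative fixed-point structure. It is not the paper's algorithm, and the Z-channel in the example is an illustrative assumption.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]  # W[x][y] = P(y | x); each row sums to 1


def _output_marginal(r: List[float], W: Channel) -> List[float]:
    """p(y) = sum_x r(x) W(y | x)."""
    return [sum(r[x] * W[x][y] for x in range(len(W))) for y in range(len(W[0]))]


def _divergence(row: List[float], p_y: List[float]) -> float:
    """KL divergence D(W(. | x) || p_y) in nats, with 0 log 0 taken as 0."""
    return sum(w * math.log(w / p) for w, p in zip(row, p_y) if w > 0)


def blahut_arimoto(W: Channel, iterations: int = 200) -> Tuple[float, List[float]]:
    """Classic Blahut-Arimoto iteration: returns (capacity in bits, input distribution)."""
    r = [1.0 / len(W)] * len(W)  # start from the uniform input distribution
    for _ in range(iterations):
        p_y = _output_marginal(r, W)
        # Multiplicative update r(x) <- r(x) exp(D(W(.|x) || p_y)), then renormalize.
        weights = [r[x] * math.exp(_divergence(W[x], p_y)) for x in range(len(W))]
        total = sum(weights)
        r = [w / total for w in weights]
    p_y = _output_marginal(r, W)
    mutual_info = sum(r[x] * _divergence(W[x], p_y) for x in range(len(W)))
    return mutual_info / math.log(2), r


# Z-channel: input 0 is received perfectly, input 1 flips to 0 half the time.
capacity, r = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(round(capacity, 4), [round(p, 3) for p in r])  # 0.3219 [0.6, 0.4]
```

Each pass alternates between computing the output marginal implied by the current input distribution and reweighting inputs by how informative they are; the exact Z-channel capacity here is log2(1.25), reached with input probabilities 0.6 and 0.4.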

arXiv:1709.05363

Synthesis of surveillance strategies via belief abstraction

Authors: Suda Bharadwaj, Rayna Dimitrova, Ufuk Topcu

Abstract: We study the problem of synthesizing a controller for a robot with a surveillance objective, that is, the robot is required to maintain knowledge of the location of a moving, possibly adversarial target. We formulate this problem as a one-sided partial-information game in which the winning condition for the agent is specified as a temporal logic formula. The specification formalizes the surveillance requirement given by the user, including additional non-surveillance tasks. In order to synthesize a surveillance strategy that meets the specification, we transform the partial-information game into a perfect-information one, using abstraction to mitigate the exponential blow-up typically incurred by such transformations. This enables the use of off-the-shelf tools for reactive synthesis. We use counterexample-guided refinement to automatically achieve abstraction precision that is sufficient to synthesize a surveillance strategy. We evaluate the proposed method on two case studies, demonstrating its applicability to large state spaces and diverse requirements.

Submitted 19 March, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

ACM Class: I.2.4
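The surveillance abstract above turns a partial-information game into a perfect-information one over beliefs about the target's location. A minimal sketch of the underlying belief (knowledge-set) update follows, assuming a grid world, a target that stays put or moves one cell per step, and a robot that observes the target exactly whenever it is in a visible cell. The grid, movement model and function names are hypothetical, and the paper's abstraction and counterexample-guided refinement are not shown.

```python
from typing import FrozenSet, Optional, Set, Tuple

Cell = Tuple[int, int]


def target_moves(cell: Cell, width: int, height: int,
                 obstacles: FrozenSet[Cell]) -> Set[Cell]:
    """Cells the target can occupy next: stay put or move one step in a cardinal direction."""
    x, y = cell
    candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return {(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height and (cx, cy) not in obstacles}


def update_belief(belief: FrozenSet[Cell], visible: FrozenSet[Cell],
                  observed: Optional[Cell], width: int, height: int,
                  obstacles: FrozenSet[Cell] = frozenset()) -> FrozenSet[Cell]:
    """Knowledge-set update for the robot in a one-sided partial-information game."""
    if observed is not None:
        # The robot sees the target, so its location is known exactly.
        return frozenset({observed})
    successors: Set[Cell] = set()
    for cell in belief:
        successors |= target_moves(cell, width, height, obstacles)
    # The target was not seen, so it cannot be in any currently visible cell.
    return frozenset(successors - visible)


belief = frozenset({(0, 0)})
belief = update_belief(belief, visible=frozenset(), observed=None, width=4, height=1)
print(sorted(belief))  # [(0, 0), (1, 0)]
belief = update_belief(belief, visible=frozenset({(0, 0)}), observed=None, width=4, height=1)
print(sorted(belief))  # [(1, 0), (2, 0)]
```

A surveillance requirement such as "the belief never holds more than k cells" can then be checked as `len(belief) <= k`; the number of distinct belief sets grows exponentially with the grid, which is the blow-up the abstraction is meant to mitigate.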

Showing 1–12 of 12 results for author: Bharadwaj, S