Showing 1–23 of 23 results for author: Sundararajan

Searching in archive eess.
  1. arXiv:2406.18679  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization

    Authors: Xiang Li, Vivek Govindan, Rohit Paturi, Sundararajan Srinivasan

    Abstract: End-to-end neural diarization (EEND) models offer significant improvements over traditional embedding-based Speaker Diarization (SD) approaches but fall short on generalizing to long-form audio with a large number of speakers. The EEND-vector-clustering method mitigates this by combining local EEND with global clustering of speaker embeddings from local windows, but this requires an additional speaker…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2406.17266  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    AG-LSEC: Audio Grounded Lexical Speaker Error Correction

    Authors: Rohit Paturi, Xiang Li, Sundararajan Srinivasan

    Abstract: Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines and can have speaker errors due to SD and/or ASR reconciliation, especially around speaker turns and regions of speech overlap. To reduce these errors, a Lexical Speaker Error Correction (LSEC), in which an external language model provides lexical inf…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  3. arXiv:2405.08317  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  4. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 pages

  5. arXiv:2311.00697  [pdf, other]

    cs.CL eess.AS

    End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

    Authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

    Abstract: Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combin…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023. Code: https://github.com/amazon-science/stac-speech-translation

  6. arXiv:2307.09954  [pdf, other]

    eess.SY cs.MA cs.RO

    Priority-based DREAM Approach for Highly Manoeuvring Intruders in A Perimeter Defense Problem

    Authors: Shridhar Velhal, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: In this paper, a Priority-based Dynamic REsource Allocation with decentralized Multi-task assignment (P-DREAM) approach is presented to protect a territory from highly manoeuvring intruders. In the first part, static optimization problems are formulated to compute the following parameters of the perimeter defense problem: the number of reserve stations, their locations, the priority region, the mo…

    Submitted 19 July, 2023; originally announced July 2023.

  7. arXiv:2306.09313  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction

    Authors: Rohit Paturi, Sundararajan Srinivasan, Xiang Li

    Abstract: Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where the SD system typically uses only acoustic information to identify the speakers in the audio stream. This approach can lead to speaker errors especially around…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  8. Peek into the Future Camera-based Occupant Sensing in Configurable Cabins for Autonomous Vehicles

    Authors: Avinash Prabu, Renran Tian, Lingxi Li, Jialiang Le, Srinivasan Sundararajan, Saeed Barbat

    Abstract: The development of fully autonomous vehicles (AVs) can potentially eliminate drivers and introduce unprecedented seating design. However, highly flexible seat configurations may lead to occupants' unconventional poses and actions. Understanding occupant behaviors and prioritizing safety features have become eye-catching topics in the AV research frontier. Visual sensors have the advantages of cost-effici…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Conference: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) Link: https://ieeexplore.ieee.org/document/9564420

  9. arXiv:2212.07084  [pdf, other]

    cs.CV eess.IV

    Fully Complex-valued Fully Convolutional Multi-feature Fusion Network (FC2MFN) for Building Segmentation of InSAR images

    Authors: Aniruddh Sikdar, Sumanth Udupa, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: Building segmentation in high-resolution InSAR images is a challenging task that can be useful for large-scale surveillance. Although complex-valued deep learning networks perform better than their real-valued counterparts for complex-valued SAR data, phase information is not retained throughout the network, which causes a loss of information. This paper proposes a Fully Complex-valued, Fully Conv…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted for publication in IEEE Symposium Series On Computational Intelligence 2022, 8 pages, 6 figures

  10. arXiv:2211.13280  [pdf, other]

    cs.CL cs.SD eess.AS

    Device Directedness with Contextual Cues for Spoken Dialog Systems

    Authors: Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infu…

    Submitted 23 November, 2022; originally announced November 2022.

  11. arXiv:2202.13870  [pdf, other]

    cs.NI cs.LG eess.SY

    Simulating Network Paths with Recurrent Buffering Units

    Authors: Divyam Anshumaan, Sriram Balasubramanian, Shubham Tiwari, Nagarajan Natarajan, Sundararajan Sellamanickam, Venkata N. Padmanabhan

    Abstract: Simulating physical network paths (e.g., Internet) is a cornerstone research problem in the emerging sub-field of AI-for-networking. We seek a model that generates end-to-end packet delay values in response to the time-varying load offered by a sender, which is typically a function of the previously output delays. The problem setting is unique, and renders the state-of-the-art text and time-series…

    Submitted 6 December, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: Accepted in AAAI 2023, 19 pages, 14 figures

  12. arXiv:2112.05863  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

    Authors: Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero

    Abstract: Many of the recent advances in speech separation are primarily aimed at synthetic mixtures of short audio utterances with high degrees of overlap. Most of these approaches need an additional stitching step to stitch the separated speech chunks for long form audio. Since most of the approaches involve Permutation Invariant training (PIT), the order of separated speech chunks is nondeterministic and…

    Submitted 6 September, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at Interspeech 2022

  13. arXiv:2112.00158  [pdf]

    eess.AS

    Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

    Authors: Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

    Abstract: Generic pre-trained speech and text representations promise to reduce the need for large labeled datasets on specific speech and language tasks. However, it is not clear how to effectively adapt these representations for speech emotion recognition. Recent public benchmarks show the efficacy of several popular self-supervised speech representations for emotion classification. In this study, we show…

    Submitted 27 January, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: Accepted for publication at IEEE ICASSP 2022

  14. arXiv:2107.10135  [pdf, other]

    cs.NI eess.SP

    Global Outliers Detection in Wireless Sensor Networks: A Novel Approach Integrating Time-Series Analysis, Entropy, and Random Forest-based Classification

    Authors: Mahmood Safaei, Maha Driss, Wadii Boulila, Elankovan A Sundararajan, Mitra Safaei

    Abstract: Wireless Sensor Networks (WSNs) have recently attracted greater attention worldwide due to their practicality in monitoring, communicating, and reporting specific physical phenomena. The data collected by WSNs is often inaccurate as a result of unavoidable environmental factors, which may include noise, signal weakness, or intrusion attacks depending on the specific situation. Sending high-noise d…

    Submitted 21 July, 2021; originally announced July 2021.

  15. Robust EMRAN-aided Coupled Controller for Autonomous Vehicles

    Authors: Sauranil Debarshi, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: This paper presents a coupled, neural network-aided longitudinal cruise and lateral path-tracking controller for an autonomous vehicle with model uncertainties and experiencing unknown external disturbances. Using a feedback error learning mechanism, an inverse vehicle dynamics learning scheme utilizing an adaptive Radial Basis Function (RBF) neural network, referred to as the Extended Minimal Res…

    Submitted 8 January, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

    Report number: Engineering Applications of Artificial Intelligence, vol. 110, p. 104717

  16. arXiv:2106.05792  [pdf, other]

    eess.AS

    Speaker-conversation factorial designs for diarization error analysis

    Authors: Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

    Abstract: Speaker diarization accuracy can be affected by both acoustics and conversation characteristics. Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex. This paper proposes a methodology that can distinguish independent margina…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 5 pages, 2 figures, Interspeech 2021

  17. arXiv:2105.11353  [pdf, other]

    stat.ME eess.SY stat.AP

    Change Point Detection in Nonstationary Sub-Hourly Wind Time Series

    Authors: Sakitha Ariyarathne, Harsha Gangammanavar, Raanju R. Sundararajan

    Abstract: In this paper, we present a change point detection method for detecting change points in multivariate nonstationary wind speed time series. The change point method identifies changes in the covariance structure and decomposes the nonstationary multivariate time series into stationary segments. We also present parametric and nonparametric simulation techniques to simulate new wind time series withi…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: 18 pages, 3 figures, 3 tables, and 5 sections

  18. arXiv:2103.05834  [pdf, other]

    eess.AS

    Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning

    Authors: Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau

    Abstract: Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech which typically contains high variability in pronunciation and other semantics, since obtaining large amounts of annotated accented data is both tedious and costly. Often, we only have access to large amounts of…

    Submitted 9 March, 2021; originally announced March 2021.

  19. arXiv:2102.07381  [pdf, other]

    eess.SY cs.MA cs.RO

    A Decentralized Multi-UAV Spatio-Temporal Multi-Task Allocation Approach for Perimeter Defense

    Authors: Shridhar Velhal, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: This paper provides a new solution approach to a multi-player perimeter defense game, in which the intruders' team tries to enter the territory, and a team of defenders protects the territory by capturing intruders on the perimeter of the territory. The objective of the defenders is to detect and capture the intruders before the intruders enter the territory. Each defender independently senses the…

    Submitted 15 February, 2021; originally announced February 2021.

  20. arXiv:2012.06756  [pdf, ps, other]

    eess.SY

    Gap Reduced Minimum Error Robust Simultaneous Estimation For Unstable Nano Air Vehicle

    Authors: Jinraj V Pushpangathan, Harikumar Kandath, Suresh Sundaram, Narasimhan Sundararajan

    Abstract: This paper proposes a novel Gap Reduced Minimum Error Robust Simultaneous (GRMERS) estimator for a resource-constrained Nano Aerial Vehicle (NAV) that enables a single estimator to provide simultaneous and robust estimation for a given N unstable and uncertain NAV plant models. The estimated full state feedback enables a stable flight for the NAV. The GRMERS estimator is implemented utilizing a Minimum…

    Submitted 12 December, 2020; originally announced December 2020.

  21. arXiv:1910.05339  [pdf, other]

    cs.DC cs.SE eess.SY

    DeCaf: Diagnosing and Triaging Performance Issues in Large-Scale Cloud Services

    Authors: Chetan Bansal, Sundararajan Renganathan, Ashima Asudani, Olivier Midy, Mathru Janakiraman

    Abstract: Large scale cloud services use Key Performance Indicators (KPIs) for tracking and monitoring performance. They usually have Service Level Objectives (SLOs) baked into the customer agreements which are tied to these KPIs. Dependency failures, code bugs, infrastructure failures, and other problems can cause performance regressions. It is critical to minimize the time and manual effort in diagnosing…

    Submitted 2 February, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: To be published in the proceedings of ICSE-SEIP '20, Seoul, Republic of Korea

  22. arXiv:1905.11883  [pdf, other]

    eess.SP

    A Case Study on the Effects of Partial Solar Eclipse on Distributed Photovoltaic Systems and Management Areas

    Authors: Aditya Sundararajan, Temitayo O. Olowu, Longfei Wei, Shahinur Rahman, Arif I. Sarwat

    Abstract: Photovoltaic (PV) systems depend on irradiance, ambient temperature and module temperature. A solar eclipse causes significant changes in these parameters, thereby impacting the PV generation profile, performance, and power quality of the larger grid to which they connect. This paper presents a case study to evaluate the impacts of the solar eclipse of August 21, 2017 on two real-world grid-tied PV system…

    Submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted by IET Smart Grid journal

  23. arXiv:1809.08709  [pdf, ps, other]

    math.OC cs.DC eess.SY

    A Canonical Form for First-Order Distributed Optimization Algorithms

    Authors: Akhil Sundararajan, Bryan Van Scoy, Laurent Lessard

    Abstract: We consider the distributed optimization problem in which a network of agents aims to minimize the average of local functions. To solve this problem, several algorithms have recently been proposed where agents perform various combinations of communication with neighbors, local gradient computations, and updates to local state variables. In this paper, we present a canonical form that characterizes…

    Submitted 15 July, 2019; v1 submitted 23 September, 2018; originally announced September 2018.

    Journal ref: American Control Conference, pp. 4075-4080, Jul 2019