Showing 1–15 of 15 results for author: Peri, R

  1. arXiv:2405.08317  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  2. arXiv:2307.00169  [pdf, other]

    eess.AS cs.AI cs.LG

    VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

    Authors: Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero

    Abstract: Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical c…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: 8 pages

  3. User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning

    Authors: Tiantian Feng, Raghuveer Peri, Shrikanth Narayanan

    Abstract: Many existing privacy-enhanced speech emotion recognition (SER) frameworks focus on perturbing the original speech data through adversarial training within a centralized machine learning setup. However, this privacy protection scheme can fail since the adversary can still access the perturbed data. In recent years, distributed learning algorithms, especially federated learning (FL), have gained po…

    Submitted 16 May, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Journal ref: Proc. Interspeech 2022

  4. arXiv:2203.15283  [pdf, other]

    eess.AS cs.LG

    Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems

    Authors: Nicholas Mehlman, Anirudh Sreeram, Raghuveer Peri, Shrikanth Narayanan

    Abstract: A variety of recent works have looked into defenses for deep neural networks against adversarial attacks particularly within the image processing domain. Speech processing applications such as automatic speech recognition (ASR) are increasingly relying on deep learning models, and so are also prone to adversarial attacks. However, many of the defenses explored for ASR simply adapt the image-domain…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: This paper is 5 pages long and was submitted to Interspeech 2022

  5. arXiv:2203.09122  [pdf, other]

    eess.AS

    To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition

    Authors: Raghuveer Peri, Krishna Somandepalli, Shrikanth Narayanan

    Abstract: Speaker recognition is increasingly used in several everyday applications including smart speakers, customer care centers and other speech-driven analytics. It is crucial to accurately evaluate and mitigate biases present in machine learning (ML) based speech technologies, such as speaker recognition, to ensure their inclusive adoption. ML fairness studies with respect to various demographic facto…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Preprint submitted to Computer Speech and Language (Elsevier)

  6. arXiv:2107.05222  [pdf, other]

    eess.AS cs.LG eess.SP

    Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems

    Authors: Anirudh Sreeram, Nicholas Mehlman, Raghuveer Peri, Dillon Knox, Shrikanth Narayanan

    Abstract: In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal. We propose to counteract this by employing a neural-network based denoiser as a pre-processor in the ASR pipeline. The denoiser is independent of the…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 5 pages, 4 figures submitted to ASRU 2021

  7. arXiv:2102.11265  [pdf, other]

    eess.AS cs.CL cs.SD

    Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

    Authors: Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

    Abstract: With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality assessment is addressed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domai…

    Submitted 27 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: new version has an updated title

  8. arXiv:2102.06269  [pdf, other]

    eess.IV cs.SD eess.AS

    Disentanglement for audio-visual emotion recognition using multitask setup

    Authors: Raghuveer Peri, Srinivas Parthasarathy, Charles Bradshaw, Shiva Sundaram

    Abstract: Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in label distributions in the real world da…

    Submitted 11 February, 2021; originally announced February 2021.

    Comments: Accepted for ICASSP 2021, 5 pages

  9. arXiv:2010.16038  [pdf, ps, other]

    eess.AS

    Adversarial defense for deep speaker recognition using hybrid adversarial training

    Authors: Monisankha Pal, Arindam Jati, Raghuveer Peri, Chin-Cheng Hsu, Wael AbdAlmageed, Shrikanth Narayanan

    Abstract: Deep neural network based speaker recognition systems can easily be deceived by an adversary using minuscule imperceptible perturbations to the input speech samples. These adversarial attacks pose serious security threats to the speaker recognition systems that use speech biometric. To address this concern, in this work, we propose a new defense mechanism based on a hybrid adversarial training (HA…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  10. arXiv:2008.07685  [pdf, other]

    eess.AS cs.LG cs.SD

    Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems

    Authors: Arindam Jati, Chin-Cheng Hsu, Monisankha Pal, Raghuveer Peri, Wael AbdAlmageed, Shrikanth Narayanan

    Abstract: Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important and essential, especially due to the proliferation of several smart speakers and personal agents that interact with an individual's voice commands to perform diverse, and even sensitive tasks. Adversarial attack is a recently revived domain which is shown to be effective in breaking deep…

    Submitted 17 August, 2020; originally announced August 2020.

  11. arXiv:2007.09635  [pdf, other]

    eess.AS cs.SD

    Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN n…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: Submitted to IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

  12. arXiv:2002.03520  [pdf, other]

    eess.AS cs.SD

    An empirical analysis of information encoded in disentangled neural speaker representations

    Authors: Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan

    Abstract: The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robustness of speaker representations to both intrinsic factors that are acquired during speech production (e.g., emotion, lexical content) and extrinsic factors that ar…

    Submitted 7 April, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: Submitted to Speaker Odyssey 2020

  13. arXiv:1911.00940  [pdf, other]

    eess.AS cs.SD eess.SP

    Robust speaker recognition using unsupervised adversarial invariance

    Authors: Raghuveer Peri, Monisankha Pal, Arindam Jati, Krishna Somandepalli, Shrikanth Narayanan

    Abstract: In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial invariance architecture to train a network that maps speaker embeddings extracted using a pre-trained model onto two lower dimensional embedding spaces. The embeddi…

    Submitted 3 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

  14. arXiv:1910.11416  [pdf, ps, other]

    eess.AS cs.SD

    A study of semi-supervised speaker diarization system using gan mixture model

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Shrikanth Narayanan

    Abstract: We propose a new speaker diarization system based on a recently introduced unsupervised clustering technique namely, generative adversarial network mixture model (GANMM). The proposed system uses x-vectors as front-end representation. Spectral embedding is used for dimensionality reduction followed by k-means initialization during GANMM pre-training. GANMM performs unsupervised speaker clustering…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  15. arXiv:1910.11398  [pdf, ps, other]

    eess.AS cs.SD

    Speaker diarization using latent space clustering in generative adversarial network

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent variable recovery loss, and a clustering-specific loss. It uses x-vector speaker embeddings at the input, while the latent variables are sampled from a co…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020