Skip to main content

Showing 1–14 of 14 results for author: Kataria, S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2303.03657  [pdf, other

    eess.AS

    Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition

    Authors: Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak

    Abstract: Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV). We introduce a simple novel technique called Self-FiLM to inject self-supervision into existing BWE models via Feature-wise Linear Modulation. We hypothesize that such information captures domain/environment information, which can give zero-shot generalization. Self-FiLM Condit… ▽ More

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Under review

  2. arXiv:2209.01702  [pdf, other

    eess.AS

    Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

    Authors: Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

    Abstract: Automatic Speaker Verification (ASV) technology has become commonplace in virtual assistants. However, its performance suffers when there is a mismatch between the train and test domains. Mixed bandwidth training, i.e., pooling training data from both domains, is a preferred choice for develo** a universal model that works for both narrowband and wideband domains. We propose complementing this t… ▽ More

    Submitted 4 September, 2022; originally announced September 2022.

    Comments: Submit to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  3. arXiv:2204.03851  [pdf, other

    eess.AS cs.CR cs.SD

    Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

    Authors: Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

    Abstract: Adversarial attacks are a threat to automatic speech recognition (ASR) systems, and it becomes imperative to propose defenses to protect them. In this paper, we perform experiments to show that K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint mod… ▽ More

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  4. arXiv:2204.03848  [pdf, ps, other

    eess.AS cs.CR cs.SD

    AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

    Authors: Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

    Abstract: Adversarial attacks pose a severe security threat to the state-of-the-art speaker identification systems, thereby making it vital to propose countermeasures against them. Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation. First, we prove our claim th… ▽ More

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to InterSpeech 2022

  5. arXiv:2203.16614  [pdf, other

    eess.AS cs.SD

    Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

    Authors: Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

    Abstract: Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. Contrary to this, we propose to learn both tasks together. Particularly, we learn to map narrowband conversational telephone speech to wideband microphone speech. We developed par… ▽ More

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: submitted to Interspeech 2022

  6. arXiv:2104.01433  [pdf, other

    eess.AS

    Deep Feature CycleGANs: Speaker Identity Preserving Non-parallel Microphone-Telephone Domain Adaptation for Speaker Verification

    Authors: Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

    Abstract: With the increase in the availability of speech from varied domains, it is imperative to use such out-of-domain data to improve existing speech systems. Domain adaptation is a prominent pre-processing approach for this. We investigate it for adapt microphone speech to the telephone domain. Specifically, we explore CycleGAN-based unpaired translation of microphone data to improve the x-vector/speak… ▽ More

    Submitted 3 April, 2021; originally announced April 2021.

  7. arXiv:2010.12692  [pdf, ps, other

    eess.AS

    Multi-Channel Speaker Verification for Single and Multi-talker Speech

    Authors: Saurabh Kataria, Shi-Xiong Zhang, Dong Yu

    Abstract: To improve speaker verification in real scenarios with interference speakers, noise, and reverberation, we propose to bring together advancements made in multi-channel speech features. Specifically, we combine spectral, spatial, and directional features, which includes inter-channel phase difference, multi-channel sinc convolutions, directional power ratio features, and angle features. To maximall… ▽ More

    Submitted 9 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

  8. arXiv:2010.11860  [pdf, other

    eess.AS cs.SD

    Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models

    Authors: Saurabh Kataria, Jesús Villalba, Najim Dehak

    Abstract: Deep learning based speech denoising still suffers from the challenge of improving perceptual quality of enhanced signals. We introduce a generalized framework called Perceptual Ensemble Regularization Loss (PERL) built on the idea of perceptual losses. Perceptual loss discourages distortion to certain speech properties and we analyze it using six large-scale pre-trained models: speaker classifica… ▽ More

    Submitted 22 October, 2020; originally announced October 2020.

  9. arXiv:2005.08331  [pdf, ps, other

    eess.AS cs.SD

    Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Paola García-Perera, Jesús Villalba, Najim Dehak

    Abstract: We investigated an enhancement and a domain adaptation approach to make speaker verification systems robust to perturbations of far-field speech. In the enhancement approach, using paired (parallel) reverberant-clean speech, we trained a supervised Generative Adversarial Network (GAN) along with a feature map** loss. For the domain adaptation approach, we trained a Cycle Consistent Generative Ad… ▽ More

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  10. arXiv:2002.00139  [pdf, other

    eess.AS cs.SD

    Analysis of Deep Feature Loss based Enhancement for Speaker Verification

    Authors: Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Najim Dehak

    Abstract: Data augmentation is conventionally used to inject robustness in Speaker Verification systems. Several recently organized challenges focus on handling novel acoustic environments. Deep learning based speech enhancement is a modern solution for this. Recently, a study proposed to optimize the enhancement network in the activation space of a pre-trained auxiliary network. This methodology, called de… ▽ More

    Submitted 27 April, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: 8 pages; accepted in Odyssey2020 workshop

  11. arXiv:1912.00938  [pdf

    eess.AS cs.SD

    Speaker detection in the wild: Lessons learned from JSALT 2019

    Authors: Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

    Abstract: This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions that go from meetings to wild speech. We describe the research threads we explored and a set of modules that was successful for these scenarios. The ultimate goal was to explore speaker dete… ▽ More

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Submitted to ICASSP 2020

  12. arXiv:1910.11915  [pdf, ps, other

    eess.AS cs.SD

    Unsupervised Feature Enhancement for speaker verification

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

    Abstract: The task of making speaker verification systems robust to adverse scenarios remain a challenging and an active area of research. We developed an unsupervised feature enhancement approach in log-filter bank domain with the end goal of improving speaker verification performance. We experimented with using both real speech recorded in adverse environments and degraded speech obtained by simulation to… ▽ More

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages; accepted in ICASSP 2020

  13. arXiv:1910.11909  [pdf, other

    eess.AS cs.SD

    Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak

    Abstract: Current speaker recognition technology provides great performance with the x-vector approach. However, performance decreases when the evaluation domain is different from the training domain, an issue usually addressed with domain adaptation approaches. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Netorks (CycleGAN) has received a lot of attention. CycleGAN… ▽ More

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: 8 pages, accepted to ASRU 2019

  14. arXiv:1910.11905  [pdf, ps, other

    eess.AS cs.SD

    Feature Enhancement with Deep Feature Losses for Speaker Verification

    Authors: Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak

    Abstract: Speaker Verification still suffers from the challenge of generalization to novel adverse environments. We leverage on the recent advancements made by deep learning based speech enhancement and propose a feature-domain supervised denoising based solution. We propose to use Deep Feature Loss which optimizes the enhancement network in the hidden activation space of a pre-trained auxiliary speaker emb… ▽ More

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages, accepted in ICASSP 2020