
Showing 1–7 of 7 results for author: García-Perera, P

Searching in archive eess.
  1. arXiv:2401.15676  [pdf, other]

    eess.AS cs.SD

    On Speaker Attribution with SURT

    Authors: Desh Raj, Matthew Wiesner, Matthew Maciejewski, Leibny Paola Garcia-Perera, Daniel Povey, Sanjeev Khudanpur

    Abstract: The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR). With advances in architecture, objectives, and mixture simulation methods, it was demonstrated that SURT can be an efficient streaming method for speaker-agnostic transcription of real meetings. In this work, we push this framework furth…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 8 pages, 6 figures, 6 tables. Submitted to Odyssey 2024

  2. arXiv:2011.01997  [pdf, other]

    eess.AS cs.SD

    DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

    Authors: Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

    Abstract: Several advances have been made recently towards handling overlapping speech for speaker diarization. Since speech and natural language tasks often benefit from ensemble techniques, we propose an algorithm for combining outputs from such diarization systems through majority voting. Our method, DOVER-Lap, is inspired from the recently proposed DOVER algorithm, but is designed to handle overlapping…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to IEEE SLT 2021

  3. arXiv:2007.10248  [pdf, other]

    cs.SD cs.LG eess.AS

    DNN Speaker Tracking with Embeddings

    Authors: Carlos Rodrigo Castillo-Sanchez, Leibny Paola Garcia-Perera, Anabel Martin-Gonzalez

    Abstract: In multi-speaker applications, it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we propose a novel embedding-based speaker tracking method. Specifically, our design is based on a convolutional neural network that mimics a typical speaker verifica…

    Submitted 13 July, 2020; originally announced July 2020.

  4. arXiv:2005.08331  [pdf, ps, other]

    eess.AS cs.SD

    Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Paola García-Perera, Jesús Villalba, Najim Dehak

    Abstract: We investigated an enhancement and a domain adaptation approach to make speaker verification systems robust to perturbations of far-field speech. In the enhancement approach, using paired (parallel) reverberant-clean speech, we trained a supervised Generative Adversarial Network (GAN) along with a feature mapping loss. For the domain adaptation approach, we trained a Cycle Consistent Generative Ad…

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  5. arXiv:1910.11915  [pdf, ps, other]

    eess.AS cs.SD

    Unsupervised Feature Enhancement for speaker verification

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

    Abstract: The task of making speaker verification systems robust to adverse scenarios remains a challenging and active area of research. We developed an unsupervised feature enhancement approach in log-filter bank domain with the end goal of improving speaker verification performance. We experimented with using both real speech recorded in adverse environments and degraded speech obtained by simulation to…

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages; accepted in ICASSP 2020

  6. arXiv:1910.11646  [pdf, other]

    eess.AS cs.SD

    Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection

    Authors: Latané Bullock, Hervé Bredin, Leibny Paola Garcia-Perera

    Abstract: We address the problem of effectively handling overlapping speech in a diarization system. First, we detail a neural Long Short-Term Memory-based architecture for overlap detection. Secondly, detected overlap regions are exploited in conjunction with a frame-level speaker posterior matrix to make two-speaker assignments for overlapped frames in the resegmentation step. The overlap detection module…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  7. arXiv:1910.10655  [pdf, other]

    eess.AS

    End-to-end Domain-Adversarial Voice Activity Detection

    Authors: Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera

    Abstract: Voice activity detection is the task of detecting speech regions in a given audio stream or recording. First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. Experiments on the challenging DIHARD dataset show that the proposed end-to-end model reaches state-of-the-art performance and outperforms a variant wh…

    Submitted 26 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: submitted to Interspeech 2020

    ACM Class: I.2.7