Showing 1–13 of 13 results for author: Thienpondt, J

Searching in archive eess.

  1. arXiv:2405.09142  [pdf, other]

    eess.AS cs.SD

    Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization

    Authors: Jenthe Thienpondt, Kris Demuynck

    Abstract: Current speaker diarization systems rely on an external voice activity detection model prior to speaker embedding extraction on the detected speech segments. In this paper, we establish that the attention system of a speaker embedding extractor acts as a weakly supervised internal VAD model and performs equally or better than comparable supervised VAD systems. Subsequently, speaker diarization can…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop

  2. arXiv:2401.08342  [pdf, other]

    eess.AS

    ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings

    Authors: Jenthe Thienpondt, Kris Demuynck

    Abstract: In this paper, we present ECAPA2, a novel hybrid neural network architecture and training strategy to produce robust speaker embeddings. Most speaker verification models are based on either the 1D- or 2D-convolutional operation, often manifested as Time Delay Neural Networks or ResNets, respectively. Hybrid models are relatively unexplored without an intuitive explanation what constitutes best pra…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: proceedings of ASRU 2023

  3. arXiv:2307.04744  [pdf, other]

    eess.AS

    Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer

    Authors: Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck

    Abstract: In this paper, we analyze the behavior of speaker embeddings of patients during oral cancer treatment. First, we found that pre- and post-treatment speaker embeddings differ significantly, notifying a substantial change in voice characteristics. However, a partial recovery to pre-operative voice traits is observed after 12 months post-operation. Secondly, the same-speaker similarity at distinct tr…

    Submitted 12 August, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: proceedings of INTERSPEECH 2023

  4. arXiv:2304.03515  [pdf, other]

    eess.AS cs.SD

    Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio

    Authors: Jenthe Thienpondt, Nilesh Madhu, Kris Demuynck

    Abstract: This paper is concerned with the task of speaker verification on audio with multiple overlapping speakers. Most speaker verification systems are designed with the assumption of a single speaker being present in a given audio segment. However, in a real-world setting this assumption does not always hold. In this paper, we demonstrate that current speaker verification systems are not robust against…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: proceedings of ICASSP 2023

  5. arXiv:2206.09396  [pdf, other]

    eess.AS cs.CL cs.SD

    Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

    Authors: Jenthe Thienpondt, Kris Demuynck

    Abstract: Automatic Speech Recognition (ASR) systems are known to exhibit difficulties when transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora to train robust ASR models and the resulting domain mismatch when decoding children's speech with systems trained on adult data. In this paper, we propose multiple enhancements to alleviate these issues. Fi…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: proceedings of INTERSPEECH 2022

  6. Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: This paper contains a post-challenge performance analysis on cross-lingual speaker verification of the IDLab submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We show that current speaker embedding extractors consistently underestimate speaker similarity in within-speaker cross-lingual trials. Consequently, the typical training and scoring protocols do not put enough empha…

    Submitted 19 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: proceedings of ICASSP 2022

  7. arXiv:2109.04070  [pdf, other]

    eess.AS cs.SD

    The IDLAB VoxCeleb Speaker Recognition Challenge 2021 System Description

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: This technical report describes the IDLab submission for track 1 and 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). This speaker verification competition focuses on short duration test recordings and cross-lingual trials. Currently, both Time Delay Neural Networks (TDNNs) and ResNets achieve state-of-the-art results in speaker verification. We opt to use a system fusion of hybri…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2104.02370

  8. Integrating Frequency Translational Invariance in TDNNs and Frequency Positional Information in 2D ResNets to Enhance Speaker Verification

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: This paper describes the IDLab submission for the text-independent task of the Short-duration Speaker Verification Challenge 2021 (SdSVC-21). This speaker verification competition focuses on short duration test recordings and cross-lingual trials, along with the constraint of limited availability of in-domain DeepMine Farsi training data. Currently, both Time Delay Neural Networks (TDNNs) and ResN…

    Submitted 9 September, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: proceedings of INTERSPEECH 2021

  9. ECAPA-TDNN Embeddings for Speaker Diarization

    Authors: Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na

    Abstract: Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental component of modern diarization systems. Recently, some improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, f…

    Submitted 3 April, 2021; originally announced April 2021.

  10. arXiv:2010.12468  [pdf, other]

    eess.AS cs.SD

    The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: In this technical report we describe the IDLAB top-scoring submissions for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) in the supervised and unsupervised speaker verification tracks. For the supervised verification tracks we trained 6 state-of-the-art ECAPA-TDNN systems and 4 Resnet34 based systems with architectural variations. On all models we apply a large margin fine-tuning str…

    Submitted 23 October, 2020; originally announced October 2020.

  11. The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN based speaker verification systems trained with margin-based loss functions. It enables the network to create more robust speaker embeddings by enabling the use of longer training utte…

    Submitted 6 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: proceedings of ICASSP 2021

  12. Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

    Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

    Abstract: In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge exists in the large degree of varying phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced…

    Submitted 10 August, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: proceedings of INTERSPEECH 2020

  13. ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

    Authors: Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck

    Abstract: Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the re…

    Submitted 10 August, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: proceedings of INTERSPEECH 2020