Showing 1–14 of 14 results for author: Bredin, H

Searching in archive eess.

  1. arXiv:2403.02288

    eess.AS

    PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

    Authors: Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin

    Abstract: A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization. Mixture invariant training (MixIT) was proposed as an unsupervised alternative that uses real recordings, yet struggles with overseparation and adapting to long-form audio. We introduce PixIT, a joint approach that combines permutation invariant training (…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: submitted to Speaker Odyssey 2024

  2. arXiv:2310.13025

    cs.SD cs.AI cs.CL cs.NE eess.AS

    Powerset multi-class cross entropy loss for neural speaker diarization

    Authors: Alexis Plaquet, Hervé Bredin

    Abstract: Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training. Despite EEND showing great promise, a few recent works took a step back and studied the possible combination of (local) supervised EEND diarization with (global) unsupervised clust…

    Submitted 19 October, 2023; originally announced October 2023.

    Journal ref: INTERSPEECH 2023, Aug 2023, Dublin, Ireland. pp.3222-3226

  3. arXiv:2306.01506

    cs.CL eess.AS stat.ML

    BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models

    Authors: Marvin Lavechin, Yaya Sy, Hadrien Titeux, María Andrea Cruz Blandón, Okko Räsänen, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia

    Abstract: Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels. In order to fully realize the potential of these approaches and further our understanding of how infants learn language, simulations must closely emulate real-life situations by training on developmentally plausible corpora and b…

    Submitted 8 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Proceedings of Interspeech 2023

  4. arXiv:2210.13248

    eess.AS cs.SD

    Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

    Authors: Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristia, Emmanuel Dupoux, Hervé Bredin

    Abstract: Most automatic speech processing systems register degraded performance when applied to noisy or reverberant speech. But how can one tell whether speech is noisy or reverberant? We propose Brouhaha, a neural network jointly trained to extract speech/non-speech segments, speech-to-noise ratios, and C50 room acoustics from single-channel recordings. Brouhaha is trained using a data-driven approach in…

    Submitted 25 May, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

  5. arXiv:2109.06483

    eess.AS cs.SD

    Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

    Authors: Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

    Abstract: We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms. Every single step of the proposed pipeline is designed to take full advantage of the strong ability of a recently proposed end-to-end overlap-aware segmentation to detect and separate overlapping speakers. In particular, we propose a mod…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: To appear in ASRU 2021. Code available at https://github.com/juanmc2005/StreamingSpeakerDiarization/

  6. arXiv:2104.04045

    eess.AS cs.SD

    End-to-end speaker segmentation for overlap-aware resegmentation

    Authors: Hervé Bredin, Antoine Laurent

    Abstract: Speaker segmentation consists in partitioning a conversation between one or more speakers into speaker turns. Usually addressed as the late combination of three sub-tasks (voice activity detection, speaker change detection, and overlapped speech detection), we propose to train an end-to-end segmentation model that does it directly. Inspired by the original end-to-end neural speaker diarization app…

    Submitted 10 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Camera-ready version for Interspeech 2021 with significantly better voice activity detection, overlapped speech detection, and speaker diarization results. The code used for results reported in v1 contained a small bug that has now been fixed

  7. arXiv:2005.12656

    eess.AS

    An open-source voice type classifier for child-centered daylong recordings

    Authors: Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia

    Abstract: Spontaneous conversations in real-world settings such as those found in child-centered recordings have been shown to be amongst the most challenging audio files to process. Nevertheless, building speech processing models handling such a wide variety of conditions would be particularly useful for language acquisition studies in which researchers are interested in the quantity and quality of the spe…

    Submitted 22 January, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

    Comments: accepted to Interspeech 2020

    ACM Class: I.2.7

  8. arXiv:2003.14021

    cs.LG cs.SD eess.AS stat.ML

    A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

    Authors: Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

    Abstract: Despite the growing popularity of metric learning approaches, very little work has attempted to perform a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross entropy loss (usually used for supervised classi…

    Submitted 31 March, 2020; originally announced March 2020.

  9. arXiv:1912.00938

    eess.AS cs.SD

    Speaker detection in the wild: Lessons learned from JSALT 2019

    Authors: Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

    Abstract: This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions that go from meetings to wild speech. We describe the research threads we explored and a set of modules that was successful for these scenarios. The ultimate goal was to explore speaker dete…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Submitted to ICASSP 2020

  10. arXiv:1911.02388

    eess.AS cs.LG cs.SD

    The Speed Submission to DIHARD II: Contributions & Lessons Learned

    Authors: Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras

    Abstract: This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team. Besides describing the system, which considerably outperformed the challenge baselines, we also focus on the lessons learned from numerous approaches that we tried for single and multi-channel systems. We present several components of our diarization syst…

    Submitted 6 November, 2019; originally announced November 2019.

  11. arXiv:1911.01255

    eess.AS cs.SD

    pyannote.audio: neural building blocks for speaker diarization

    Authors: Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill

    Abstract: We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection,…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: Submitted to ICASSP 2020

  12. arXiv:1910.11646

    eess.AS cs.SD

    Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection

    Authors: Latané Bullock, Hervé Bredin, Leibny Paola Garcia-Perera

    Abstract: We address the problem of effectively handling overlapping speech in a diarization system. First, we detail a neural Long Short-Term Memory-based architecture for overlap detection. Secondly, detected overlap regions are exploited in conjunction with a frame-level speaker posterior matrix to make two-speaker assignments for overlapped frames in the resegmentation step. The overlap detection module…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  13. arXiv:1910.10655

    eess.AS

    End-to-end Domain-Adversarial Voice Activity Detection

    Authors: Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera

    Abstract: Voice activity detection is the task of detecting speech regions in a given audio stream or recording. First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. Experiments on the challenging DIHARD dataset show that the proposed end-to-end model reaches state-of-the-art performance and outperforms a variant wh…

    Submitted 26 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: submitted to Interspeech 2020

    ACM Class: I.2.7

  14. arXiv:1907.10393

    eess.AS cs.LG cs.SD stat.ML

    LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

    Authors: Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras

    Abstract: More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this pa…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: Accepted for INTERSPEECH 2019
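
Item 2 contrasts the usual frame-wise multi-label formulation of EEND with a powerset multi-class cross entropy loss. The sketch below is a minimal illustration of the powerset encoding itself, assuming a fixed number of local speakers per chunk and a cap on how many of them may speak at the same time. The function names (`powerset_classes`, `multilabel_to_powerset`, `powerset_to_multilabel`, `cross_entropy`) are hypothetical, the code is not taken from the paper or from pyannote.audio, and it leaves out any handling of speaker permutation.

```python
import math
from itertools import combinations


def powerset_classes(num_speakers: int, max_overlap: int) -> list[frozenset[int]]:
    """List every set of active speakers with at most `max_overlap` members.

    Index 0 is the empty set (no speech); single speakers follow, then pairs, etc.
    """
    classes: list[frozenset[int]] = []
    for size in range(max_overlap + 1):
        for subset in combinations(range(num_speakers), size):
            classes.append(frozenset(subset))
    return classes


def multilabel_to_powerset(frame: list[int], classes: list[frozenset[int]]) -> int:
    """Map one frame of 0/1 speaker activities to its powerset class index.

    Raises ValueError if more speakers overlap than `max_overlap` allows.
    """
    active = frozenset(i for i, value in enumerate(frame) if value)
    return classes.index(active)


def powerset_to_multilabel(index: int, classes: list[frozenset[int]], num_speakers: int) -> list[int]:
    """Decode a powerset class index back into per-speaker 0/1 activities."""
    active = classes[index]
    return [1 if i in active else 0 for i in range(num_speakers)]


def cross_entropy(logits: list[float], target: int) -> float:
    """Standard multi-class cross entropy for one frame, via a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


# Example: 3 local speakers, at most 2 speaking at once -> 1 + 3 + 3 = 7 classes.
classes = powerset_classes(num_speakers=3, max_overlap=2)
target = multilabel_to_powerset([1, 0, 1], classes)  # speakers 0 and 2 overlap
print(len(classes), target)  # 7 5
print(powerset_to_multilabel(target, classes, 3))  # [1, 0, 1]
print(cross_entropy([0.0] * len(classes), target))  # uniform logits give log(7)
```

With this encoding each frame belongs to exactly one class, so a softmax output trained with ordinary cross entropy can stand in for per-speaker sigmoid outputs; decoding is then an argmax over classes followed by `powerset_to_multilabel`.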