
Showing 1–10 of 10 results for author: Cord-Landwehr, T

Searching in archive eess.
  1. arXiv:2406.03155 [pdf, other]

    eess.AS

    Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

    Authors: Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

    Abstract: Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these systems have problems reliably assigning the correct speaker labels, leading to a significant amount of speaker confusion errors. We propose to add segment-level s…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  2. arXiv:2401.03963 [pdf, other]

    eess.AS

    Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

    Authors: Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

    Abstract: We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech. To this end, a geodesic distance loss is used that enforces the embeddings computed from regions with two active speakers to lie on the shortest path on a sphere between the points given by the d-ve…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  3. arXiv:2309.16482 [pdf, ps, other]

    eess.AS cs.SD

    Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

    Authors: Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach

    Abstract: We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet separation architecture, followed by a speaker-agnostic speech recognizer, we achieve state-of-the-art recognition performance in terms of Optimal Reference Combination…

    Submitted 6 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted at HSCMA Satellite Workshop at ICASSP 2024

  4. A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

    Authors: Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

    Abstract: We introduce a monaural neural speaker embedding extractor that computes an embedding for each speaker present in a speech mixture. To allow for supervised training, a teacher-student approach is employed: the teacher computes the target embeddings from each speaker's utterance before the utterances are added to form the mixture, and the student embedding extractor is then tasked to reproduce tho…

    Submitted 19 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  5. arXiv:2306.00625 [pdf, other]

    eess.AS

    Frame-wise and overlap-robust speaker embeddings for meeting diarization

    Authors: Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

    Abstract: Using a Teacher-Student training approach, we developed a speaker embedding extraction system that outputs embeddings at frame rate. Given this high temporal resolution and the fact that the student produces sensible speaker embeddings even for segments with speech overlap, the frame-wise embeddings serve as an appropriate representation of the input speech signal for an end-to-end neural meeting d…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ICASSP 2023

  6. arXiv:2209.11494 [pdf, other]

    eess.AS

    MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

    Authors: Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach

    Abstract: The scope of speech enhancement has changed from a monolithic view of single, independent tasks, to a joint processing of complex conversational speech recordings. Training and evaluation of these single tasks requires synthetic data with access to intermediate signals that is as close as possible to the evaluation scenario. As such data often is not available, many works instead use specialized d…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted at IWAENC 2022

  7. arXiv:2205.00944 [pdf, other]

    eess.AS cs.SD

    A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network

    Authors: Tobias Gburrek, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

    Abstract: We propose a system that transcribes the conversation of a typical meeting scenario that is captured by a set of initially unsynchronized microphone arrays at unknown positions. It consists of subsystems for signal synchronization, including both sampling rate and sampling time offset estimation, diarization based on speaker and microphone array position estimation, multi-channel speech enhancemen…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  8. arXiv:2204.01338 [pdf, other]

    cs.SD eess.AS

    An Initialization Scheme for Meeting Separation with Spatial Mixture Models

    Authors: Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach

    Abstract: Spatial mixture model (SMM) supported acoustic beamforming has been extensively used for the separation of simultaneously active speakers. However, it has hardly been considered for the separation of meeting data, which are characterized by long recordings and only partially overlapping speech. In this contribution, we show that the fact that often only a single speaker is active can be utilized fo…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  9. arXiv:2111.07578 [pdf, other]

    eess.AS cs.SD

    Monaural source separation: From anechoic to reverberant environments

    Authors: Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

    Abstract: Impressive progress in neural network-based single-channel speech source separation has been made in recent years. But those improvements have been mostly reported on anechoic data, a situation that is hardly met in practice. Taking the SepFormer as a starting point, which achieves state-of-the-art performance on anechoic mixtures, we gradually modify it to optimize its performance on reverberant…

    Submitted 10 May, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: Submitted to IWAENC 2022

  10. arXiv:2005.12963 [pdf, ps, other]

    eess.AS cs.SD

    Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations

    Authors: Janek Ebbers, Michael Kuhlmann, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

    Abstract: In this work, we address disentanglement of style and content in speech signals. We propose a fully convolutional variational autoencoder employing two encoders: a content encoder and a style encoder. To foster disentanglement, we propose adversarial contrastive predictive coding. This new disentanglement method requires neither parallel data nor any supervision. We show that the proposed techniqu…

    Submitted 11 March, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

    Comments: Accepted at ICASSP 2021