Showing 1–5 of 5 results for author: Broughton, S J

  1. arXiv:2401.12600

    cs.SD eess.AS

    EEND-M2F: Masked-attention mask transformers for speaker diarization

    Authors: Marc Härkönen, Samuel J. Broughton, Lahiru Samarakoon

    Abstract: In this paper, we make the explicit connection between image segmentation methods and end-to-end diarization methods. From these insights, we propose a novel, fully end-to-end diarization model, EEND-M2F, based on the Mask2Former architecture. Speaker representations are computed in parallel using a stack of transformer decoders, in which irrelevant frames are explicitly masked from the cross atte…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 14 pages, 2 figures

  2. arXiv:2312.07136

    cs.SD eess.AS

    Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning

    Authors: Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton

    Abstract: Due to the scarcity of publicly available diarization data, the model performance can be improved by training a single model with data from different domains. In this work, we propose to incorporate domain information to train a single end-to-end diarization model for multiple domains. First, we employ domain adaptive training with parameter-efficient adapters for on-the-fly model reconfiguration.…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 7 pages, 2 figures, ASRU 2023

  3. arXiv:2312.06253

    cs.SD eess.AS

    Transformer Attractors for Robust and Efficient End-to-End Neural Diarization

    Authors: Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung

    Abstract: End-to-end neural diarization with encoder-decoder based attractors (EEND-EDA) is a method to perform diarization in a single neural network. EDA handles the diarization of a flexible number of speakers by using an LSTM-based encoder-decoder that generates a set of speaker-wise attractors in an autoregressive manner. In this paper, we propose to replace EDA with a transformer-based attractor calcu…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 8 pages, 1 figure, ASRU 2023

  4. arXiv:2306.13863

    cs.SD eess.AS

    Improving End-to-End Neural Diarization Using Conversational Summary Representations

    Authors: Samuel J. Broughton, Lahiru Samarakoon

    Abstract: Speaker diarization is a task concerned with partitioning an audio recording by speaker identity. End-to-end neural diarization with encoder-decoder based attractor calculation (EEND-EDA) aims to solve this problem by directly outputting diarization results for a flexible number of speakers. Currently, the EDA module responsible for generating speaker-wise attractors is conditioned on zero vectors…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 5 pages, 1 figure, INTERSPEECH 2023

  5. arXiv:2102.11420

    cs.SD eess.AS

    Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

    Authors: Samuel J. Broughton, Md Asif Jalal, Roger K. Moore

    Abstract: Generative Adversarial Networks (GANs) are machine learning networks based around creating synthetic data. Voice Conversion (VC) is a subset of voice translation that involves translating the paralinguistic features of a source speaker to a target speaker while preserving the linguistic information. The aim of non-parallel conditional GANs for VC is to translate an acoustic speech feature sequence…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: For demo, see https://samuelbroughton.github.io/interpretability-demo-2020/