Showing 1–14 of 14 results for author: Silnova, A

  1. arXiv:2406.12622

    eess.AS

    Challenging margin-based speaker embedding extractors by using the variational information bottleneck

    Authors: Themos Stafylakis, Anna Silnova, Johan Rohdin, Oldrich Plchot, Lukas Burget

    Abstract: Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2402.19325

    cs.SD eess.AS

    Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

    Authors: Lin Zhang, Themos Stafylakis, Federico Landini, Mireia Diez, Anna Silnova, Lukáš Burget

    Abstract: In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristi…

    Submitted 20 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to Odyssey 2024. This arXiv version includes an appendix for more visualizations. Code: https://github.com/BUTSpeechFIT/EENDEDA_VIB

  3. arXiv:2310.02732

    eess.AS cs.SD

    Discriminative Training of VBx Diarization

    Authors: Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara

    Abstract: Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework fo…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  4. arXiv:2305.13580

    eess.AS cs.SD

    Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

    Authors: Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget, Shoko Araki

    Abstract: Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddi…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  5. arXiv:2210.15441

    cs.SD eess.AS stat.ML

    Toroidal Probabilistic Spherical Discriminant Analysis

    Authors: Anna Silnova, Niko Brümmer, Albert Swart, Lukáš Burget

    Abstract: In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within and between-speaker…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  6. arXiv:2203.15436

    eess.AS

    Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries

    Authors: Themos Stafylakis, Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Anna Silnova, Lukáš Burget, Jan "Honza" Černocký

    Abstract: In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we use the full VoxCeleb recordings and the names of the celebrities appearing in each video, without knowledge of the time intervals in which the celebrities appear. We show that by combining a baseline speaker diarization algorithm that requires no training or parame…

    Submitted 9 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted at Interspeech 2022

  7. arXiv:2203.10300

    eess.AS

    Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch

    Authors: Anna Silnova, Themos Stafylakis, Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Pavel Matejka, Lukas Burget, Ondrej Glembek, Niko Brummer

    Abstract: In this paper, we analyze the behavior and performance of speaker embeddings and the back-end scoring model under domain and language mismatch. We present our findings regarding ResNet-based speaker embedding architectures and show that reduced temporal stride yields improved performance. We then consider a PLDA back-end and show how a combination of small speaker subspace, language-dependent PLDA…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: Submitted to Odyssey 2022, under review

  8. arXiv:2010.11718

    eess.AS cs.SD

    Analysis of the BUT Diarization System for VoxConverse Challenge

    Authors: Federico Landini, Ondřej Glembek, Pavel Matějka, Johan Rohdin, Lukáš Burget, Mireia Diez, Anna Silnova

    Abstract: This paper describes the system developed by the BUT team for the fourth track of the VoxCeleb Speaker Recognition Challenge, focusing on diarization on the VoxConverse dataset. The system consists of signal pre-processing, voice activity detection, speaker embedding extraction, an initial agglomerative hierarchical clustering followed by diarization using a Bayesian hidden Markov model, a reclust…

    Submitted 9 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted to ICASSP 2021

  9. arXiv:2004.04096

    eess.AS cs.LG cs.SD stat.ML

    Probabilistic embeddings for speaker diarization

    Authors: Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget

    Abstract: Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization. We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix, thus providing a path for the propagation of information about the quality of the speech segment into a PLDA sco…

    Submitted 6 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Awarded the Jack Godfrey Best Student Paper Award at Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo

  10. arXiv:2002.11356

    eess.AS

    BUT System for the Second DIHARD Speech Diarization Challenge

    Authors: Federico Landini, Shuai Wang, Mireia Diez, Lukáš Burget, Pavel Matějka, Kateřina Žmolíková, Ladislav Mošner, Anna Silnova, Oldřich Plchot, Ondřej Novotný, Hossein Zeinali, Johan Rohdin

    Abstract: This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge. For tracks 1 and 2 the systems were mainly based on performing agglomerative hierarchical clustering (AHC) of x-vectors, followed by another x-vector clustering based on Bayes hidden Markov model and variational Bayes inference. We provide a comparison of the im…

    Submitted 26 February, 2020; originally announced February 2020.

  11. arXiv:1910.12592

    eess.AS cs.CL cs.SD

    BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

    Authors: Hossein Zeinali, Shuai Wang, Anna Silnova, Pavel Matějka, Oldřich Plchot

    Abstract: In this report, we describe the submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide a brief analysis of different systems on VoxCeleb-1 test sets. Submitted systems for both Fixed and Open conditions are a fusion of 4 Convolutional Neural Network (CNN) topologies. The first and second networks have ResNet34 topology an…

    Submitted 16 October, 2019; originally announced October 2019.

  12. arXiv:1907.06112

    eess.AS cs.CL cs.SD

    BUT VOiCES 2019 System Description

    Authors: Hossein Zeinali, Pavel Matějka, Ladislav Mošner, Oldřich Plchot, Anna Silnova, Ondřej Novotný, Ján Profant, Ondřej Glembek, Lukáš Burget

    Abstract: This is a description of our effort in the VOiCES 2019 Speaker Recognition challenge. All systems in the fixed condition are based on the x-vector paradigm with different features and DNN topologies. The single best system reaches 1.2% EER and a fusion of 3 systems yields 1.0% EER, which is a 15% relative improvement. The open condition allowed us to use external data which we did for the PLDA adaptatio…

    Submitted 13 July, 2019; originally announced July 2019.

  13. arXiv:1811.02331

    eess.AS cs.SD

    Speaker verification using end-to-end adversarial language adaptation

    Authors: Johan Rohdin, Themos Stafylakis, Anna Silnova, Hossein Zeinali, Lukas Burget, Oldrich Plchot

    Abstract: In this paper we investigate the use of adversarial domain adaptation for addressing the problem of language mismatch between speaker recognition corpora. In the context of speaker verification, adversarial domain adaptation methods aim at minimizing certain divergences between the distribution that the utterance-level features follow (i.e. speaker embeddings) when drawn from source and target dom…

    Submitted 6 November, 2018; originally announced November 2018.

  14. arXiv:1710.02369

    eess.AS cs.SD

    End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA

    Authors: Johan Rohdin, Anna Silnova, Mireia Diez, Oldrich Plchot, Pavel Matejka, Lukas Burget

    Abstract: Recently several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work…

    Submitted 8 January, 2018; v1 submitted 6 October, 2017; originally announced October 2017.