Showing 1–10 of 10 results for author: Lleida, E

  1. arXiv:2406.16145  [pdf, other]

    cs.LG cs.AI

    Predefined Prototypes for Intra-Class Separation and Disentanglement

    Authors: Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon, Luis Vicente, Antonio Miguel, Eduardo Lleida

    Abstract: Prototypical Learning is based on the idea that there is a point (which we call prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data or to design explainable models. Typically, prototypes are either defined as the average of the embeddings of a class or are designed to be trainable. In this work, we propose to predefi…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2305.02147  [pdf, other]

    eess.AS cs.HC

    Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

    Authors: Iván López-Espejo, Santi Prieto, Alfonso Ortega, Eduardo Lleida

    Abstract: Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, in this paper, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. This method models the joint distribution of the vocal effort transfer…

    Submitted 4 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

  3. arXiv:2111.03842  [pdf, other]

    eess.AS cs.LG cs.SD

    Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use of a learnable vector called Class token to replace the average global pooling mechanism to extract the embeddings. Unlike global average pooling, our proposal t…

    Submitted 10 February, 2023; v1 submitted 6 November, 2021; originally announced November 2021.

  4. arXiv:2110.14425  [pdf, other]

    cs.SD cs.LG eess.AS

    Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

    Authors: Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

    Abstract: Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio and speech related tasks. However, due to its intrinsic nature, AUC optimisation has focused only on binary tasks so far. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary numb…

    Submitted 27 October, 2021; originally announced October 2021.

    Journal ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021

  5. arXiv:2008.02487  [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

    Authors: Santi Prieto, Alfonso Ortega, Iván López-Espejo, Eduardo Lleida

    Abstract: The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal…

    Submitted 6 August, 2020; originally announced August 2020.

  6. arXiv:1904.05167  [pdf, other]

    eess.AS cs.SD

    Speech Enhancement with Wide Residual Networks in Reverberant Environments

    Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper proposes a speech enhancement method which exploits the high potential of residual connections in a Wide Residual Network architecture. This is supported on single dimensional convolutions computed alongside the time domain, which is a powerful approach to process contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: 5 pages, 4 figures. arXiv admin note: text overlap with arXiv:1901.00660, arXiv:1904.04511

  7. arXiv:1904.04511  [pdf, other]

    eess.AS

    Progressive Speech Enhancement with Residual Connections

    Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper studies the Speech Enhancement based on Deep Neural Networks. The proposed architecture gradually follows the signal transformation during enhancement by means of a visualization probe at each network block. Alongside the process, the enhancement performance is visually inspected and evaluated in terms of regression cost. This progressive scheme is based on Residual Networks. During the…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: 5 pages, 5 figures

  8. arXiv:1901.11332  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Optimization of the Area Under the ROC Curve using Neural Network Supervectors for Text-Dependent Speaker Verification

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. Firstly, we propose a general alignment mechanism to keep the temporal structure of each phrase and obtain a supervector with the speaker and phrase information, since both are relevant for a text-dependent verification. As we show, it is possible to use diffe…

    Submitted 30 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  9. arXiv:1812.11946  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition

    Authors: Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida

    Abstract: In this paper we propose a method to model speaker and session variability and able to generate likelihood ratios using neural networks in an end-to-end phrase dependent speaker verification system. As in Joint Factor Analysis, the model uses tied hidden variables to model speaker and session variability and a MAP adaptation of some of the parameters of the model. In the training procedure our met…

    Submitted 27 December, 2018; originally announced December 2018.

    Journal ref: Proc. Interspeech 2017, 2819-2823

  10. arXiv:1812.09484  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not extract the embedding of an utterance from the mean reduction of the temporal dimension. Our system replaces the mean by a phrase alignm…

    Submitted 22 December, 2018; originally announced December 2018.

    Comments: 5 pages, IberSPEECH 2018