
Showing 1–6 of 6 results for author: Mingote, V

Searching in archive eess.
  1. arXiv:2309.07478  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Direct Text to Speech Translation System using Acoustic Units

    Authors: Victoria Mingote, Pablo Gimeno, Luis Vicente, Sameer Khurana, Antoine Laurent, Jarod Duret

    Abstract: This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

  2. arXiv:2306.00789  [pdf, other]

    cs.CL cs.AI eess.AS eess.SP

    Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

    Authors: Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass

    Abstract: Research in multilingual speech-to-text translation is topical. Having a single model that supports multiple translation tasks is desirable. The goal of this work is to improve cross-lingual transfer learning in multilingual speech-to-text translation via semantic knowledge distillation. We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAM…

    Submitted 25 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  3. arXiv:2111.03842  [pdf, other]

    eess.AS cs.LG cs.SD

    Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use of a learnable vector called Class token to replace the average global pooling mechanism to extract the embeddings. Unlike global average pooling, our proposal t…

    Submitted 10 February, 2023; v1 submitted 6 November, 2021; originally announced November 2021.

  4. arXiv:2110.14425  [pdf, other]

    cs.SD cs.LG eess.AS

    Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

    Authors: Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

    Abstract: Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio and speech related tasks. However, due to its intrinsic nature, AUC optimisation has focused only on binary tasks so far. In this paper, we introduce an extension to the AUC optimisation framework so that it can be easily applied to an arbitrary numb…

    Submitted 27 October, 2021; originally announced October 2021.

    Journal ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021

  5. arXiv:1901.11332  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Optimization of the Area Under the ROC Curve using Neural Network Supervectors for Text-Dependent Speaker Verification

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. Firstly, we propose a general alignment mechanism to keep the temporal structure of each phrase and obtain a supervector with the speaker and phrase information, since both are relevant for a text-dependent verification. As we show, it is possible to use diffe…

    Submitted 30 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  6. arXiv:1812.09484  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

    Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

    Abstract: In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not extract the embedding of an utterance from the mean reduction of the temporal dimension. Our system replaces the mean by a phrase alignm…

    Submitted 22 December, 2018; originally announced December 2018.

    Comments: 5 pages, IberSPEECH 2018