Showing 1–6 of 6 results for author: Gomez-Alanis, A

  1. arXiv:2401.07360  [pdf, other]

    cs.CL cs.SD eess.AS

    Promptformer: Prompted Conformer Transducer for ASR

    Authors: Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif Rädel, Volker Leutnant

    Abstract: Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rW…

    Submitted 14 January, 2024; originally announced January 2024.

  2. arXiv:2210.16238  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Contextual-Utterance Training for Automatic Speech Recognition

    Authors: Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler

    Abstract: Recent studies of streaming automatic speech recognition (ASR) recurrent neural network transducer (RNN-T)-based systems have fed the encoder with past contextual information in order to improve its word error rate (WER) performance. In this paper, we first propose a contextual-utterance training technique which makes use of the previous and future contextual utterances in order to do an implicit…

    Submitted 27 October, 2022; originally announced October 2022.

  3. arXiv:2201.01226  [pdf, other]

    eess.AS

    Adversarial Transformation of Spoofing Attacks for Voice Biometrics

    Authors: Alejandro Gomez-Alanis, Jose A. Gonzalez-Lopez, Antonio M. Peinado

    Abstract: Voice biometric systems based on automatic speaker verification (ASV) are exposed to spoofing attacks which may compromise their security. To increase the robustness against such attacks, anti-spoofing or presentation attack detection (PAD) systems have been proposed for the detection of replay, synthesis and voice conversion based attacks. Recently, the scientific community has shown tha…

    Submitted 4 January, 2022; originally announced January 2022.

  4. arXiv:2106.04423  [pdf, other]

    cs.SD eess.AS

    PANACEA cough sound-based diagnosis of COVID-19 for the DiCOVA 2021 Challenge

    Authors: Madhu R. Kamble, Jose A. Gonzalez-Lopez, Teresa Grau, Juan M. Espin, Lorenzo Cascioli, Yiqing Huang, Alejandro Gomez-Alanis, Jose Patino, Roberto Font, Antonio M. Peinado, Angel M. Gomez, Nicholas Evans, Maria A. Zuluaga, Massimiliano Todisco

    Abstract: The COVID-19 pandemic has led to the saturation of public health services worldwide. In this scenario, the early diagnosis of SARS-CoV-2 infections can help to stop or slow the spread of the virus and to manage the demand upon health services. This is especially important when resources are also being stretched by heightened demand linked to other seasonal diseases, such as the flu. In this contex…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted in INTERSPEECH 2021

  5. arXiv:2012.15184  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis

    Authors: Jose A. Gonzalez-Lopez, Miriam Gonzalez-Atienza, Alejandro Gomez-Alanis, Jose L. Perez-Cordoba, Phil D. Green

    Abstract: Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory-and-speech rec…

    Submitted 30 December, 2020; originally announced December 2020.

  6. arXiv:2009.02110  [pdf, other]

    eess.AS cs.HC cs.LG

    Silent Speech Interfaces for Speech Restoration: A Review

    Authors: Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M. Martín-Doñas, José L. Pérez-Córdoba, Angel M. Gomez

    Abstract: This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communicati…

    Submitted 27 September, 2020; v1 submitted 4 September, 2020; originally announced September 2020.