
Showing 1–6 of 6 results for author: Hernando, J

Searching in archive eess.
  1. arXiv:2406.10598 [pdf, other]

    eess.AS cs.SD

    Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge

    Authors: Federico Costa, Miquel India, Javier Hernando

    Abstract: As computer-based applications are becoming more integrated into our daily lives, the importance of Speech Emotion Recognition (SER) has increased significantly. Promoting research with innovative approaches in SER, the Odyssey 2024 Speech Emotion Recognition Challenge was organized as part of the Odyssey 2024 Speaker and Language Recognition Workshop. In this paper we describe the Double Multi-He…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Odyssey 2024: The Speaker and Language Recognition Workshop

  2. Speaker Characterization by means of Attention Pooling

    Authors: Federico Costa, Miquel India, Javier Hernando

    Abstract: State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer to encode variable-length utterances into fixed-length speaker vectors. The authors have recently proposed the use of a Double Multi-Head Self-Attention pooling for speaker recogni…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: IberSpeech 2022

    Journal ref: Proc. IberSPEECH 2022, 166-170

  3. arXiv:2301.01703 [pdf, other]

    cs.IT eess.SP

    Technology Trends for Massive MIMO towards 6G

    Authors: Yiming Huo, Xingqin Lin, Boya Di, Hongliang Zhang, Francisco Javier Lorca Hernando, Ahmet Serdar Tan, Shahid Mumtaz, Özlem Tuğfe Demir, Kun Chen-Hu

    Abstract: At the dawn of the next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. With the continued success of being applied in the 5G and beyond, the massive MIMO technology has demonstrated its advantageousness, integrability, and extendibility. Moreover, several evolutionary features and revolutionizing tren…

    Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  4. arXiv:2010.10937 [pdf, other]

    eess.AS cs.SD

    The UPC Speaker Verification System Submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

    Authors: Umair Khan, Javier Hernando

    Abstract: This report describes the submission from Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. System-1 is an autoencoder based approach which tries to reconstruct similar i-vectors, whereas System-2 and -3 are Convolutional Neural Network (CNN) based siamese architectures. The…

    Submitted 27 October, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: VoxSRC-20 Workshop (Interspeech 2020 Conference)

  5. arXiv:2008.01077 [pdf, other]

    eess.AS cs.LG cs.SD

    Self-attention encoding and pooling for speaker recognition

    Authors: Pooyan Safari, Miquel India, Javier Hernando

    Abstract: The computing power of mobile devices limits the end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers for the design of more efficient deep models. On the other hand, self-attention networks based on Transformer architecture have attracted remarkable interest due to their high parallelization capabilities and strong perf…

    Submitted 3 August, 2020; originally announced August 2020.

  6. arXiv:2007.13199 [pdf, other]

    eess.AS cs.SD

    Double Multi-Head Attention for Speaker Verification

    Authors: Miquel India, Pooyan Safari, Javier Hernando

    Abstract: Most state-of-the-art Deep Learning systems for speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer to encode variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention pooling, which extends our previous approach based on Self…

    Submitted 9 January, 2021; v1 submitted 26 July, 2020; originally announced July 2020.
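Entries 2 and 6 describe a pooling layer that turns a variable-length sequence of frame-level features into a fixed-length speaker vector, using Double Multi-Head Attention pooling. The Python sketch below is a minimal illustration of that general two-stage idea under stated assumptions, not the authors' implementation: it assumes each frame vector is split into K equal head slices, each head attends over time with its own query vector, and a second attention with another query vector combines the K head context vectors by a weighted average. The 1/sqrt(d) score scaling, the function names (`attention_pool`, `double_mha_pool`) and the fixed query values are hypothetical; in a real system the queries would be learned jointly with the front-end, and the published formulation may differ.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors: List[Vector], weights: Vector) -> Vector:
    """Return sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Single-query attention pooling (assumed scoring: v . q / sqrt(d))."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(v, query) / scale for v in vectors])
    return weighted_sum(vectors, weights)


def double_mha_pool(frames: List[Vector], head_queries: List[Vector],
                    head_query2: Vector) -> Vector:
    """Hypothetical two-stage attention pooling: heads over time, then over heads."""
    num_heads = len(head_queries)
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("frame dimension must be divisible by the number of heads")
    d_head = dim // num_heads
    # Stage 1: each head attends over time on its own slice of every frame.
    head_contexts = []
    for k, query in enumerate(head_queries):
        head_slices = [f[k * d_head:(k + 1) * d_head] for f in frames]
        head_contexts.append(attention_pool(head_slices, query))
    # Stage 2: a second attention weights the K head context vectors.
    return attention_pool(head_contexts, head_query2)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical per-head queries (K = 2)
    query2 = [0.5, 0.5]                 # hypothetical head-level query
    short_utt = [[0.1 * t, 1.0, -0.5, 0.2 * t] for t in range(3)]
    long_utt = [[0.1 * t, 1.0, -0.5, 0.2 * t] for t in range(50)]
    print(len(double_mha_pool(short_utt, queries, query2)))  # 2
    print(len(double_mha_pool(long_utt, queries, query2)))   # 2
```

Both the 3-frame and the 50-frame inputs yield a vector of the same length (the head dimension), which is the property the abstracts attribute to the pooling layer: utterance length no longer determines the size of the speaker vector.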