Showing 1–12 of 12 results for author: Rouvier, M

  1. arXiv:2406.05876

    cs.CL cs.LG cs.SD eess.AS

    Zero-Shot End-To-End Spoken Question Answering In Medical Domain

    Authors: Yanis Labrak, Adel Moumen, Richard Dufour, Mickael Rouvier

    Abstract: In the rapidly evolving landscape of spoken question-answering (SQA), the integration of large language models (LLMs) has emerged as a transformative development. Conventional approaches often entail the use of separate models for question audio transcription and answer selection, resulting in significant resource utilization and error accumulation. To tackle these challenges, we explore the effec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

    Journal ref: InterSpeech 2024

  2. arXiv:2403.19634

    cs.SD cs.CL eess.AS

    Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2

    Authors: Pierre-Michel Bousquet, Mickael Rouvier

    Abstract: The SdSv challenge Task 2 provided an opportunity to assess efficiency and robustness of modern text-independent speaker verification systems. But it also made it possible to test new approaches, capable of taking into account the main issues of this challenge (duration, language, ...). This paper describes the contributions of our laboratory to the speaker recognition field. These contributions h…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: LIA system description for the Short Duration Speaker Verification (SdSv) challenge 2020 Task 2

  3. arXiv:2402.19443

    cs.SD cs.AI eess.AS

    Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems

    Authors: Quentin Raymondaud, Mickael Rouvier, Richard Dufour

    Abstract: Deep learning architectures have made significant progress in terms of performance in many research areas. The automatic speech recognition (ASR) field has thus benefited from these scientific and technological advances, particularly for acoustic modeling, now integrating deep neural network architectures. However, these performance gains have translated into increased complexity regarding the inf…

    Submitted 29 February, 2024; originally announced February 2024.

  4. arXiv:2312.16885

    cs.SD eess.AS

    Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition

    Authors: Pierre-Michel Bousquet, Mickael Rouvier

    Abstract: A new loss function for speaker recognition with deep neural networks is proposed, based on Jeffreys Divergence. Adding this divergence to the cross-entropy loss function makes it possible to maximize the target value of the output distribution while smoothing the non-target values. This objective function provides highly discriminative features. Beyond this effect, we propose a theoretical justification of i…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted in ICASSP 2023

  5. arXiv:2309.06141

    cs.SD eess.AS

    SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier

    Abstract: The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely-used VoxCeleb2 dataset for speaker recogniti…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: conference

  6. arXiv:2309.05472

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  7. arXiv:2211.01091

    eess.AS cs.AI cs.SD

    I4U System Description for NIST SRE'20 CTS Challenge

    Authors: Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, et al. (1 additional author not shown)

    Abstract: This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eight research teams - I$^2$R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (C…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: SRE 2021, NIST Speaker Recognition Evaluation Workshop, CTS Speaker Recognition Challenge, 14-12 December 2021

  8. arXiv:2210.05291

    cs.CL cs.SD eess.AS

    On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding

    Authors: Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève

    Abstract: In this paper we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently-introduced SAMU-XLSR model, which is designed to generate a single embedding that captures the semantics at the utterance level, semantically aligned across different languages. This model combines the acoustic frame-level speech representation…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted in IEEE SLT 2022. This work was performed using HPC resources from GENCI/IDRIS (grant 2022 AD011012565) and received funding from the EU H2020 research and innovation programme under the Marie Sklodowska-Curie ESPERANTO project (grant agreement No 101007666), through the SELMA project (grant No 957017) and from the French ANR through the AISSPER project (ANR-19-CE23-0004)

  9. arXiv:2109.05977

    eess.AS cs.SD

    Studying squeeze-and-excitation used in CNN for speaker verification

    Authors: Mickael Rouvier, Pierre-Michel Bousquet

    Abstract: In speaker verification, the extraction of voice representations is mainly based on the Residual Neural Network (ResNet) architecture. ResNet is built upon convolution layers which learn filters to capture local spatial patterns along all the input, then generate feature maps that jointly encode the spatial and channel information. Unfortunately, all feature maps in a convolution layer are learnt…

    Submitted 13 September, 2021; originally announced September 2021.

  10. arXiv:2105.04310

    eess.AS cs.SD

    Study on the temporal pooling used in deep neural networks for speaker verification

    Authors: Mickael Rouvier, Pierre-Michel Bousquet, Jarod Duret

    Abstract: The x-vector architecture has recently achieved state-of-the-art results on the speaker verification task. This architecture incorporates a central layer, referred to as temporal pooling, which stacks statistical parameters of the acoustic frame distribution. This work proposes to highlight the significant effect of the temporal pooling content on the training dynamics and task performance. An eva…

    Submitted 10 May, 2021; originally announced May 2021.

  11. arXiv:1910.13689

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  12. arXiv:1904.07386

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages