Skip to main content

Showing 1–9 of 9 results for author: Mdhaffar, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00463  [pdf, other

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar , et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more.It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  2. arXiv:2405.19342  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

    Authors: Chloé Sekkat, Fanny Leroy, Salima Mdhaffar, Blake Perry Smith, Yannick Estève, Joseph Dureau, Alice Coucke

    Abstract: Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tags. This paper introduces the Sonos Voice Control Bias Assessment Dataset, an open dataset composed of voice assistant requests for North American Engl… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  3. arXiv:2309.05472  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-… ▽ More

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Science and Language. Preprint allowed

  4. arXiv:2302.10790  [pdf, other

    eess.AS cs.LG cs.SD

    Federated Learning for ASR based on Wav2vec 2.0

    Authors: Tuan Nguyen, Salima Mdhaffar, Natalia Tomashenko, Jean-François Bonastre, Yannick Estève

    Abstract: This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self supervision. Carried out on the well-known TED-LIUM 3 dataset, our experiments show that such a model can obtain, with no use of a language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, without sharing any data from the different users. We al… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 5 pages, accepted in ICASSP 2023

  5. arXiv:2204.00803  [pdf, other

    cs.CL cs.SD eess.AS

    End-to-end model for named entity recognition from speech without paired training data

    Authors: Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève

    Abstract: Recent works showed that end-to-end neural approaches tend to become very popular for spoken language understanding (SLU). Through the term end-to-end, one considers the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an app… ▽ More

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  6. arXiv:2111.04194  [pdf, other

    cs.CL cs.SD eess.AS

    Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

    Authors: Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

    Abstract: The widespread of powerful personal devices capable of collecting voice of their users has opened the opportunity to build speaker adapted speech recognition system (ASR) or to participate to collaborative learning of ASR. In both cases, personalized acoustic models (AM), i.e. fine-tuned AM with specific speaker data, can be built. A question that naturally arises is whether the dissemination of p… ▽ More

    Submitted 7 November, 2021; originally announced November 2021.

  7. arXiv:2111.03777  [pdf, other

    cs.CL cs.CR cs.SD eess.AS

    Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

    Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

    Abstract: This paper investigates methods to effectively retrieve speaker information from the personalized speaker adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models where a global model is learnt on the server based on the updates received from multiple clients. We propose an a… ▽ More

    Submitted 14 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976

  8. arXiv:2106.13045  [pdf, ps, other

    cs.CL cs.SD eess.AS

    Where are we in semantic concept extraction for Spoken Language Understanding?

    Authors: Sahar Ghannay, Antoine Caubrière, Salima Mdhaffar, Gaëlle Laperrière, Bassam Jabaian, Yannick Estève

    Abstract: Spoken language understanding (SLU) topic has seen a lot of progress these last three years, with the emergence of end-to-end neural approaches. Spoken language understanding refers to natural language processing tasks related to semantic extraction from speech signal, like named entity recognition from speech or slot filling task in a context of human-machine dialogue. Classically, SLU tasks were… ▽ More

    Submitted 11 October, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Accepted in the SPECOM 2021 conference

  9. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful to improve performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee… ▽ More

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021