Skip to main content

Showing 1–19 of 19 results for author: Zaiem, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00756  [pdf, other

    eess.AS cs.SD

    Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation of large encoders, while the latter hurts the robustness acquired during pretraining, especially in low-resource scenarios. This work explores middle-… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 Pages

  2. arXiv:2407.00463  [pdf, other

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar , et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more.It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  3. arXiv:2406.10735  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

    Authors: Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli

    Abstract: Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. Current audio tokenization methods fall into two categories: Semantic tokens, acquired through quantization of Self-Supervised Learning (SSL) models, and N… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures, 2 tables, Accepted at Interspeech 2024

  4. arXiv:2310.16931  [pdf, other

    cs.CL cs.AI

    CL-MASR: A Continual Learning Benchmark for Multilingual ASR

    Authors: Luca Della Libera, Pooneh Mousavi, Salah Zaiem, Cem Subakan, Mirco Ravanelli

    Abstract: Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it possible to transcribe audio in multiple languages with a single model. However, current state-of-the-art ASR models are typically evaluated on individual languages or in a multi-task setting, overlooking the challenge of continually learning new languages. There is insufficient research on how to add new lang… ▽ More

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 16 pages, 5 figures, 5 tables

  5. arXiv:2309.12712  [pdf, other

    eess.AS cs.LG cs.SD

    Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

    Authors: Hugo Malard, Salah Zaiem, Robin Algayres

    Abstract: Recent progress in Automatic Speech Recognition (ASR) has been coupled with a substantial increase in the model sizes, which may now contain billions of parameters, leading to slow inferences even with adapted hardware. In this context, several ASR models exist in various sizes, with different inference costs leading to different performance levels. Based on the observation that smaller models per… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  6. arXiv:2309.11327  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition

    Authors: Ahmed Amine Ben Abdallah, Ata Kabboudi, Amir Kanoun, Salah Zaiem

    Abstract: Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address the aforementioned ASR challenge, focusing on the Tunisian dialect. First, textual and audio data is collected and in some cases annotated. Second, we explore s… ▽ More

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: 6 pages, submitted to ICASSP 2024

  7. arXiv:2309.09546  [pdf, other

    eess.AS cs.CL cs.SD

    Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

    Authors: George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

    Abstract: The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech r… ▽ More

    Submitted 22 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted at the ICASSP Workshop Self-supervision in Audio, Speech and Beyond 2024

  8. arXiv:2308.14456  [pdf, other

    eess.AS cs.LG cs.SD eess.SP

    Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has bee… ▽ More

    Submitted 21 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 18 Pages

  9. arXiv:2306.00481  [pdf, other

    eess.AS cs.LG

    Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets. Despite this, speech SSL representations may fail while facing an acoustic mismatch between the pretraining and target datasets. To address this issue, we propose a novel supervised domain adaptation method, designe… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages,INTERSPEECH 2023

  10. arXiv:2306.00452  [pdf, ps, other

    eess.AS cs.LG

    Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. Howe… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages

    Journal ref: INTERSPEECH 2023

  11. arXiv:2303.06740  [pdf, other

    eess.AS cs.LG

    Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

    Authors: Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance might be sanctioned with longer inferences. This article explores different approaches… ▽ More

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Submitted to ICASSP "Self-supervision in Audio, Speech and Beyond" workshop

  12. arXiv:2206.11332  [pdf, other

    cs.CL

    DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon

    Authors: Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Salah Zaiem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux

    Abstract: Finding word boundaries in continuous speech is challenging as there is little or no equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models for text segmentation use a Dirichlet process to jointly segment sentences and build a lexicon of word types. We introduce DP-Parse, which uses similar principles but only relies on an instance lexicon of word tokens, avoiding… ▽ More

    Submitted 22 June, 2022; originally announced June 2022.

  13. arXiv:2204.04170  [pdf, other

    eess.AS cs.LG cs.SD eess.SP

    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to mo… ▽ More

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  14. arXiv:2107.00594  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    Pretext Tasks selection for multitask self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba

    Abstract: Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features where engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a pseudo-labels) has proven to be a particularl… ▽ More

    Submitted 11 November, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

  15. arXiv:2104.14297  [pdf, other

    cs.SD cs.LG eess.AS

    End-to-End Speech Recognition from Federated Acoustic Models

    Authors: Yan Gao, Titouan Parcollet, Salah Zaiem, Javier Fernandez-Marques, Pedro P. B. de Gusmao, Daniel J. Beutel, Nicholas D. Lane

    Abstract: Training Automatic Speech Recognition (ASR) models under federated learning (FL) settings has attracted a lot of attention recently. However, the FL scenarios often presented in the literature are artificial and fail to capture the complexity of real FL systems. In this paper, we construct a challenging and realistic ASR federated experimental setup consisting of clients with heterogeneous data di… ▽ More

    Submitted 9 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

  16. arXiv:2104.07388  [pdf, other

    eess.AS cs.LG cs.SD eess.SP

    Conditional independence for pretext task selection in Self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. A common pretext task consists in pretraining a SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data where various meaningful signal processing fea… ▽ More

    Submitted 1 July, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: 5 pages, Accepted for presentation at Interspeech2021

  17. arXiv:2007.13542  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Evaluating the reliability of acoustic speech embeddings

    Authors: Robin Algayres, Mohamed Salah Zaiem, Benoit Sagot, Emmanuel Dupoux

    Abstract: Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way. Here, we systematically compare two p… ▽ More

    Submitted 6 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: Conference paper at Interspeech 2020

  18. arXiv:1911.05438  [pdf, other

    cs.LG cs.MA stat.ML

    Learning to Communicate in Multi-Agent Reinforcement Learning : A Review

    Authors: Mohamed Salah Zaïem, Etienne Bennequin

    Abstract: We consider the issue of multiple agents learning to communicate through reinforcement learning within partially observable environments, with a focus on information asymmetry in the second part of our work. We provide a review of the recent algorithms developed to improve the agents' policy by allowing the sharing of information between agents and the learning of communication strategies, with a… ▽ More

    Submitted 13 November, 2019; originally announced November 2019.

  19. arXiv:1812.10119  [pdf, other

    cs.IR cs.CL stat.ML

    Sequence to Sequence Learning for Query Expansion

    Authors: Salah Zaiem, Fatiha Sadat

    Abstract: Using sequence to sequence algorithms for query expansion has not been explored yet in Information Retrieval literature nor in Question-Answering's. We tried to fill this gap in the literature with a custom Query Expansion engine trained and tested on open datasets. Starting from open datasets, we built a Query Expansion training set using sentence-embeddings-based Keyword Extraction. We therefore… ▽ More

    Submitted 25 December, 2018; originally announced December 2018.

    Comments: 8 pages, 2 figures, AAAI-19 Student Abstract and Poster Program