
Showing 1–8 of 8 results for author: Paissan, F

Searching in archive eess.
  1. arXiv:2407.00463 [pdf, other]

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  2. arXiv:2405.17615 [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Listenable Maps for Zero-Shot Audio Classifiers

    Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

    Abstract: Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2403.13086 [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Listenable Maps for Audio Classifiers

    Authors: Francesco Paissan, Mirco Ravanelli, Cem Subakan

    Abstract: Despite the impressive performance of deep learning models across diverse tasks, their complexity poses challenges for interpretation. This challenge is particularly evident for audio signals, where conveying interpretations becomes inherently difficult. To address this issue, we introduce Listenable Maps for Audio Classifiers (L-MAC), a posthoc interpretation method that generates faithful and li…

    Submitted 19 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted to ICML 2024 (Oral)

  4. arXiv:2311.14517 [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models

    Authors: Francesco Paissan, Elisabetta Farella

    Abstract: Contrastive Language-Audio Pretraining (CLAP) became of crucial importance in the field of audio and speech processing. Its employment ranges from sound event detection to text-to-audio generation. However, one of the main limitations is the considerable amount of data required in the training process and the overall computational complexity during inference. This paper investigates how we can red…

    Submitted 12 June, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted to INTERSPEECH 2024

  5. arXiv:2310.12858 [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Editing with Non-Rigid Text Prompts

    Authors: Francesco Paissan, Luca Della Libera, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan

    Abstract: In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-pro…

    Submitted 12 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted to INTERSPEECH 2024

  6. arXiv:2303.12659 [pdf, other]

    cs.AI cs.LG cs.SD eess.AS

    Posthoc Interpretation via Quantization

    Authors: Francesco Paissan, Cem Subakan, Mirco Ravanelli

    Abstract: In this paper, we introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers. Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space. The class-specific codebooks act as a bottleneck that forces the interpreter to focus on the parts of the input…

    Submitted 27 May, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Francesco Paissan and Cem Subakan contributed equally

  7. arXiv:2303.03005 [pdf, other]

    cs.SD cs.LG eess.AS

    Scaling strategies for on-device low-complexity source separation with Conv-Tasnet

    Authors: Mohamed Nabih Ali, Francesco Paissan, Daniele Falavigna, Alessio Brutti

    Abstract: Recently, several very effective neural approaches for single-channel speech separation have been presented in the literature. However, due to the size and complexity of these models, their use on low-resource devices, e.g. for hearing aids, and earphones, is still a challenge and established solutions are not available yet. Although approaches based on either pruning or compressing neural models…

    Submitted 6 March, 2023; originally announced March 2023.

  8. arXiv:2206.03835 [pdf, other]

    eess.AS

    Low-complexity acoustic scene classification in DCASE 2022 Challenge

    Authors: Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, Tuomas Virtanen

    Abstract: This paper presents an analysis of the Low-Complexity Acoustic Scene Classification task in DCASE 2022 Challenge. The task was a continuation from the previous years, but the low-complexity requirements were changed to the following: the maximum number of allowed parameters, including the zero-valued ones, was 128 K, with parameters being represented using INT8 numerical format; and the maximum nu…

    Submitted 13 July, 2022; v1 submitted 8 June, 2022; originally announced June 2022.