Showing 1–13 of 13 results for author: De Mori, R

  1. arXiv:2407.00463

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  2. arXiv:2106.04624

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  3. End2End Acoustic to Semantic Transduction

    Authors: Valentin Pelloin, Nathalie Camelin, Antoine Laurent, Renato De Mori, Antoine Caubrière, Yannick Estève, Sylvain Meignier

    Abstract: In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism. It reliably selects contextual acoustic features in order to hypothesize semantic contents. An initial architecture capable of extracting all pronounced words and concepts from acoustic spans is designed and tested. With a shallow fusion language model, this system re…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted at IEEE ICASSP 2021

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  4. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    Authors: Natalia Tomashenko, Christian Raymond, Antoine Caubriere, Renato De Mori, Yannick Esteve

    Abstract: This work investigates the embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. We proposed to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialog history is represented in…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted for ICASSP 2020 (Submitted: October 21, 2019)

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  5. arXiv:1906.08043

    eess.AS cs.CL cs.SD

    Real to H-space Encoder for Speech Recognition

    Authors: Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori

    Abstract: Deep neural networks (DNNs) and more precisely recurrent neural networks (RNNs) are at the core of modern automatic speech recognition systems, due to their efficiency to process input sequences. Recently, it has been shown that different input representations, based on multidimensional algebras, such as complex and quaternion numbers, are able to bring to neural networks a more natural, compressi…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: Accepted at INTERSPEECH 2019

  6. arXiv:1812.09321

    cs.CL

    Multiple topic identification in telephone conversations

    Authors: Xavier Bost, Marc El Bèze, Renato De Mori

    Abstract: This paper deals with the automatic analysis of conversations between a customer and an agent in a call centre of a customer care service. The purpose of the analysis is to hypothesize themes about problems and complaints discussed in the conversation. Themes are defined by the application documentation topics. A conversation may contain mentions that are irrelevant for the application purpose and…

    Submitted 29 December, 2018; v1 submitted 21 December, 2018; originally announced December 2018.

    Comments: arXiv admin note: text overlap with arXiv:1812.07207

    Journal ref: Interspeech, Aug 2013, Lyon, France

  7. Multiple topic identification in human/human conversations

    Authors: X. Bost, G. Senay, M. El-Bèze, R. De Mori

    Abstract: The paper deals with the automatic analysis of real-life telephone conversations between an agent and a customer of a customer care service (ccs). The application domain is the public transportation system in Paris and the purpose is to collect statistics about customer problems in order to monitor the service and decide priorities on the intervention for improving user satisfaction. Of primary im…

    Submitted 29 December, 2018; v1 submitted 18 December, 2018; originally announced December 2018.

    Journal ref: Computer Speech & Language, 2015, 34 (1), pp.18-42

  8. arXiv:1811.09678

    eess.AS cs.SD stat.ML

    Speech recognition with quaternion neural networks

    Authors: Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato De Mori

    Abstract: Neural network architectures are at the core of powerful automatic speech recognition systems (ASR). However, while recent researches focus on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies alongside with the first, and second order derivatives to characterize ti…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: NIPS 2018 (IRASL). arXiv admin note: text overlap with arXiv:1806.04418

  9. arXiv:1811.02566

    eess.AS cs.LG cs.SD eess.SP stat.ML

    Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition

    Authors: Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori

    Abstract: Recurrent neural networks (RNN) are at the core of modern automatic speech recognition (ASR) systems. In particular, long-short term memory (LSTM) recurrent neural networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long and short term dependencies in sequences of inter-dependent features. Nonetheless, internal dependencies wit…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: Submitted at ICASSP 2019. arXiv admin note: text overlap with arXiv:1806.04418

  10. arXiv:1806.07789

    cs.SD cs.LG eess.AS stat.ML

    Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

    Authors: Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, Yoshua Bengio

    Abstract: Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN), made it easier to train speech recognition systems in an end-to-end fashion. However in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: Accepted at INTERSPEECH 2018

  11. arXiv:1806.04418

    stat.ML cs.LG

    Quaternion Recurrent Neural Networks

    Authors: Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio

    Abstract: Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or images recognition, involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. W…

    Submitted 7 January, 2019; v1 submitted 12 June, 2018; originally announced June 2018.

    Comments: ICLR Update - Full rework

  12. arXiv:1705.09515

    cs.CL cs.AI cs.NE

    ASR error management for improving spoken language understanding

    Authors: Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori

    Abstract: This paper addresses the problem of automatic speech recognition (ASR) error detection and their use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists in automatically extracting, from ASR transcriptions, semantic concepts and concept/values pairs in, e.g., a touristic information system. An approach is proposed for enriching the set of semantic labels w…

    Submitted 26 May, 2017; originally announced May 2017.

    Comments: Interspeech 2017, Aug 2017, Stockholm, Sweden. 2017

  13. arXiv:1702.03402

    cs.LG cs.CL

    Parallel Long Short-Term Memory for Multi-stream Classification

    Authors: Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

    Abstract: Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow these DNN-based models to manage long-term dependencies such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM). Nevertheless, these RNNs pr…

    Submitted 11 February, 2017; originally announced February 2017.

    Comments: 2016 IEEE Workshop on Spoken Language Technology