Showing 1–9 of 9 results for author: Plantinga, P

Searching in archive cs.
  1. arXiv:2407.00463  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, et al. (5 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presen…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to JMLR (Machine Learning Open Source Software)

  2. arXiv:2305.09681  [pdf, other]

    eess.AS cs.SD

    Continual Learning for End-to-End ASR by Averaging Domain Experts

    Authors: Peter Plantinga, Jaekwon Yoo, Chandra Dhir

    Abstract: Continual learning for end-to-end automatic speech recognition has to contend with a number of difficulties. Fine-tuning strategies tend to lose performance on data already seen, a process known as catastrophic forgetting. On the other hand, strategies that freeze parameters and append tunable parameters must maintain multiple models. We suggest a strategy that maintains only a single model for in…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Submitted to INTERSPEECH 2023

  3. arXiv:2112.06068  [pdf, other]

    cs.SD eess.AS

    Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR

    Authors: Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier

    Abstract: Single-channel speech enhancement approaches do not always improve automatic recognition rates in the presence of noise, because they can introduce distortions unhelpful for recognition. Following a trend towards end-to-end training of sequential neural network models, several research groups have addressed this problem with joint training of front-end enhancement module with back-end recognition…

    Submitted 11 December, 2021; originally announced December 2021.

  4. arXiv:2106.04624  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  5. arXiv:2104.03538  [pdf]

    cs.SD cs.AI eess.AS

    MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

    Authors: Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

    Abstract: The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics which consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discr…

    Submitted 4 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  6. arXiv:2003.01769  [pdf, other]

    eess.AS cs.CL cs.SD

    Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data

    Authors: Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier

    Abstract: While deep learning systems have gained significant ground in speech enhancement research, these systems have yet to make use of the full potential of deep learning systems to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to a…

    Submitted 3 March, 2020; originally announced March 2020.

    Comments: 4 pages + 1 page for references, accepted to ICASSP 2020

  7. arXiv:2003.01765  [pdf, other]

    eess.AS cs.CL cs.SD

    Towards Real-time Mispronunciation Detection in Kids' Speech

    Authors: Peter Plantinga, Eric Fosler-Lussier

    Abstract: Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for the ability to be run in real-time, an important factor in applications that provide rapid feedback. In particular, the state-of-the-art uses bi-directional recurrent networks, where a uni-directional network ma…

    Submitted 3 March, 2020; originally announced March 2020.

    Comments: 6 pages + 1 page for references, accepted at ASRU 2019

  8. arXiv:1809.09756  [pdf, other]

    cs.SD eess.AS

    An Exploration of Mimic Architectures for Residual Network Based Spectral Mapping

    Authors: Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier

    Abstract: Spectral mapping uses a deep neural network (DNN) to map directly from noisy speech to clean speech. Our previous study found that the performance of spectral mapping improves greatly when using helpful cues from an acoustic model trained on clean speech. The mapper network learns to mimic the input favored by the spectral classifier and cleans the features accordingly. In this study, we explore t…

    Submitted 25 September, 2018; originally announced September 2018.

    Comments: Published in the IEEE 2018 Workshop on Spoken Language Technology (SLT 2018)

  9. arXiv:1803.09816  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Spectral feature mapping with mimic loss for robust speech recognition

    Authors: Deblin Bagchi, Peter Plantinga, Adam Stiff, Eric Fosler-Lussier

    Abstract: For the task of speech enhancement, local learning objectives are agnostic to phonetic structures helpful for speech recognition. We propose to add a global criterion to ensure de-noised speech is useful for downstream tasks like ASR. We first train a spectral classifier on clean speech to predict senone labels. Then, the spectral classifier is joined with our speech enhancer as a noisy speech rec…

    Submitted 26 March, 2018; originally announced March 2018.