
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

Authors: Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter

Abstract: In recent research in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture that extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Submitted and accepted for ICANN 2023 (32nd International Conference on Artificial Neural Networks)

arXiv:2307.09689 [pdf, other]

doi 10.1016/j.nima.2023.168549

Electron-beam Calibration of Aerogel Tiles for the HELIX RICH Detector

Authors: P. Allison, M. Baiocchi, J. J. Beatty, L. Beaufore, D. H. Calderone, Y. Chen, S. Coutu, E. Ellingwood, N. Green, D. Hanna, H. B. Jeon, R. Mbarek, K. McBride, I. Mognet, J. Musser, S. Nutter, S. O'Brien, N. Park, T. Rosin, M. Tabata, G. Tarlé, G. Visser, S. P. Wakely, M. Yu

Abstract: The HELIX cosmic-ray detector is a balloon-borne instrument designed to measure the flux of light isotopes in the energy range from 0.2 GeV/n to beyond 3 GeV/n. It will rely on a ring-imaging Cherenkov (RICH) detector for particle identification at energies greater than 1 GeV/n and will use aerogel tiles with refractive index near 1.15 as the radiator. To achieve the performance goals of the experiment it is necessary to know the refractive index and its position dependence over the lateral extent of the tiles to a precision of O(10$^{-4}$). In this paper we describe the apparatus and methods developed to calibrate the HELIX tiles in an electron beam, in order to meet this requirement.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 27 pages and 16 figures. Accepted for publication in Nuclear Instruments and Methods A

arXiv:2307.07280 [pdf, other]

doi 10.1007/978-3-031-44195-0_40

Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

Authors: Theresa Pekarek Rosin, Stefan Wermter

Abstract: While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.

Submitted 18 October, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: 13 pages, 7 figures, accepted and presented at ICANN 2023

Journal ref: Artificial Neural Networks and Machine Learning - ICANN 2023, Lecture Notes in Computer Science, vol 14260, 489-500

arXiv:2112.10707 [pdf, other]

doi 10.1016/j.nima.2021.166235

Studies of VERITAS Photomultipliers After Eight Years of Use

Authors: David Hanna, Stephan Obrien, Thomas Rosin

Abstract: The VERITAS gamma-ray telescope array has been operating since 2007 and has been equipped with Hamamatsu R10560-100-20 PMTs since 2012. A decision to continue operations into the mid-2020s was taken in 2019, so the question of whether the PMTs would need replacing became important and a study was initiated. We present results from scanning two groups of 20 Hamamatsu R10560-100-20 PMTs with an LED flasher. One group comprised five PMTs from each of the four VERITAS telescopes and the other was made up of 20 PMTs of the same type and date of manufacture that had never been used. We measured three test variables related to gains and high-voltage response and found that there were no significant differences between the two groups. This indicates that there has been little ageing in the PMTs that have been used on the telescopes and that replacement is unnecessary.

Submitted 20 December, 2021; originally announced December 2021.

Showing 1–4 of 4 results for author: Rosin, T