-
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition
Authors:
Patrick Eickhoff,
Matthias Möller,
Theresa Pekarek Rosin,
Johannes Twiefel,
Stefan Wermter
Abstract:
In recent research in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
Submitted 5 September, 2023;
originally announced September 2023.
-
Electron-beam Calibration of Aerogel Tiles for the HELIX RICH Detector
Authors:
P. Allison,
M. Baiocchi,
J. J. Beatty,
L. Beaufore,
D. H. Calderone,
Y. Chen,
S. Coutu,
E. Ellingwood,
N. Green,
D. Hanna,
H. B. Jeon,
R. Mbarek,
K. McBride,
I. Mognet,
J. Musser,
S. Nutter,
S. O'Brien,
N. Park,
T. Rosin,
M. Tabata,
G. Tarlé,
G. Visser,
S. P. Wakely,
M. Yu
Abstract:
The HELIX cosmic-ray detector is a balloon-borne instrument designed to measure the flux of light isotopes in the energy range from 0.2 GeV/n to beyond 3 GeV/n. It will rely on a ring-imaging Cherenkov (RICH) detector for particle identification at energies greater than 1 GeV/n and will use aerogel tiles with refractive index near 1.15 as the radiator. To achieve the performance goals of the experiment, it is necessary to know the refractive index and its position dependence over the lateral extent of the tiles to a precision of $O(10^{-4})$. In this paper we describe the apparatus and methods developed to calibrate the HELIX tiles in an electron beam in order to meet this requirement.
Submitted 18 July, 2023;
originally announced July 2023.
-
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Authors:
Theresa Pekarek Rosin,
Stefan Wermter
Abstract:
While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still limited to a subset of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models not only to low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, using our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word Error Rates (WERs) below 5% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.
Submitted 18 October, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Studies of VERITAS Photomultipliers After Eight Years of Use
Authors:
David Hanna,
Stephan Obrien,
Thomas Rosin
Abstract:
The VERITAS gamma-ray telescope array has been operating since 2007 and has been equipped with Hamamatsu R10560-100-20 PMTs since 2012. A decision to continue operations into the mid-2020s was taken in 2019, so the question of whether the PMTs would need replacing became important and a study was initiated.
We present results from scanning two groups of 20 Hamamatsu R10560-100-20 PMTs with an LED flasher. One group comprised five PMTs from each of the four VERITAS telescopes, and the other was made up of 20 PMTs of the same type and date of manufacture that had never been used. We measured three test variables related to gains and high-voltage response and found no significant differences between the two groups. This indicates that there has been little ageing in the PMTs that have been used on the telescopes and that replacement is unnecessary.
Submitted 20 December, 2021;
originally announced December 2021.