Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Authors:
Ihab Asaad,
Maxime Jacquelin,
Olivier Perrotin,
Laurent Girin,
Thomas Hueber
Abstract:
Most speech self-supervised learning (SSL) models are trained with a pretext task that consists of predicting missing parts of the input signal, either future segments (causal prediction) or segments masked anywhere within the input (non-causal prediction). Learned speech representations can then be efficiently transferred to downstream tasks (e.g., automatic speech or speaker recognition). In the present study, we investigate the use of a speech SSL model for speech inpainting, that is, reconstructing a missing portion of a speech signal from its surrounding context, a downstream task that is very similar to the pretext task. To that end, we combine an SSL encoder, namely HuBERT, with a neural vocoder, namely HiFiGAN, which plays the role of a decoder. In particular, we propose two solutions to match the HuBERT output with the HiFiGAN input: freezing the encoder and fine-tuning the vocoder, or vice versa. The performance of both approaches was assessed in single- and multi-speaker settings, for both informed and blind inpainting configurations (i.e., the position of the mask is known or unknown, respectively), with different objective metrics and a perceptual evaluation. Results show that while both solutions correctly reconstruct signal portions of up to 200 ms (and even 400 ms in some cases), fine-tuning the SSL encoder provides a more accurate signal reconstruction in the single-speaker setting, whereas freezing it (and training the neural vocoder instead) is a better strategy when dealing with multi-speaker data.
Submitted 30 May, 2024;
originally announced May 2024.
Design of neural nonlinear PFC Controller to control speed of Autonomous Car
Authors:
Isam Asaad,
Bilal Chiha
Abstract:
In this research, we design a neural nonlinear predictive functional controller (PFC) to reduce the fuel consumption of an autonomous car that follows a supplied speed trajectory on known roads. We used a fitting neural network as a simple tool for modelling the car's engine and the control laws needed to calculate suitable control commands passed to the brake and gas pedal actuators. The independent model method and constraint handling are used to provide controller robustness. We used MATLAB Simulink and IPG CarMaker to design and test our PFC controller. The performance of the designed PFC controller is compared to that of a PI controller available within the IPG CarMaker simulator. Keywords: Predictive Functional Controller, Fuel Consumption, Neural Network, Independent Model, Constraint Handling, PI Controller.
Submitted 23 September, 2019;
originally announced September 2019.