
Showing 1–12 of 12 results for author: Hueber, T

Searching in archive eess.
  1. arXiv:2405.20101  [pdf, other]

    cs.SD cs.CL eess.AS

    Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting

    Authors: Ihab Asaad, Maxime Jacquelin, Olivier Perrotin, Laurent Girin, Thomas Hueber

    Abstract: Most speech self-supervised learning (SSL) models are trained with a pretext task which consists in predicting missing parts of the input signal, either future segments (causal prediction) or segments masked anywhere within the input (non-causal prediction). Learned speech representations can then be efficiently transferred to downstream tasks (e.g., automatic speech or speaker recognition). In th…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2309.09804  [pdf, other]

    eess.SY

    Energy Management of Hydrogen Hybrid Electric Vehicles -- A Potential Study

    Authors: David Theodor Machacek, Nazim Ozan Yazar, Thomas Huber, Christopher Harald Onder

    Abstract: The hydrogen combustion engine (H$_2$ICE) is known to be able to burn H$_2$ under ultra-lean conditions, while producing no CO$_2$ emissions and extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions. Immediate goals, as for instance the upcoming EURO 7 NO$_x$ limitations, can be reached more easily as extremely low engine-out NO$_x^{\mathrm{eo}}$ emissions facilitate the reduction of the overall…

    Submitted 18 September, 2023; originally announced September 2023.

  3. arXiv:2301.01205  [pdf, other]

    eess.SY

    Learning-Based Model Predictive Control for the Energy Management of Hybrid Electric Vehicles Including Driving Mode Decisions

    Authors: David Theodor Machacek, Stijn van Dooren, Thomas Huber, Christopher Onder

    Abstract: This paper presents an online-capable controller for the energy management system of a parallel hybrid electric vehicle based on model predictive control. Its task is to minimize the vehicle's fuel consumption along a predicted driving mission by calculating the distribution of the driver's power request between the electrical and the combustive part of the powertrain, and by choosing the driving…

    Submitted 3 January, 2023; originally announced January 2023.

  4. arXiv:2207.01718  [pdf, other]

    cs.CL eess.AS

    BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

    Authors: Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

    Abstract: Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, in this work, we look specifically at the prediction of contrastive focus on personal pronouns. This is a particularly challenging task as it often requires semantic, discursive and/or pragmatic…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 5 pages

  5. arXiv:2204.04965  [pdf, other]

    cs.CL cs.SD eess.AS

    Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding

    Authors: Sanjana Sankar, Denis Beautemps, Thomas Hueber

    Abstract: This paper proposes a simple and effective approach for automatic recognition of Cued Speech (CS), a visual communication tool that helps people with hearing impairment to understand spoken language with the help of hand gestures that can uniquely identify the uttered phonemes in complement to lipreading. The proposed approach is based on a pre-trained hand and lips tracker used for visual feature…

    Submitted 11 April, 2022; originally announced April 2022.

    Journal ref: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapour, Singapore

  6. arXiv:2204.02269  [pdf, other]

    cs.SD cs.CL eess.AS

    Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

    Authors: Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

    Abstract: We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory c…

    Submitted 5 April, 2022; originally announced April 2022.

  7. arXiv:2106.06500  [pdf, ps, other]

    cs.SD eess.AS

    A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

    Authors: Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

    Abstract: The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, th…

    Submitted 14 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2008.12595

  8. Deep learning-based bias transfer for overcoming laboratory differences of microscopic images

    Authors: Ann-Katrin Thebille, Esther Dietrich, Martin Klaus, Lukas Gernhold, Maximilian Lennartz, Christoph Kuppe, Rafael Kramann, Tobias B. Huber, Guido Sauter, Victor G. Puelles, Marina Zimmermann, Stefan Bonn

    Abstract: The automated analysis of medical images is currently limited by technical and biological noise and bias. The same source tissue can be represented by vastly different images if the image acquisition or processing protocols vary. For an image analysis pipeline, it is crucial to compensate such biases to avoid misinterpretations. Here, we evaluate, compare, and improve existing generative model arc…

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: Accepted as a regular conference paper at MIUA 2021

  9. arXiv:2104.03204  [pdf, other]

    cs.SD cs.CL eess.AS

    Learning robust speech representation with an articulatory-regularized variational autoencoder

    Authors: Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

    Abstract: It is increasingly considered that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation could improve the performances of a deep generative model (here a variational autoencoder) trained to encode and decode acoustic speech features. First we develop an articulatory model able to associate articulatory p…

    Submitted 7 April, 2021; originally announced April 2021.

  10. arXiv:2102.09914  [pdf, other]

    cs.CL eess.AS

    Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

    Authors: Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier

    Abstract: The prosody of a spoken word is determined by its surrounding context. In incremental text-to-speech synthesis, where the synthesizer produces an output before it has access to the complete input, the full context is often unknown which can result in a loss of naturalness in the synthesized speech. In this paper, we investigate whether the use of predicted future text can attenuate this loss. We c…

    Submitted 15 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: 4 pages

  11. arXiv:2009.02035  [pdf, other]

    eess.AS cs.CL

    What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

    Authors: Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

    Abstract: In incremental text to speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: 5 pages, 4 figures

  12. arXiv:1806.04096  [pdf, other]

    eess.AS cs.SD

    Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models

    Authors: Fanny Roche, Thomas Hueber, Samuel Limier, Laurent Girin

    Abstract: This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation which can be used in turn for the synthesis of new sounds. We systematically compare (shallow) autoencoders (AEs), deep autoencoders (DAEs), recurrent autoencoders (with Long Short-Term Memory cells -- LSTM-AEs) and variational autoencoder…

    Submitted 24 May, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

    Comments: SMC 2019