
Showing 1–12 of 12 results for author: Obin, N

Searching in archive eess.
  1. arXiv:2406.04467  [pdf, other]

    eess.AS cs.CL cs.SD

    Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis

    Authors: Théodor Lemerle, Nicolas Obin, Axel Roebel

    Abstract: Recent advancements in text-to-speech (TTS) powered by language models have showcased remarkable capabilities in achieving naturalness and zero-shot voice cloning. Notably, the decoder-only transformer is the dominant architecture in this domain. However, transformers face challenges stemming from their quadratic complexity in sequence length, impeding training on lengthy sequences and resource-c…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Interspeech

  2. arXiv:2309.02592  [pdf, other]

    eess.AS cs.SD

    BWSNet: Automatic Perceptual Assessment of Audio Signals

    Authors: Clément Le Moine Veillon, Victor Rosi, Pablo Arias Sarah, Léane Salais, Nicolas Obin

    Abstract: This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst scaling (BWS) experiment. It maps sound samples into an embedded space that represents the perception of a studied attribute. To this end, we propose a set of cost functions and constraints, interpreting trial-wise ordinal relations as distance comparisons in a metric learning task. We…

    Submitted 21 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  3. arXiv:2308.10843  [pdf, other]

    cs.MM cs.CV cs.LG cs.SD eess.AS

    TranSTYLer: Multimodal Behavioral Style Transfer for Facial and Body Gestures Generation

    Authors: Mireille Fares, Catherine Pelachaud, Nicolas Obin

    Abstract: This paper addresses the challenge of transferring the behavior expressivity style of a virtual agent to another one while preserving the shape of behaviors, as they carry communicative meaning. Behavior expressivity style is viewed here as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal transformer-based model that synthesizes the multimodal behaviors of a source speaker with…

    Submitted 8 August, 2023; originally announced August 2023.

  4. arXiv:2305.12887  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding

    Authors: Mireille Fares, Catherine Pelachaud, Nicolas Obin

    Abstract: In this study, we address the importance of modeling behavior style in virtual agents for personalized human-agent interaction. We propose a machine learning approach to synthesize gestures, driven by prosodic features and text, in the style of different speakers, even those unseen during training. Our model incorporates zero-shot multimodal style transfer using multimodal data from the PATS datab…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.01917

  5. arXiv:2208.01917  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding

    Authors: Mireille Fares, Michele Grimaldi, Catherine Pelachaud, Nicolas Obin

    Abstract: Modeling virtual agents with behavior style is one factor in personalizing human-agent interaction. We propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS datab…

    Submitted 3 August, 2022; originally announced August 2022.

  6. arXiv:2110.04527  [pdf, other]

    eess.AS

    Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation

    Authors: Mireille Fares, Catherine Pelachaud, Nicolas Obin

    Abstract: We propose a semantically aware, speech-driven model to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECA). In this work, we aim to produce natural and continuous head motion and upper-facial gestures synchronized with speech. We propose a model that generates these gestures based on multimodal input features: the first modality is text, and the se…

    Submitted 21 May, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  7. arXiv:2110.03744  [pdf, other]

    cs.SD eess.AS

    Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

    Authors: Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel

    Abstract: This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original neural VC architecture is proposed based on sequence-to-sequence voice conversion (S2S-VC) in which the speech prosody of the source speaker is preserved during conv…

    Submitted 31 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:2107.12346

  8. arXiv:2107.12346  [pdf, other]

    cs.SD cs.LG eess.AS

    Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

    Authors: Laurent Benaroya, Nicolas Obin, Axel Roebel

    Abstract: Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while maintaining the rest unchanged. Research in neural VC has accomplished considerable breakthroughs with the capacity to falsify a voice identity using a small amount of data with a highly realistic rendering. This paper goes beyond voice identity and prese…

    Submitted 27 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  9. arXiv:2104.07288  [pdf, other]

    eess.AS cs.LG cs.SD

    Speaker Attentive Speech Emotion Recognition

    Authors: Clément Le Moine, Nicolas Obin, Axel Roebel

    Abstract: The Speech Emotion Recognition (SER) task has seen significant improvements in recent years with the advent of Deep Neural Networks (DNNs). However, even the most successful methods still struggle when adaptation to specific speakers and scenarios is needed, inevitably leading to poorer performance compared to humans. In this paper, we present novel work based on the idea of teach…

    Submitted 15 April, 2021; originally announced April 2021.

  10. arXiv:2104.07283  [pdf, other]

    eess.AS cs.LG cs.SD

    Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

    Authors: Clément Le Moine Veillon, Nicolas Obin, Axel Roebel

    Abstract: This paper presents an end-to-end framework for F0 transformation in the context of expressive voice conversion. A single neural network is proposed, in which a first module learns F0 representations over different temporal scales and a second, adversarial module learns the transformation from one emotion to another. The first module is composed of a convolution layer with wav…

    Submitted 15 April, 2021; originally announced April 2021.

  11. arXiv:2004.04410  [pdf, other]

    eess.AS

    Att-HACK: An Expressive Speech Database with Social Attitudes

    Authors: Clément Le Moine, Nicolas Obin

    Abstract: This paper presents Att-HACK, the first large database of acted speech with social attitudes. Available databases of expressive speech are rare and very often restricted to the primary emotions: anger, joy, sadness, fear. This greatly limits the scope of the research on expressive speech. Besides, a fundamental aspect of speech prosody is always ignored and missing from such databases: its variety…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: 5 pages, 5 figures

  12. arXiv:1910.12614  [pdf, other]

    eess.AS cs.LG cs.SD

    CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights

    Authors: Rafael Ferro, Nicolas Obin, Axel Roebel

    Abstract: This paper tackles GAN optimization and stability issues in the context of voice conversion. First, to simplify the conversion task, we propose to use spectral envelopes as inputs. Second, we propose two adversarial weight training paradigms, the generalized weighted GAN and the generator impact GAN, both of which aim to reduce the impact of the generator on the discriminator, so both can learn more gradu…

    Submitted 11 July, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 1 figure