Showing 1–11 of 11 results for author: Alexanderson, S

  1. arXiv:2404.19622  [pdf, other]

    cs.HC cs.CV cs.GR cs.SD eess.AS

    Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

    Authors: Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

    Abstract: Although humans engaged in face-to-face conversation simultaneously communicate both verbally and non-verbally, methods for joint and unified synthesis of speech audio and co-speech 3D gesture motion from text are a new and emerging field. These technologies hold great promise for more human-like, efficient, expressive, and robust synthetic communication, but are currently held back by the lack of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 13+1 pages, 2 figures, accepted at the Human Motion Generation workshop (HuMoGen) at CVPR 2024

    MSC Class: 68T07 (Primary); 68T42 (Secondary) ACM Class: I.2.7; I.2.6; H.5

  2. arXiv:2310.05181  [pdf, other]

    eess.AS cs.GR cs.HC cs.LG cs.SD

    Unified speech and gesture synthesis using flow matching

    Authors: Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures. This paper presents a novel, unified architecture for jointly synthesising speech acoustics and skeleton-based 3D gesture motion from text, trained using optima…

    Submitted 9 January, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 5 pages, 1 figure. Final version, accepted to IEEE ICASSP 2024

    MSC Class: 68T07 (Primary); 68T42 (Secondary) ACM Class: I.2.7; I.2.6; H.5

  3. arXiv:2309.05455  [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

    Authors: Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow

    Abstract: This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these mod…

    Submitted 11 September, 2023; originally announced September 2023.

    MSC Class: 68T42 ACM Class: I.2.6; I.2.7

  4. arXiv:2306.09417  [pdf, other]

    eess.AS cs.AI cs.CV cs.HC cs.LG

    Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

    Authors: Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter

    Abstract: With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures). Only recently has research begun to explore the benefits of jointly synthesising these two modalities in a single system. The previous stat…

    Submitted 9 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 7 pages, 2 figures, presented at the ISCA Speech Synthesis Workshop (SSW) 2023

    MSC Class: 68T07 (Primary); 68T42 (Secondary) ACM Class: I.2.7; I.2.6; G.3; H.5.5

  5. arXiv:2211.09707  [pdf, other]

    cs.LG cs.CV cs.GR cs.HC cs.SD eess.AS

    Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models

    Authors: Simon Alexanderson, Rajmund Nagy, Jonas Beskow, Gustav Eje Henter

    Abstract: Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the Diff…

    Submitted 16 May, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 20 pages, 9 figures. Published in ACM ToG and presented at SIGGRAPH 2023

    MSC Class: 68T07 ACM Class: G.3; I.2.6; I.3.7; J.5

    Journal ref: ACM Trans. Graph. 42, 4 (August 2023), 20 pages

  6. arXiv:2108.11436  [pdf, other]

    cs.HC cs.GR cs.LG cs.SD eess.AS

    Integrated Speech and Gesture Synthesis

    Authors: Siyang Wang, Simon Alexanderson, Joakim Gustafson, Jonas Beskow, Gustav Eje Henter, Éva Székely

    Abstract: Text-to-speech and co-speech gesture synthesis have until now been treated as separate areas by two different research communities, and applications merely stack the two technologies using a simple system-level pipeline. This can lead to modeling inefficiencies and may introduce inconsistencies that limit the achievable naturalness. We propose to instead synthesize the two modalities in a single m…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 9 pages, accepted at ICMI 2021

  7. arXiv:2106.13871  [pdf, other]

    cs.SD cs.GR cs.LG eess.AS

    Transflower: probabilistic autoregressive dance generation with multimodal attention

    Authors: Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson

    Abstract: Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic au…

    Submitted 11 June, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Article presented at SIGGRAPH Asia 2021, and published in ACM Transactions on Graphics

  8. arXiv:2101.05684  [pdf, other]

    cs.LG cs.GR cs.SD eess.AS

    Generating coherent spontaneous speech and gesture from text

    Authors: Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

    Abstract: Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscrip…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 3 pages, 2 figures, published at the ACM International Conference on Intelligent Virtual Agents (IVA) 2020

    MSC Class: 68T07 ACM Class: I.2.6; J.4; I.3.7; I.2.9

    Journal ref: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (IVA '20), 2020, 3 pages

  9. arXiv:2006.06599  [pdf, other]

    cs.LG stat.ML

    Robust model training and generalisation with Studentising flows

    Authors: Simon Alexanderson, Gustav Eje Henter

    Abstract: Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood. We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics. Specifically, we propose to endow flow-based models with fat-tailed laten…

    Submitted 11 July, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 9 pages, 8 figures, accepted for publication at INNF+ 2020 (Second ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models)

    MSC Class: 68T07 (Primary); 62F35 (Secondary) ACM Class: I.2.6; G.3

  10. arXiv:2001.09326  [pdf, other]

    cs.HC cs.LG eess.AS

    Gesticulator: A framework for semantically-aware speech-driven gesture generation

    Authors: Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, Hedvig Kjellström

    Abstract: During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoust…

    Submitted 14 January, 2021; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: ICMI 2020 Best Paper Award. Code is available. 9 pages, 6 figures

    ACM Class: I.2.7; I.2.6; I.3.7

    Journal ref: Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI '20)

  11. arXiv:1905.06598  [pdf, other]

    cs.LG cs.GR eess.IV stat.ML

    MoGlow: Probabilistic and controllable motion synthesis using normalising flows

    Authors: Gustav Eje Henter, Simon Alexanderson, Jonas Beskow

    Abstract: Data-driven modelling and synthesis of motion is an active research area with applications that include animation, games, and social robotics. This paper introduces a new class of probabilistic, generative, and controllable motion-data models based on normalising flows. Models of this kind can describe highly complex distributions, yet can be trained efficiently using exact maximum likelihood, unl…

    Submitted 7 December, 2020; v1 submitted 16 May, 2019; originally announced May 2019.

    Comments: 14 pages, 5 figures, published in ACM Transactions on Graphics and presented at SIGGRAPH Asia 2020

    ACM Class: I.3.7; G.3; I.2.6

    Journal ref: ACM Trans. Graph. 39, 4, Article 236 (November 2020), 14 pages