Showing 1–2 of 2 results for author: Iwamoto, N

  1. arXiv:2401.00374

    cs.CV

    EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

    Authors: Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

    Abstract: We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements,…

    Submitted 30 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Fix typos; Conflict of Interest Disclosure; CVPR Camera Ready; Project Page: https://pantomatrix.github.io/EMAGE/

  2. arXiv:2203.05297

    cs.CV cs.CL cs.GR cs.LG cs.MM

    BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis

    Authors: Haiyang Liu, Zihao Zhu, Naoya Iwamoto, Yichen Peng, Zhengqing Li, You Zhou, Elif Bozkurt, Bo Zheng

    Abstract: Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned on multi-modal data is still an unsolved problem due to the lack of available datasets, models and standard evaluation metrics. To address this, we build Body-Expression-Audio-Text dataset, BEAT, which has i) 76 hours, high-quality, multi-modal data captured from 30 speakers talking with eight different emot…

    Submitted 20 September, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: 28 pages, 15 figures, Accepted by ECCV 2022