
Showing 1–30 of 30 results for author: Bolkart, T

  1. arXiv:2404.04104

    cs.CV

    3D Facial Expressions through Analysis-by-Neural-Synthesis

    Authors: George Retsinas, Panagiotis P. Filntisis, Radek Danecek, Victoria F. Abrevaya, Anastasios Roussos, Timo Bolkart, Petros Maragos

    Abstract: While existing methods for 3D face reconstruction from in-the-wild images excel at recovering the overall face shape, they commonly miss subtle, extreme, asymmetric, or rarely observed expressions. We improve upon these methods with SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics), which faithfully reconstructs expressive 3D faces from images. We identify two key limitations in…

    Submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2312.04466

    cs.CV

    Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

    Authors: Kiran Chhatre, Radek Daněček, Nikos Athanasiou, Giorgio Becherini, Christopher Peters, Michael J. Black, Timo Bolkart

    Abstract: Existing methods for synthesizing 3D human gestures from speech have shown promising results, but they do not explicitly model the impact of emotions on the generated gestures. Instead, these methods directly output animations from speech without control over the expressed emotion. To address this limitation, we present AMUSE, an emotional speech-driven body animation model based on latent diffusi…

    Submitted 1 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2024. Webpage: https://amuse.is.tue.mpg.de/

  3. arXiv:2309.06441

    cs.CV cs.AI cs.GR

    Learning Disentangled Avatars with Hybrid 3D Representations

    Authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black

    Abstract: Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have dif…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: home page: https://yfeng95.github.io/delta. arXiv admin note: text overlap with arXiv:2210.01868

  4. arXiv:2308.10638

    cs.CV cs.AI cs.GR cs.LG

    SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

    Authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart

    Abstract: We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-s…

    Submitted 6 May, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Updated to camera ready version of CVPR 2024

  5. Emotional Speech-Driven Animation with Content-Emotion Disentanglement

    Authors: Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

    Abstract: To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions. Realistic facial animation requires lip-sync together with the natural expression of emotion. To that end, we propose EMOTE…

    Submitted 26 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  6. arXiv:2306.07437

    cs.CV

    Instant Multi-View Head Capture through Learnable Registration

    Authors: Timo Bolkart, Tianye Li, Michael J. Black

    Abstract: Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow, and commonly address the problem in two separate steps; multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  7. arXiv:2212.04420

    cs.CV cs.GR

    Generating Holistic 3D Human Motion from Speech

    Authors: Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

    Abstract: This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in w…

    Submitted 17 June, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Webpage: https://talkshow.is.tue.mpg.de; CVPR2023

  8. arXiv:2211.12499

    cs.CV

    Instant Volumetric Head Avatars

    Authors: Wojciech Zielonka, Timo Bolkart, Justus Thies

    Abstract: We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic neural radiance field based on neural graphics primitives embedded around a parametric face model. Our pipeline is trained on a single monocular RGB portrait video that observes the subject under different expressions and views. While state…

    Submitted 23 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Website: https://zielon.github.io/insta/ Video: https://youtu.be/HOgaeWTih7Q Accepted to CVPR2023

  9. arXiv:2210.13861

    cs.CV

    SUPR: A Sparse Unified Part-Based Human Representation

    Authors: Ahmed A. A. Osman, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

    Abstract: Statistical 3D shape models of the head, hands, and fullbody are widely used in computer vision and graphics. Despite their wide use, we show that existing models of the head and hands fail to capture the full range of motion for these parts. Moreover, existing work largely ignores the feet, which are crucial for modeling human movement and have applications in biomechanics, animation, and the foo…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted in ECCV 2022

  10. arXiv:2210.05667

    cs.CV cs.AI cs.LG

    Human Body Measurement Estimation with Adversarial Augmentation

    Authors: Nataniel Ruiz, Miriam Bellver, Timo Bolkart, Ambuj Arora, Ming C. Lin, Javier Romero, Raja Bala

    Abstract: We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multiperson linear (SMPL) body model, and aims to maxi…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Published at the International Conference on 3D Vision (3DV) 2022

  11. arXiv:2210.01868

    cs.CV cs.GR

    Capturing and Animation of Body and Clothing from Monocular Video

    Authors: Yao Feng, Jinlong Yang, Marc Pollefeys, Michael J. Black, Timo Bolkart

    Abstract: While recent work has shown progress on extracting clothed 3D human avatars from a single image, video, or a set of 3D scans, several limitations remain. Most methods use a holistic representation to jointly model the body and clothing, which means that the clothing and body cannot be separated for applications like virtual try-on. Other methods separately model the body and clothing, but they req…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: 7 pages main paper, 2 pages supp. mat

  12. arXiv:2205.03962

    cs.CV

    Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation

    Authors: Haiwen Feng, Timo Bolkart, Joachim Tesch, Michael J. Black, Victoria Abrevaya

    Abstract: Virtual facial avatars will play an increasingly important role in immersive communication, games and the metaverse, and it is therefore critical that they be inclusive. This requires accurate recovery of the appearance, represented by albedo, regardless of age, sex, or ethnicity. While significant progress has been made on estimating 3D facial geometry, albedo estimation has received less attenti…

    Submitted 23 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: Camera-Ready version, accepted at ECCV2022

  13. arXiv:2204.11312

    cs.CV

    EMOCA: Emotion Driven Monocular Face Capture and Animation

    Authors: Radek Danecek, Michael J. Black, Timo Bolkart

    Abstract: As 3D facial avatars become more widely used for communication, it is critical that they faithfully convey emotion. Unfortunately, the best recent methods that regress parametric 3D face models from monocular images are unable to capture the full spectrum of facial expression, such as subtle or extreme emotions. We find the standard reconstruction metrics used for training (landmark reprojection e…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  14. arXiv:2204.06607

    cs.CV

    Towards Metrical Reconstruction of Human Faces

    Authors: Wojciech Zielonka, Timo Bolkart, Justus Thies

    Abstract: Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed…

    Submitted 19 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Video: https://youtu.be/vzzEbvv08VA Website: https://zielon.github.io/mica/ Accepted to ECCV 2022

  15. arXiv:2110.05458

    cs.CV

    Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency

    Authors: Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matthew Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black

    Abstract: Synthesizing images of a person in novel poses from a single image is a highly ambiguous task. Most existing approaches require paired training images; i.e. images of the same person with the same clothing in different poses. However, obtaining sufficiently large datasets with paired data is challenging and costly. Previous methods that forego paired supervision lack realism. We propose a self-sup…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: International Conference on Computer Vision (ICCV)

  16. arXiv:2110.02948

    cs.CV

    Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

    Authors: Tianye Li, Shichen Liu, Timo Bolkart, Jiayi Liu, Hao Li, Yajie Zhao

    Abstract: High-fidelity face digitization solutions often combine multi-view stereo (MVS) techniques for 3D reconstruction and a non-rigid registration step to establish dense correspondence across identities and expressions. A common problem is the need for manual clean-up after the MVS step, as 3D scans are typically affected by noise and outliers and contain hairy surface regions that need to be cleaned…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: International Conference on Computer Vision (ICCV)

  17. arXiv:2105.05301

    cs.CV

    Collaborative Regression of Expressive Bodies using Moderation

    Authors: Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

    Abstract: Recovering expressive humans from images is essential for understanding human behavior. Methods that estimate 3D bodies, faces, or hands have progressed significantly, yet separately. Face methods recover accurate 3D shape and geometric details, but need a tight crop and struggle with extreme views and low resolution. Whole-body methods are robust to a wide range of poses and resolutions, but prov…

    Submitted 15 October, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: 21 pages. The first two authors contributed equally to this work

  18. arXiv:2012.04012

    cs.CV

    Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

    Authors: Yao Feng, Haiwen Feng, Michael J. Black, Timo Bolkart

    Abstract: While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D…

    Submitted 2 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: SIGGRAPH 2021

    Journal ref: ACM Transactions on Graphics (ToG), Vol. 40, No. 4, Article 88. Publication date: August 2021

  19. arXiv:2009.00149

    cs.CV cs.AI cs.GR cs.LG stat.AP

    GIF: Generative Interpretable Faces

    Authors: Partha Ghosh, Pravir Singh Gupta, Roy Uziel, Anurag Ranjan, Michael Black, Timo Bolkart

    Abstract: Photo-realistic visualization and animation of expressive human faces have been a long standing challenge. 3D face modeling methods provide parametric control but generates unrealistic images, on the other hand, generative 2D models like GANs (Generative Adversarial Networks) output photo-realistic face images, but lack explicit control. Recent methods gain partial control, either by attempting to…

    Submitted 25 November, 2020; v1 submitted 31 August, 2020; originally announced September 2020.

    Comments: International Conference on 3D Vision (3DV) 2020

  20. arXiv:2008.09062

    cs.CV cs.GR

    Monocular Expressive Body Regression through Body-Driven Attention

    Authors: Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

    Abstract: To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV'20. Project page: http://expose.is.tue.mpg.de

  21. STAR: Sparse Trained Articulated Human Body Regressor

    Authors: Ahmed A. A. Osman, Timo Bolkart, Michael J. Black

    Abstract: The SMPL body model is widely used for the estimation, synthesis, and analysis of 3D human pose and shape. While popular, we show that SMPL has several limitations and introduce STAR, which is quantitatively and qualitatively superior to SMPL. First, SMPL has a huge number of parameters resulting from its use of global blend shapes. These dense pose-corrective offsets relate every vertex on the me…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  22. arXiv:1909.01815

    cs.CV cs.GR cs.LG

    3D Morphable Face Models -- Past, Present and Future

    Authors: Bernhard Egger, William A. P. Smith, Ayush Tewari, Stefanie Wuhrer, Michael Zollhoefer, Thabo Beeler, Florian Bernard, Timo Bolkart, Adam Kortylewski, Sami Romdhani, Christian Theobalt, Volker Blanz, Thomas Vetter

    Abstract: In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing direc…

    Submitted 16 April, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: ACM Transactions on Graphics (TOG)

  23. arXiv:1905.06817

    cs.CV

    Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision

    Authors: Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black

    Abstract: The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image.…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR 2019

  24. arXiv:1905.03079

    cs.CV

    Capture, Learning, and Synthesis of 3D Speaking Styles

    Authors: Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black

    Abstract: Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR 2019

  25. arXiv:1904.05866

    cs.CV

    Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

    Authors: Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black

    Abstract: To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019

  26. arXiv:1807.10267

    cs.CV

    Generating 3D faces using Convolutional Mesh Autoencoders

    Authors: Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, Michael J. Black

    Abstract: Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they can not capture extreme deformatio…

    Submitted 31 July, 2018; v1 submitted 26 July, 2018; originally announced July 2018.

    Journal ref: European Conference on Computer Vision 2018

  27. arXiv:1602.07679

    cs.CV

    A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract

    Authors: Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, Korin Richmond

    Abstract: We describe a minimally-supervised method for computing a statistical shape space model of the palate surface. The model is created from a corpus of volumetric magnetic resonance imaging (MRI) scans collected from 12 speakers. We extract a 3D mesh of the palate from each speaker, then train the model using principal component analysis (PCA). The palate model is then tested using 3D MRI from anothe…

    Submitted 4 September, 2015; originally announced February 2016.

    Comments: Proceedings of the 18th International Congress of Phonetic Sciences, Aug 2015, Glasgow, United Kingdom. 2015, http://www.icphs2015.info/

  28. Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences

    Authors: Anil Bas, William A. P. Smith, Timo Bolkart, Stefanie Wuhrer

    Abstract: We propose a fully automatic method for fitting a 3D morphable model to single face images in arbitrary pose and lighting. Our approach relies on geometric features (edges and landmarks) and, inspired by the iterated closest point algorithm, is based on computing hard correspondences between model vertices and edge pixels. We demonstrate that this is superior to previous work that uses soft corres…

    Submitted 3 October, 2016; v1 submitted 2 February, 2016; originally announced February 2016.

    Comments: To appear in ACCV 2016 Workshop on Facial Informatics

  29. arXiv:1401.2818

    cs.CV cs.GR

    Multilinear Wavelets: A Statistical Shape Space for Human Faces

    Authors: Alan Brunton, Timo Bolkart, Stefanie Wuhrer

    Abstract: We present a statistical model for $3$D human faces in varying expression, which decomposes the surface of the face using a wavelet transform, and learns many localized, decorrelated multilinear models on the resulting coefficients. Using this model we are able to reconstruct faces from noisy and occluded $3$D face scans, and facial motion sequences. Accurate reconstruction of face shape is import…

    Submitted 1 July, 2014; v1 submitted 13 January, 2014; originally announced January 2014.

    Comments: 10 pages, 7 figures; accepted to ECCV 2014

  30. Review of Statistical Shape Spaces for 3D Data with Comparative Analysis for Human Faces

    Authors: Alan Brunton, Augusto Salazar, Timo Bolkart, Stefanie Wuhrer

    Abstract: With systems for acquiring 3D surface data being evermore commonplace, it has become important to reliably extract specific shapes from the acquired data. In the presence of noise and occlusions, this can be done through the use of statistical shape models, which are learned from databases of clean examples of the shape in question. In this paper, we review, analyze and compare different statistic…

    Submitted 4 May, 2014; v1 submitted 28 September, 2012; originally announced September 2012.

    Comments: revised literature review, improved experiments, statistical models and code published

    Journal ref: Computer Vision and Image Understanding, 128, pp. 1-17, 2014