
Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis

Authors: Marcel C. Bühler, Kripasindhu Sarkar, Tanmay Shah, Gengyan Li, Daoye Wang, Leonhard Helminger, Sergio Orts-Escolano, Dmitry Lagun, Otmar Hilliges, Thabo Beeler, Abhimitra Meka

Abstract: NeRFs have enabled highly realistic synthesis of human faces, including complex appearance and reflectance effects of hair and skin. These methods typically require a large number of multi-view input images, making the process hardware-intensive and cumbersome and limiting applicability to unconstrained settings. We propose a novel volumetric human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution. This prior model consists of an identity-conditioned NeRF, trained on a dataset of low-resolution multi-view images of diverse humans with known camera calibration. A simple sparse landmark-based 3D alignment of the training dataset allows our model to learn a smooth latent space of geometry and appearance despite a limited number of training identities. A high-quality volumetric representation of a novel subject can be obtained by model fitting to 2 or 3 camera views of arbitrary resolution. Importantly, our method requires as few as two views of casually captured images as input at inference time.

Submitted 28 September, 2023; originally announced September 2023.

Comments: In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
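
The Preface abstract mentions a "simple sparse landmark-based 3D alignment" of the training data but does not say how it is computed. One common choice, assumed here purely for illustration, is a least-squares similarity transform (Umeyama's method) that maps each subject's 3D landmarks onto a shared template. The function name `align_landmarks` and the toy landmark data are hypothetical; this is a sketch, not the paper's pipeline.

```python
import numpy as np


def align_landmarks(src: np.ndarray, dst: np.ndarray):
    """Similarity transform (Umeyama, 1991) that best maps src onto dst.

    src, dst: (N, 3) arrays of corresponding 3D landmarks.
    Returns (scale, R, t) with dst ~= scale * src @ R.T + t in the least-squares sense.
    Assumption: one plausible alignment scheme, not necessarily the paper's.
    """
    n = len(src)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                  # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                         # avoid reflections: keep det(R) = +1
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


# Toy check: a subject's landmarks are a rotated, scaled, shifted copy of the template.
rng = np.random.default_rng(0)
template = rng.normal(size=(5, 3))            # e.g. 5 sparse facial landmarks
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
subject = 2.0 * template @ R_true.T + np.array([0.1, -0.3, 0.5])

s, R, t = align_landmarks(subject, template)
aligned = s * subject @ R.T + t
print(np.allclose(aligned, template))         # True
```

Aligning every training subject to one template in this way removes global pose and scale differences, which is one plausible reason such a simple step could help a latent space stay smooth with few identities.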

arXiv:2206.08428

DOI: 10.1145/3528223.3530130

EyeNeRF: A Hybrid Representation for Photorealistic Synthesis, Animation and Relighting of Human Eyes

Authors: Gengyan Li, Abhimitra Meka, Franziska Müller, Marcel C. Bühler, Otmar Hilliges, Thabo Beeler

Abstract: A unique challenge in creating high-quality animatable and relightable 3D avatars of people is modeling human eyes. The challenge of synthesizing eyes is multifold, as it requires 1) appropriate representations for the various components of the eye and the periocular region for coherent viewpoint synthesis, capable of representing diffuse, refractive and highly reflective surfaces, 2) disentangling skin and eye appearance from environmental illumination such that it may be rendered under novel lighting conditions, and 3) capturing eyeball motion and the deformation of the surrounding skin to enable re-gazing. These challenges have traditionally necessitated the use of expensive and cumbersome capture setups to obtain high-quality results, and even then, modeling of the eye region holistically has remained elusive. We present a novel geometry and appearance representation that enables high-fidelity capture and photorealistic animation, view synthesis and relighting of the eye region using only a sparse set of lights and cameras. Our hybrid representation combines an explicit parametric surface model for the eyeball with implicit deformable volumetric representations for the periocular region and the interior of the eye. This novel hybrid model has been designed to address the various parts of that challenging facial area: the explicit eyeball surface allows modeling refraction and high-frequency specular reflection at the cornea, whereas the implicit representation is well suited to model lower-frequency skin reflection via spherical harmonics and can represent non-surface structures such as hair or diffuse volumetric bodies, both of which are a challenge for explicit surface models. We show that for high-resolution close-ups of the eye, our model can synthesize high-fidelity animated gaze from novel views under unseen illumination conditions.

Submitted 12 July, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: 16 pages, 16 figures, 1 table, to be published in ACM Transactions on Graphics (TOG) (Volume: 41, Issue: 4), 2022

ACM Class: I.4.5; I.3
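
The EyeNeRF abstract says the implicit representation models lower-frequency skin reflection via spherical harmonics. The sketch below only illustrates the general idea of low-order SH shading: 9 coefficients per color channel (bands 0 to 2) are dotted with the real SH basis evaluated at a direction. How EyeNeRF predicts or uses its SH coefficients is not described in the abstract; `sh_basis_deg2` and `sh_shade` are hypothetical names.

```python
import numpy as np


def sh_basis_deg2(d: np.ndarray) -> np.ndarray:
    """Real spherical-harmonic basis for bands 0-2 (9 values) at unit direction(s) d."""
    d = np.asarray(d, dtype=float)
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),                       # band 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,         # band 1
        1.092548 * x * y, 1.092548 * y * z,               # band 2
        0.315392 * (3.0 * z ** 2 - 1.0),
        1.092548 * x * z,
        0.546274 * (x ** 2 - y ** 2),
    ], axis=-1)


def sh_shade(coeffs: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Per-channel SH value: coeffs has shape (3, 9) for RGB, d has shape (3,)."""
    return np.einsum("...ck,...k->...c", coeffs, sh_basis_deg2(d))


coeffs = np.zeros((3, 9))
coeffs[:, 0] = 1.0 / 0.282095      # DC term only: same value in every direction
up, down = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0])
print(sh_shade(coeffs, up), sh_shade(coeffs, down))   # both approximately [1, 1, 1]
coeffs[0, 2] = 0.5                 # band-1 z lobe on the red channel only
print(sh_shade(coeffs, up)[0] > sh_shade(coeffs, down)[0])   # True
```

Because only bands up to 2 are used, the result varies smoothly with direction, which matches the abstract's point that SH suits low-frequency skin reflection while sharp corneal highlights are left to the explicit eyeball surface.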

arXiv:2112.07471

I M Avatar: Implicit Morphable Head Avatars from Videos

Authors: Yufeng Zheng, Victoria Fernández Abrevaya, Marcel C. Bühler, Xu Chen, Michael J. Black, Otmar Hilliges

Abstract: Traditional 3D morphable face models (3DMMs) provide fine-grained control over expression but cannot easily capture geometric and appearance details. Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose-related deformations via learned blendshapes and skinning fields. These attributes are pose-independent and can be used to morph the canonical geometry and texture fields given novel expression and pose parameters. We employ ray marching and iterative root-finding to locate the canonical surface intersection for each pixel. A key contribution is our novel analytical gradient formulation that enables end-to-end training of IMavatars from videos. We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.

Submitted 4 November, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted at CVPR 2022 as an oral presentation. Project page: https://ait.ethz.ch/projects/2022/IMavatar/; GitHub page: https://github.com/zhengyuf/IMavatar
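
IMavatar morphs canonical geometry with learned blendshapes and skinning fields, and uses iterative root-finding to map points back to canonical space. The toy sketch below replaces the learned fields with hand-written analytic functions (`blendshape_field` and `skinning_weights`, both hypothetical) and a single rotating "jaw" bone, then inverts the forward deformation with Newton's method using a finite-difference Jacobian. It illustrates the root-finding idea only; it is not the paper's implementation, and it does not reproduce the paper's analytical gradient formulation.

```python
import numpy as np


def rot_x(angle: float) -> np.ndarray:
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def blendshape_field(xc: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned field: (3, 2) offsets for 2 expression params."""
    return np.array([[0.0, 0.05 * xc[1]],
                     [0.02, 0.0],
                     [0.01 * xc[0], 0.0]])


def skinning_weights(xc: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: weights for [head, jaw] bones that sum to 1."""
    w_jaw = 1.0 / (1.0 + np.exp(10.0 * xc[1]))   # points with low y follow the jaw
    return np.array([1.0 - w_jaw, w_jaw])


def deform(xc: np.ndarray, expr: np.ndarray, jaw_angle: float) -> np.ndarray:
    """Canonical -> deformed: add expression blendshapes, then linear blend skinning."""
    x = xc + blendshape_field(xc) @ expr
    bones = [np.eye(3), rot_x(jaw_angle)]        # head bone kept fixed in this toy
    w = skinning_weights(xc)
    return sum(wi * (R @ x) for wi, R in zip(w, bones))


def find_canonical(xd, expr, jaw_angle, iters=20, eps=1e-6):
    """Newton iterations on f(xc) = deform(xc) - xd, initialised at xc = xd."""
    xc = xd.copy()
    for _ in range(iters):
        fx = deform(xc, expr, jaw_angle)
        residual = fx - xd
        if np.linalg.norm(residual) < 1e-10:
            break
        J = np.empty((3, 3))
        for k in range(3):                       # finite-difference Jacobian, column k
            step = np.zeros(3)
            step[k] = eps
            J[:, k] = (deform(xc + step, expr, jaw_angle) - fx) / eps
        xc = xc - np.linalg.solve(J, residual)
    return xc


expr = np.array([0.3, -0.2])
xc_true = np.array([0.1, -0.2, 0.3])
xd = deform(xc_true, expr, jaw_angle=0.2)
xc_est = find_canonical(xd, expr, jaw_angle=0.2)
print(np.allclose(xc_est, xc_true, atol=1e-6))
```

The skinning weights are evaluated at the canonical point, so the forward map is easy to compute but has no closed-form inverse; that is why an iterative solver is needed to find which canonical point lands on a given deformed point.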

arXiv:2109.01355

Theory of Mind Based Assistive Communication in Complex Human Robot Cooperation

Authors: Moritz C. Buehler, Jürgen Adamy, Thomas H. Weisswange

Abstract: When cooperating with a human, a robot should not only care about its environment and task but also develop an understanding of the partner's reasoning. To support its human partner in complex tasks, the robot can share information that it knows. However, simply communicating everything will annoy and distract humans, since they might already be aware of some of it and not all information is relevant in the current situation. The questions of when and what type of information the human needs are addressed through the concept of Theory of Mind based Communication, which selects information-sharing actions based on an evaluation of relevance and an estimation of human beliefs. We integrate this into a communication assistant to support humans in a cooperative setting and evaluate the performance benefits. We designed a human-robot sushi-making task that is challenging for the human and generates different situations where humans are unaware and communication could be beneficial. We evaluate the influence of the human-centric communication concept on performance with a user study. Compared to the condition without information exchange, assisted participants can recover from unawareness much earlier. The approach respects the costs of communication and balances interruptions better than other approaches. By providing information adapted to specific situations, the robot does not instruct but enables the human to make good decisions.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 16 pages, 6 figures
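
The abstract describes selecting information-sharing actions from an evaluation of relevance and an estimate of the human's beliefs, while respecting the cost of communication. A minimal decision rule in that spirit is sketched below. It assumes a scalar relevance score per fact, a probability that the human already knows it, and a fixed interruption cost; all of these (and the names `Fact` and `select_messages`) are hypothetical and are not the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    name: str
    relevance: float       # hypothetical: task benefit if the human knows this fact (0..1)
    p_human_knows: float   # hypothetical: estimated belief that the human is already aware


def select_messages(facts, comm_cost=0.2, max_messages=1):
    """Pick facts worth sharing, highest expected gain first.

    gain = relevance * P(human unaware) - comm_cost; only positive gains are
    shared, and at most max_messages per step to limit interruptions.
    """
    scored = []
    for f in facts:
        gain = f.relevance * (1.0 - f.p_human_knows) - comm_cost
        if gain > 0:
            scored.append((gain, f.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_messages]]


facts = [
    Fact("rice is overcooking", relevance=0.9, p_human_knows=0.1),
    Fact("fish is on the counter", relevance=0.3, p_human_knows=0.9),
    Fact("nori is running low", relevance=0.5, p_human_knows=0.4),
]
print(select_messages(facts))                  # ['rice is overcooking']
print(select_messages(facts, max_messages=2))  # ['rice is overcooking', 'nori is running low']
```

The fact the human very likely already knows is never sent, even though it is mildly relevant, which mirrors the abstract's argument that communicating everything would only distract.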

arXiv:2104.05988

VariTex: Variational Neural Face Textures

Authors: Marcel C. Bühler, Abhimitra Meka, Gengyan Li, Thabo Beeler, Otmar Hilliges

Abstract: Deep generative models can synthesize photorealistic images of human faces with novel identities. However, a key challenge to the wide applicability of such techniques is to provide independent control over semantically meaningful parameters: appearance, head pose, face shape, and facial expressions. In this paper, we propose VariTex, to the best of our knowledge the first method that learns a variational latent feature space of neural face textures, which allows sampling of novel identities. We combine this generative model with a parametric face model and gain explicit control over head pose and facial expressions. To generate complete images of human heads, we propose an additive decoder that adds plausible details such as hair. A novel training scheme enforces a pose-independent latent space and, in consequence, allows learning a one-to-many mapping between latent codes and pose-conditioned exterior regions. The resulting method can generate geometrically consistent images of novel identities under fine-grained control over head pose, face shape, and facial expressions. This facilitates a broad range of downstream tasks, like sampling novel identities, changing the head pose, expression transfer, and more. Code and models are available for research at https://mcbuehler.github.io/VariTex.

Submitted 18 August, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021
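
VariTex learns a variational latent space of neural face textures from which novel identities can be sampled. The standard mechanics behind such a space are shown below in their textbook VAE form: the reparameterization trick and a KL penalty towards a unit Gaussian prior. The abstract does not give VariTex's exact losses, so this is an assumption; the latent size of 256 and the function names are illustrative only.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I).

    Given eps, z is a deterministic function of (mu, log_var), which is what
    lets gradients reach the encoder in an autodiff framework.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_unit_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def sample_novel_identity(latent_dim, rng):
    """At test time, a new identity code is drawn directly from the N(0, I) prior."""
    return rng.standard_normal(latent_dim)


rng = np.random.default_rng(0)
mu, log_var = np.zeros(256), np.zeros(256)     # 256 is an illustrative latent size
print(kl_to_unit_gaussian(mu, log_var))        # 0.0: posterior equals the prior
z = reparameterize(mu + 1.0, log_var, rng)
z_new = sample_novel_identity(256, rng)
print(z.shape, z_new.shape)                    # (256,) (256,)
```

The KL term keeps encoded identities close to the prior, which is what makes codes drawn with `sample_novel_identity` land in a region the decoder has learned to handle.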

arXiv:2004.04433

DeepSEE: Deep Disentangled Semantic Explorative Extreme Super-Resolution

Authors: Marcel C. Bühler, Andrés Romero, Radu Timofte

Abstract: Super-resolution (SR) is by definition ill-posed. There are infinitely many plausible high-resolution variants for a given low-resolution natural image. Most of the current literature aims at a single deterministic solution of either high reconstruction fidelity or photo-realistic perceptual quality. In this work, we propose an explorative facial super-resolution framework, DeepSEE, for Deep disentangled Semantic Explorative Extreme super-resolution. To the best of our knowledge, DeepSEE is the first method to leverage semantic maps for explorative super-resolution. In particular, it provides control over the semantic regions and their disentangled appearance, and it allows a broad range of image manipulations. We validate DeepSEE on faces for up to 32x magnification and for exploration of the space of super-resolution. Our code and models are available at https://mcbuehler.github.io/DeepSEE/

Submitted 2 October, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: 19 pages. Supplementary material is available on the project page. Accepted for oral presentation at the 15th Asian Conference on Computer Vision (ACCV) 2020
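
DeepSEE uses semantic maps to give control over semantic regions and their disentangled appearance. One way to picture that control, assumed here for illustration only (the abstract does not specify the architecture), is to scatter one style vector per semantic region into a per-pixel style map at the output resolution: editing one region's vector then changes the conditioning only inside that region. The region ids, the 8-dim code size and the name `region_style_map` are hypothetical; the 512x512 map corresponds to a 16x16 input at the 32x magnification mentioned in the abstract.

```python
import numpy as np


def region_style_map(labels: np.ndarray, styles: np.ndarray) -> np.ndarray:
    """Broadcast per-region style vectors to pixels (toy illustration).

    labels: (H, W) int array of semantic region ids in [0, R).
    styles: (R, C) array holding one C-dim appearance code per region.
    Returns an (H, W, C) map in which every pixel carries its region's code.
    """
    return styles[labels]


rng = np.random.default_rng(0)
# Hypothetical 512x512 semantic map (a 16x16 input at 32x magnification).
hr_labels = np.zeros((512, 512), dtype=int)   # region 0: e.g. "skin"
hr_labels[128:256, 96:416] = 1                # region 1: e.g. "hair"
styles = rng.normal(size=(2, 8))              # one 8-dim code per region

base = region_style_map(hr_labels, styles)
edited_styles = styles.copy()
edited_styles[1] += 1.0                       # explore a new appearance for region 1 only
edited = region_style_map(hr_labels, edited_styles)

changed = np.any(edited != base, axis=-1)     # pixels whose conditioning changed
print(base.shape, np.array_equal(changed, hr_labels == 1))  # (512, 512, 8) True
```

Keeping one code per region is what makes the edit local: sampling or shifting a single region's code explores alternative high-resolution appearances for that region while the rest of the face stays fixed.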

Showing 1–6 of 6 results for author: Bühler, M C