Search | arXiv e-print repository

S2F2: Self-Supervised High Fidelity Face Reconstruction from Monocular Image

Authors: Abdallah Dib, Junghyun Ahn, Cedric Thebault, Philippe-Henri Gosselin, Louis Chevallier

Abstract: We present a novel face reconstruction method capable of reconstructing detailed face geometry and spatially varying face reflectance from a single monocular image. We build our work upon the recent advances of DNN-based auto-encoders with differentiable ray tracing image formation, trained in a self-supervised manner. While providing the advantage of learning-based approaches and real-time reconstruction, the latter methods lacked fidelity. In this work, we achieve, for the first time, high fidelity face reconstruction using self-supervised learning only. Our novel coarse-to-fine deep architecture allows us to solve the challenging problem of decoupling face reflectance from geometry using a single image, at high computational speed. Compared to state-of-the-art methods, our method achieves more visually appealing reconstruction.

Submitted 5 April, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: 24 Pages, 22 Figures

MSC Class: 68T45; 68T07; 68U10; 68U05

ACM Class: I.4.5; I.4.8; I.4.9; I.3.3; I.2.10

arXiv:2110.02124

doi 10.1145/3485441.3485646

FacialFilmroll: High-resolution multi-shot video editing

Authors: Bharath Bhushan Damodaran, Emmanuel Jolly, Gilles Puy, Philippe Henri Gosselin, Cédric Thébault, Junghyun Ahn, Tim Christensen, Paul Ghezzo, Pierre Hellier

Abstract: We present FacialFilmroll, a solution for spatially and temporally consistent editing of faces in one or multiple shots. We build upon unwrap mosaic [Rav-Acha et al. 2008] by specializing it to faces. We leverage recent techniques to fit a 3D face model on monocular videos to (i) improve the quality of the mosaic for editing and (ii) permit the automatic transfer of edits from one shot to other shots of the same actor. We explain how FacialFilmroll is integrated in a post-production facility. Finally, we present video editing results using FacialFilmroll on high resolution videos.

Submitted 17 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: European Conference on Visual Media Production (CVMP '21)

Journal ref: European Conference on Visual Media Production (CVMP '21), 2021

arXiv:2103.15432

Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing

Authors: Abdallah Dib, Cedric Thebault, Junghyun Ahn, Philippe-Henri Gosselin, Christian Theobalt, Louis Chevallier

Abstract: Robust face reconstruction from a monocular image in general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened up the path for very fast monocular reconstruction of geometry, lighting and reflectance. They can also be trained in a self-supervised manner for increased robustness and better generalization. However, their differentiable rasterization-based image formation models, as well as underlying scene parameterization, limit them to Lambertian face reflectance and to poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and enables state-of-the-art results. However, optimization-based approaches are inherently slow and lack robustness. In this paper, we build our work on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer, which enables us to base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model and a plausible representation of self-shadows. This enables a big leap forward in reconstruction quality of shape, appearance and lighting even in scenes with difficult illumination. With consistent face attributes reconstruction, our method leads to practical applications such as relighting and self-shadows removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.

Submitted 22 November, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: International Conference on Computer Vision (ICCV 2021)

arXiv:2101.05356

Practical Face Reconstruction via Differentiable Ray Tracing

Authors: Abdallah Dib, Gaurav Bharaj, Junghyun Ahn, Cédric Thébault, Philippe-Henri Gosselin, Marco Romeo, Louis Chevallier

Abstract: We present a differentiable ray-tracing based novel face reconstruction approach where scene attributes - 3D geometry, reflectance (diffuse, specular and roughness), pose, camera parameters, and scene illumination - are estimated from unconstrained monocular images. The proposed method models scene illumination via a novel, parameterized virtual light stage, which in conjunction with differentiable ray-tracing, introduces a coarse-to-fine optimization formulation for face reconstruction. Our method can not only handle unconstrained illumination and self-shadow conditions, but also estimates diffuse and specular albedos. To estimate the face attributes consistently and with practical semantics, a two-stage optimization strategy systematically uses a subset of parametric attributes, where subsequent attribute estimations factor those previously estimated. For example, self-shadows estimated during the first stage later prevent their baking into the personalized diffuse and specular albedos in the second stage. We show the efficacy of our approach in several real-world scenarios, where face attributes can be estimated even under extreme illumination conditions. Ablation studies, analyses and comparisons against several recent state-of-the-art methods show improved accuracy and versatility of our approach. With consistent face attributes reconstruction, our method leads to several style -- illumination, albedo, self-shadow -- edit and transfer applications, as discussed in the paper.

Submitted 13 January, 2021; originally announced January 2021.

Comments: 16 pages, 14 figures

MSC Class: 65D19; 68U05

ACM Class: I.4.5; I.4.8; I.3.7

arXiv:2007.01151

JUMPS: Joints Upsampling Method for Pose Sequences

Authors: Lucas Mourot, François Le Clerc, Cédric Thébault, Pierre Hellier

Abstract: Human Pose Estimation is a low-level task useful for surveillance, human action recognition, and scene understanding at large. It also offers promising perspectives for the animation of synthetic characters. For all these applications, and especially the latter, estimating the positions of many joints is desirable for improved performance and realism. To this purpose, we propose a novel method called JUMPS for increasing the number of joints in 2D pose estimates and recovering occluded or missing joints. We believe this is the first attempt to address the issue. We build on a deep generative model that combines a Generative Adversarial Network (GAN) and an encoder. The GAN learns the distribution of high-resolution human pose sequences, the encoder maps the input low-resolution sequences to its latent space. Inpainting is obtained by computing the latent representation whose decoding by the GAN generator optimally matches the joint locations at the input. Post-processing a 2D pose sequence using our method provides a richer representation of the character motion. We show experimentally that the localization accuracy of the additional joints is on average on par with the original pose estimates.

Submitted 14 October, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: 8 pages, 7 figures, 2 tables

arXiv:1910.05200

Face Reflectance and Geometry Modeling via Differentiable Ray Tracing

Authors: Abdallah Dib, Gaurav Bharaj, Junghyun Ahn, Cedric Thebault, Philippe-Henri Gosselin, Louis Chevallier

Abstract: We present a novel strategy to automatically reconstruct 3D faces from monocular images with explicitly disentangled facial geometry (pose, identity and expression), reflectance (diffuse and specular albedo), and self-shadows. The scene lights are modeled as a virtual light stage with pre-oriented area lights used in conjunction with differentiable Monte-Carlo ray tracing to optimize the scene and face parameters. With correctly disentangled self-shadows and specular reflection parameters, we can not only obtain robust facial geometry reconstruction, but also gain explicit control over these parameters, with several practical applications. We can change facial expressions with accurate resultant self-shadows or relight the scene and obtain accurate specular reflection and several other parameter combinations.

Submitted 3 October, 2019; originally announced October 2019.

Showing 1–6 of 6 results for author: Thebault, C