
Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories

Authors: Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang

Abstract: Understanding the dynamics of generic 3D scenes is fundamentally challenging in computer vision and essential for applications such as scene reconstruction, motion tracking, and avatar creation. In this work, we address the task as the problem of inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field parameterized by a neural network to predict the movement of novel points within the same domain, without relying on any data-driven or scene-specific priors. To achieve this, our approach builds upon the recently introduced dynamic point field model that learns smooth deformation fields between the canonical frame and individual observation frames. However, that model neglects temporal consistency between consecutive frames, and its number of parameters grows linearly with sequence length due to per-frame modeling. To address these shortcomings, we exploit the intrinsic regularization provided by SIREN and modify the input layer to produce a spatiotemporally smooth motion field. Additionally, we analyze the Jacobian matrix of the motion field and find that the motion degrees of freedom (DOFs) in an infinitesimal area around a point and the network's hidden variables affect the model's representational power in different ways. This enables us to improve the model's representational capability while keeping it compact.
Furthermore, to reduce the risk of overfitting, we introduce a regularization term based on the assumption of piecewise motion smoothness. Our experiments assess the model's performance in predicting unseen point trajectories and its application to guided temporal mesh alignment. The results demonstrate its superiority and effectiveness. The code and data for the project are publicly available: https://yz-cnsdqz.github.io/eigenmotion/DOMA/
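To make the idea of a SIREN-based spatiotemporal motion field concrete, here is a minimal sketch, assuming time is simply concatenated to the 3D coordinates as a fourth input. The abstract does not say exactly how the input layer is modified, so `SineLayer`, `MotionField`, the layer sizes, and `omega_0 = 30` are hypothetical illustration choices, not the authors' implementation. Because every layer is a smooth sine of an affine map, the predicted displacement varies smoothly in both space and time.

```python
import math
import random


class SineLayer:
    """SIREN-style layer: y = sin(omega_0 * (W x + b))."""

    def __init__(self, in_dim, out_dim, rng, omega_0=30.0, first=False):
        # SIREN weight init: U(-1/in, 1/in) for the first layer,
        # U(-sqrt(6/in)/omega_0, sqrt(6/in)/omega_0) for later layers.
        # Biases reuse the same range here purely for simplicity.
        bound = 1.0 / in_dim if first else math.sqrt(6.0 / in_dim) / omega_0
        self.omega_0 = omega_0
        self.W = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [rng.uniform(-bound, bound) for _ in range(out_dim)]

    def __call__(self, x):
        return [math.sin(self.omega_0 * (sum(w * xi for w, xi in zip(row, x)) + bi))
                for row, bi in zip(self.W, self.b)]


class MotionField:
    """Hypothetical motion field f(x, y, z, t) -> 3D displacement."""

    def __init__(self, hidden=32, seed=0):
        rng = random.Random(seed)
        self.layers = [SineLayer(4, hidden, rng, first=True),
                       SineLayer(hidden, hidden, rng)]
        bound = math.sqrt(6.0 / hidden) / 30.0
        # Final linear (non-sine) layer maps hidden features to a displacement.
        self.W_out = [[rng.uniform(-bound, bound) for _ in range(hidden)]
                      for _ in range(3)]

    def displacement(self, point, t):
        h = [*point, t]  # time enters as a fourth input coordinate
        for layer in self.layers:
            h = layer(h)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.W_out]


field = MotionField()
print(field.displacement((0.1, 0.2, 0.3), t=0.5))
```

A single network shared across all frames keeps the parameter count independent of sequence length, which is the shortcoming of per-frame modeling that the abstract points out.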

Submitted 5 June, 2024; originally announced June 2024.

Comments: CVPR 2024, post-camera-ready version

arXiv:2401.04728

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Authors: Xiyi Chen, Marko Mihajlovic, Shaofei Wang, Sergey Prokudin, Siyu Tang

Abstract: Recent advances in generative diffusion models have enabled the previously unfeasible capability of generating 3D assets from a single input image or a text prompt. In this work, we aim to enhance the quality and functionality of these models for the task of creating controllable, photorealistic human avatars. We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach. We demonstrate that accurate conditioning of a generative pipeline on the articulated 3D model enhances the baseline model performance on the task of novel view synthesis from a single image. More importantly, this integration facilitates a seamless and accurate incorporation of facial expression and body pose control into the generation process. To the best of our knowledge, our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars from a single image of an unseen subject; extensive quantitative and qualitative evaluations demonstrate the advantages of our approach over existing state-of-the-art avatar creation models on both novel view and novel expression synthesis tasks. The code for our project is publicly available.

Submitted 2 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: [CVPR 2024] Project page: https://xiyichen.github.io/morphablediffusion/

arXiv:2312.09228

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Authors: Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

Abstract: We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Although extremely fast to train, these methods barely reach an interactive rendering frame rate of around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model to highly articulated unseen poses. Experimental results show that our method achieves performance comparable to or better than state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.
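As a rough illustration of an as-isometric-as-possible penalty on Gaussian means, the sketch below compares distances between neighboring means in the canonical and deformed configurations. The exact loss form (mean absolute distance change), the neighbor pairs, and the name `isometry_loss` are assumptions for illustration; the abstract also mentions a covariance term, which this sketch does not cover.

```python
import math


def isometry_loss(canonical, deformed, neighbor_pairs):
    """Mean absolute change of neighbor distances between canonical and deformed
    Gaussian means (an illustrative as-isometric-as-possible penalty)."""
    total = 0.0
    for i, j in neighbor_pairs:
        d_canonical = math.dist(canonical[i], canonical[j])
        d_deformed = math.dist(deformed[i], deformed[j])
        total += abs(d_canonical - d_deformed)
    return total / len(neighbor_pairs)


canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pairs = [(0, 1), (0, 2), (1, 2)]

# A rigid motion (90 degree rotation about z plus a translation) preserves distances.
rigid = [(-y + 0.5, x, z + 2.0) for x, y, z in canonical]
# Stretching along x does not.
stretched = [(2.0 * x, y, z) for x, y, z in canonical]

print(isometry_loss(canonical, rigid, pairs))      # ~0
print(isometry_loss(canonical, stretched, pairs))  # > 0
```

Penalizing only distance changes leaves rigid motions free while discouraging local stretching, which is one plausible reason such a term helps on unseen, highly articulated poses.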

Submitted 4 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Project page: https://neuralbodies.github.io/3DGS-Avatar

arXiv:2309.03160

ResFields: Residual Neural Fields for Spatiotemporal Signals

Authors: Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang

Abstract: Neural fields, a category of neural networks trained to represent high-frequency signals, have gained significant attention in recent years due to their impressive performance in modeling complex 3D data, such as signed distance fields (SDFs) or radiance fields (NeRFs), via a single multi-layer perceptron (MLP). However, despite the power and simplicity of representing signals with an MLP, these methods still face challenges when modeling large and complex temporal signals due to the limited capacity of MLPs. In this paper, we propose an effective approach to address this limitation by incorporating temporal residual layers into neural fields, dubbed ResFields. It is a novel class of networks specifically designed to effectively represent complex temporal signals. We conduct a comprehensive analysis of the properties of ResFields and propose a matrix factorization technique to reduce the number of trainable parameters and enhance generalization capabilities. Importantly, our formulation seamlessly integrates with existing MLP-based neural fields and consistently improves results across various challenging tasks: 2D video approximation, dynamic shape modeling via temporal SDFs, and dynamic NeRF reconstruction. Lastly, we demonstrate the practical utility of ResFields by showcasing its effectiveness in capturing dynamic 3D scenes from sparse RGBD cameras of a lightweight capture system.
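A temporal residual layer can be sketched as a linear layer whose weight matrix gets a per-frame correction, W(t) = W + ΔW(t). The factorization used below, ΔW(t) = Σ_r v[t][r]·M[r] with shared basis matrices M[r] and per-frame coefficients v[t], is an assumed illustration of "a matrix factorization technique to reduce the number of trainable parameters"; the class name, sizes, and zero initialization of the coefficients are hypothetical.

```python
import random


class ResFieldLinear:
    """Linear layer with time-dependent weights W(t) = W + dW(t), where the
    residual is factorized as dW(t) = sum_r v[t][r] * M[r] (assumed form)."""

    def __init__(self, in_dim, out_dim, num_frames, rank, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim, self.rank = in_dim, out_dim, rank
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim
        # Shared basis matrices, one per rank component.
        self.M = [[[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
                  for _ in range(rank)]
        # Per-frame coefficients; zero init here, so the layer starts as a plain linear layer.
        self.v = [[0.0] * rank for _ in range(num_frames)]

    def weights_at(self, t):
        return [[self.W[i][j] + sum(self.v[t][r] * self.M[r][i][j] for r in range(self.rank))
                 for j in range(self.in_dim)]
                for i in range(self.out_dim)]

    def __call__(self, x, t):
        W_t = self.weights_at(t)
        return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W_t, self.b)]

    def num_params(self):
        base = self.out_dim * self.in_dim + self.out_dim
        residual = self.rank * self.out_dim * self.in_dim + len(self.v) * self.rank
        return base + residual


layer = ResFieldLinear(in_dim=8, out_dim=8, num_frames=100, rank=4)
layer.v[10][0] = 0.5  # give frame 10 its own residual
x = [1.0] * 8
print(layer(x, 0) == layer(x, 99))  # True: frames 0 and 99 still share W
print(layer(x, 0) == layer(x, 10))  # False: frame 10 differs
print(layer.num_params(), 100 * (8 * 8 + 8))  # factorized vs. one full layer per frame
```

The factorized residual grows by only `rank` numbers per extra frame, instead of a full weight matrix per frame.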

Submitted 11 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: [ICLR 2024 Spotlight] Project and code at: https://markomih.github.io/ResFields/

arXiv:2205.04992

KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

Authors: Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, Shunsuke Saito

Abstract: Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling. https://markomih.github.io/KeypointNeRF
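The idea of encoding a query point relative to sparse 3D keypoints, rather than in global coordinates, can be illustrated with a loose sketch. It assumes the query and keypoints are already expressed in one camera's frame and encodes the depth difference to each keypoint with sinusoidal features; this is not the paper's exact formulation, and `relative_keypoint_encoding`, the number of frequencies, and the toy coordinates are hypothetical.

```python
import math


def positional_encoding(value, num_freqs):
    """Sinusoidal encoding [sin(2^k pi v), cos(2^k pi v)] for k = 0..num_freqs-1."""
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features += [math.sin(freq * value), math.cos(freq * value)]
    return features


def relative_keypoint_encoding(query_cam, keypoints_cam, num_freqs=4):
    """Encode a query point by its depth relative to each keypoint.

    Both inputs are assumed to already be in the same camera's coordinate frame,
    with the z coordinate as depth.
    """
    features = []
    for kp in keypoints_cam:
        features += positional_encoding(query_cam[2] - kp[2], num_freqs)
    return features


def shift_depth(p, dz=1.5):
    return (p[0], p[1], p[2] + dz)


keypoints = [(0.0, 0.0, 2.0), (0.1, -0.2, 2.3), (-0.1, 0.1, 2.1)]
query = (0.05, 0.0, 2.2)
enc = relative_keypoint_encoding(query, keypoints)

# Shifting the whole scene in depth leaves the relative encoding (almost) unchanged,
# unlike an encoding of the query's absolute coordinates.
enc_shifted = relative_keypoint_encoding(shift_depth(query), [shift_depth(k) for k in keypoints])
print(max(abs(a - b) for a, b in zip(enc, enc_shifted)))
```

The invariance to a global depth offset hints at why a relative encoding can be less prone to overfitting the training distribution than a global one.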

Submitted 21 July, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: To appear at ECCV 2022. The project page is available at https://markomih.github.io/KeypointNeRF

arXiv:2204.06184

COAP: Compositional Articulated Occupancy of People

Authors: Marko Mihajlovic, Shunsuke Saito, Aayush Bansal, Michael Zollhoefer, Siyu Tang

Abstract: We present a novel neural implicit representation for articulated human bodies. Compared to explicit template meshes, neural implicit body representations provide an efficient mechanism for modeling interactions with the environment, which is essential for human motion reconstruction and synthesis in 3D scenes. However, existing neural implicit bodies suffer from either poor generalization on highly articulated poses or slow inference time. In this work, we observe that prior knowledge about the human body's shape and kinematic structure can be leveraged to improve generalization and efficiency. We decompose the full-body geometry into local body parts and employ a part-aware encoder-decoder architecture to learn neural articulated occupancy that models complex deformations locally. Our local shape encoder represents the body deformation of not only the corresponding body part but also the neighboring body parts. The decoder incorporates the geometric constraints of local body shape which significantly improves pose generalization. We demonstrate that our model is suitable for resolving self-intersections and collisions with 3D environments. Quantitative and qualitative experiments show that our method largely outperforms existing solutions in terms of both efficiency and accuracy. The code and models are available at https://neuralbodies.github.io/COAP/index.html

Submitted 13 April, 2022; originally announced April 2022.

Comments: To appear at CVPR 2022. The project page is available at https://neuralbodies.github.io/COAP/index.html

arXiv:2106.11944

MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images

Authors: Shaofei Wang, Marko Mihajlovic, Qianli Ma, Andreas Geiger, Siyu Tang

Abstract: In this paper, we aim to create generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations. Recent advances in deep learning, especially neural implicit representations, have enabled human shape reconstruction and controllable avatar generation from different sensor inputs. However, to generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs. Furthermore, due to the difficulty of effectively modeling pose-dependent cloth deformations for diverse body shapes and cloth types, existing approaches resort to per-subject/cloth-type optimization from scratch, which is computationally expensive. In contrast, we propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images. We achieve this by using meta-learning to learn an initialization of a hypernetwork that predicts the parameters of neural SDFs. The hypernetwork is conditioned on human poses and represents a clothed neural avatar that deforms non-rigidly according to the input poses. Meanwhile, it is meta-learned to effectively incorporate priors of diverse body shapes and cloth types and thus can be much faster to fine-tune, compared to models trained from scratch.
We qualitatively and quantitatively show that our approach outperforms state-of-the-art approaches that require complete meshes as inputs while our approach requires only depth frames as inputs and runs orders of magnitude faster. Furthermore, we demonstrate that our meta-learned hypernetwork is very robust, being the first to generate avatars with realistic dynamic cloth deformations given as few as 8 monocular depth frames.

Submitted 20 January, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 final camera-ready revision. Project page: https://neuralbodies.github.io/metavatar/

arXiv:2104.06849

LEAP: Learning Articulated Occupancy of People

Authors: Marko Mihajlovic, Yan Zhang, Michael J. Black, Siyu Tang

Abstract: Substantial progress has been made on modeling rigid 3D objects using deep implicit representations. Yet, extending these methods to learn neural models of human shape is still in its infancy. Human bodies are complex and the key challenge is to learn a representation that generalizes such that it can express body shape deformations for unseen subjects in unseen, highly-articulated, poses. To address this challenge, we introduce LEAP (LEarning Articulated occupancy of People), a novel neural occupancy representation of the human body. Given a set of bone transformations (i.e. joint locations and rotations) and a query point in space, LEAP first maps the query point to a canonical space via learned linear blend skinning (LBS) functions and then efficiently queries the occupancy value via an occupancy network that models accurate identity- and pose-dependent deformations in the canonical space. Experiments show that our canonicalized occupancy estimation with the learned LBS functions greatly improves the generalization capability of the learned occupancy representation across various human shapes and poses, outperforming existing solutions in all settings.
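The canonicalization step can be illustrated with plain linear blend skinning: a point is moved by the weighted sum of per-bone rigid transforms, x_p = Σ_k w_k (R_k x_c + t_k), and mapped back by inverting that blended transform. In LEAP the skinning weights come from learned networks; in this sketch they are fixed stand-in numbers, the same weights are used in both directions for simplicity, and all function names are hypothetical.

```python
def blend(weights, rotations, translations):
    """Blend per-bone rigid transforms: A = sum_k w_k R_k, b = sum_k w_k t_k."""
    A = [[sum(w * R[i][j] for w, R in zip(weights, rotations)) for j in range(3)]
         for i in range(3)]
    b = [sum(w * t[i] for w, t in zip(weights, translations)) for i in range(3)]
    return A, b


def transform_point(A, b, x):
    return [sum(A[i][j] * x[j] for j in range(3)) + b[i] for i in range(3)]


def inv3(m):
    """Inverse of a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def to_canonical(x_posed, weights, rotations, translations):
    """Inverse LBS: x_c = A^{-1} (x_p - b), with A, b the blended transform."""
    A, b = blend(weights, rotations, translations)
    centered = [x_posed[k] - b[k] for k in range(3)]
    return transform_point(inv3(A), [0.0, 0.0, 0.0], centered)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
rotations = [identity, rot_z90]
translations = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
weights = [0.5, 0.5]  # stand-in for weights a network would predict

x_canonical = [0.3, -0.2, 0.1]
A, b = blend(weights, rotations, translations)
x_posed = transform_point(A, b, x_canonical)
print(x_posed)
print(to_canonical(x_posed, weights, rotations, translations))  # recovers x_canonical
```

Note that the blended matrix A is generally not a rotation, which is why the sketch inverts it explicitly rather than transposing it.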

Submitted 14 April, 2021; originally announced April 2021.

Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021

arXiv:2012.14240

DeepSurfels: Learning Online Appearance Fusion

Authors: Marko Mihajlovic, Silvan Weder, Marc Pollefeys, Martin R. Oswald

Abstract: We present DeepSurfels, a novel hybrid scene representation for geometry and appearance information. DeepSurfels combines explicit and neural building blocks to jointly encode geometry and appearance information. In contrast to established representations, DeepSurfels better represents high-frequency textures, is well-suited for online updates of appearance information, and can be easily combined with machine learning methods. We further present an end-to-end trainable online appearance fusion pipeline that fuses information from RGB images into the proposed scene representation and is trained using self-supervision imposed by the reprojection error with respect to the input images. Our method compares favorably to classical texture mapping approaches as well as recent learning-based techniques. Moreover, we demonstrate lower runtime, improved generalization capabilities, and better scalability to larger scenes compared to existing methods.

Submitted 30 May, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2021

arXiv:1911.00262

Finding the most similar textual documents using Case-Based Reasoning

Authors: Marko Mihajlovic, Ning Xiong

Abstract: In recent years, the huge amount of unstructured textual data on the Internet has made it difficult for AI algorithms to provide the best recommendations for users and their search queries. Since the Internet became widespread, a lot of research has been done in the fields of Natural Language Processing (NLP) and machine learning. Almost every solution transforms documents into Vector Space Models (VSM) in order to apply AI algorithms to them. One such approach is based on Case-Based Reasoning (CBR). In such systems, the most important part is computing the similarity between numerical data points. In 2016, the TS-SS similarity metric was proposed, showing state-of-the-art results in textual mining for unsupervised learning. However, its performance for supervised learning (classification tasks) had not previously been investigated. In this work, we devised a CBR system capable of finding the most similar documents for a given query, aiming to investigate the performance of the new state-of-the-art metric, TS-SS, alongside two other geometrical similarity measures, Euclidean distance and Cosine similarity, that showed the best predictive results over several benchmark corpora. The results show, surprisingly, that the TS-SS measure is inappropriate for high-dimensional features.
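The three measures compared in this work can be sketched as the "retrieve" step of a CBR system. TS-SS is written here following its commonly cited formulation: θ is the angle between the vectors in degrees plus a 10 degree offset, TS = |A||B|·sin(θ)/2, SS = π·(ED + MD)²·θ/360 with ED the Euclidean distance and MD the difference in magnitudes, and TS-SS = TS·SS. Treat this as an assumption, not the paper's code; the toy 3D vectors stand in for real document vectors, and all vectors are assumed non-zero.

```python
import math


def norm(a):
    return math.hypot(*a)


def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.dist(a, b)


def ts_ss(a, b):
    """TS-SS: triangle-area similarity times sector-area similarity (smaller = more similar)."""
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    theta = math.degrees(math.acos(cos)) + 10.0  # angle in degrees plus a 10 degree offset
    ts = norm(a) * norm(b) * math.sin(math.radians(theta)) / 2.0
    magnitude_diff = abs(norm(a) - norm(b))
    ss = math.pi * (euclidean_distance(a, b) + magnitude_diff) ** 2 * theta / 360.0
    return ts * ss


def most_similar(query, cases, measure):
    """CBR 'retrieve' step: index of the stored case most similar to the query."""
    indices = range(len(cases))
    if measure == "cosine":
        return max(indices, key=lambda k: cosine_similarity(query, cases[k]))
    distance = euclidean_distance if measure == "euclidean" else ts_ss
    return min(indices, key=lambda k: distance(query, cases[k]))


query = (1.0, 1.0, 0.0)
cases = [(0.0, 0.0, 5.0), (2.0, 2.0, 0.0), (1.0, 0.0, 0.0)]
for measure in ("cosine", "euclidean", "ts_ss"):
    print(measure, most_similar(query, cases, measure))
```

Even on this toy data the measures disagree: cosine ignores magnitude, Euclidean distance ignores direction, and TS-SS mixes both, which is why the choice of measure matters for retrieval.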

Submitted 1 November, 2019; originally announced November 2019.

arXiv:cond-mat/0411316

doi 10.1103/PhysRevE.72.040801

Interfacial Slip in Sheared Polymer Blends

Authors: Tak Shing Lo, Maja Mihajlovic, Yitzhak Shnidman, Wentao Li, Dilip Gersappe

Abstract: We have developed a dynamic self-consistent field theory, without any adjustable parameters, for unentangled polymer blends under shear. Our model accounts for the interaction between polymers, and enables one to compute the evolution of the local rheology, microstructure and the conformations of the polymer chains under shear self-consistently. We use this model to study the interfacial dynamics in sheared polymer blends and make a quantitative comparison between this model and Molecular Dynamics simulations. We find good agreement between the two methods.

Submitted 11 November, 2004; originally announced November 2004.

arXiv:cond-mat/0411288

A Self-Consistent Field Study of Interfacial Dynamics in Unentangled Homopolymer Fluids in a Sheared Channel

Authors: Maja Mihajlovic, Tak Shing Lo, Yitzhak Shnidman

Abstract: In a preceding paper, we presented a general lattice formulation of the dynamic self-consistent field (DSCF) theory for inhomogeneous, unentangled homopolymer fluids. Here we apply the DSCF theory to study both transient and steady-state interfacial structure, flow and rheology in a sheared planar channel containing either a one-component melt or a phase-separated, two-component blend. We focus here on the case in which the solid-liquid and the liquid-liquid interfaces are parallel to the walls of the channel, and assume that the system has translational symmetry within planes parallel to the walls. This symmetry allows us to derive a simplified, quasi-one-dimensional (quasi-1D) version of the DSCF evolution equations for free segment probabilities, momentum densities, and the ideal-chain conformation tensor. Numerical solutions of the quasi-1D DSCF equations are used to study both the transient evolution and the steady-state profiles of composition, density, velocity, chain deformation, stress, viscosity and normal stress within layers across the sheared channel. Good qualitative agreement is obtained with previously observed phenomena.

Submitted 10 November, 2004; originally announced November 2004.

arXiv:cond-mat/0411287

doi 10.1103/PhysRevE.72.041801

Dynamic Self-Consistent Field Theory for Unentangled Homopolymer Fluids

Authors: Maja Mihajlovic, Tak Shing Lo, Yitzhak Shnidman

Abstract: We present a lattice formulation of a dynamic self-consistent field (DSCF) theory that is capable of resolving interfacial structure, dynamics and rheology in inhomogeneous, compressible melts and blends of unentangled homopolymer chains. The joint probability distribution of all the Kuhn segments in the fluid, interacting with adjacent segments and walls, is approximated by a product of one-body probabilities for free segments interacting solely with an external potential field that is determined self-consistently. The effect of flow on ideal chain conformations is modeled with FENE-P dumbbells, and related to stepping probabilities in a random walk. Free segment and stepping probabilities generate statistical weights for chain conformations in a self-consistent field, and determine local volume fractions of chain segments. Flux balance across unit lattice cells yields mean-field transport equations for the evolution of free segment probabilities and of momentum densities on the Kuhn length scale. Diffusive and viscous contributions to the fluxes arise from segmental hops modeled as a Markov process, with transition rates reflecting changes in segmental interaction, kinetic energy, and entropic contributions to the free energy under flow.
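The combination of flux balance across lattice cells and Markov hops can be illustrated with a generic nearest-neighbour master equation on a 1D lattice. This is only a toy under stated assumptions: the hop rates are uniform placeholders, whereas in the DSCF theory they reflect changes in segmental interaction, kinetic energy, and entropic contributions, and momentum densities and chain conformations evolve as well. The name `hop_step` and all parameters are hypothetical.

```python
def hop_step(p, rate_right, rate_left, dt):
    """One explicit-Euler step of a nearest-neighbour hopping master equation on a
    1D lattice with reflecting walls.

    p[i]          -- probability (or occupancy) of cell i
    rate_right[i] -- hop rate from cell i to i+1
    rate_left[i]  -- hop rate from cell i to i-1
    """
    n = len(p)
    new_p = list(p)
    for i in range(n):
        if i + 1 < n:  # flux across the right face of cell i
            flux = rate_right[i] * p[i] * dt
            new_p[i] -= flux
            new_p[i + 1] += flux
        if i > 0:  # flux across the left face of cell i
            flux = rate_left[i] * p[i] * dt
            new_p[i] -= flux
            new_p[i - 1] += flux
    return new_p


p = [1.0, 0.0, 0.0, 0.0, 0.0]  # everything starts in the first cell
rates = [1.0] * len(p)
for _ in range(2000):
    p = hop_step(p, rates, rates, dt=0.01)
print([round(v, 3) for v in p])  # spreads toward a uniform profile
```

Because every outgoing flux from one cell is added to its neighbour, the total is conserved step by step, which is the discrete counterpart of the flux-balance argument in the abstract.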

Submitted 10 November, 2004; originally announced November 2004.

Showing 1–13 of 13 results for author: Mihajlovic, M