Showing 1–50 of 111 results for author: Hilliges, O

  1. arXiv:2406.19811

    cs.CV

    EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

    Authors: Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

    Abstract: Human activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object inter…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.08472

    cs.LG cs.AI

    RILe: Reinforced Imitation Learning

    Authors: Mert Albaba, Sammy Christen, Christoph Gebhardt, Thomas Langerak, Michael J. Black, Otmar Hilliges

    Abstract: Reinforcement Learning has achieved significant success in generating complex behavior but often requires extensive reward function engineering. Adversarial variants of Imitation Learning and Inverse Reinforcement Learning offer an alternative by learning policies from expert demonstrations via a discriminator. Employing discriminators increases their data- and computational efficiency over the st…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.01595

    cs.CV

    MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

    Authors: Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

    Abstract: We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering i…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://eth-ait.github.io/MultiPly/

  4. arXiv:2405.14477

    cs.LG cs.CV

    LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

    Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

    Abstract: Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencode…

    Submitted 23 May, 2024; originally announced May 2024.

  5. ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations

    Authors: Artur Grigorev, Giorgio Becherini, Michael J. Black, Otmar Hilliges, Bernhard Thomaszewski

    Abstract: Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inp…

    Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted for publication by SIGGRAPH 2024, conference track

  6. arXiv:2404.18630

    cs.CV

    4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

    Authors: Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, Otmar Hilliges

    Abstract: The studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. Addressing this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS capture…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 paper, 21 figures, 9 tables

  7. arXiv:2404.15383

    cs.CV cs.AI

    WANDR: Intention-guided Human Motion Generation

    Authors: Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black

    Abstract: Synthesizing natural human motions that enable a 3D human avatar to walk and reach for arbitrary goals in 3D space remains an unsolved problem with many applications. Existing methods (data-driven or using reinforcement learning) are limited in terms of generalization and motion naturalness. A primary obstacle is the scarcity of training data that combines locomotion with goal reaching. To address…

    Submitted 23 April, 2024; originally announced April 2024.

  8. arXiv:2403.19649

    cs.RO cs.CV

    GraspXL: Generating Grasping Motions for Diverse Objects at Scale

    Authors: Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song

    Abstract: Human hands possess the dexterity to interact with diverse objects such as grasping specific parts of the objects and/or approaching them from desired directions. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project Page: https://eth-ait.github.io/graspxl/

  9. arXiv:2403.16428

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  10. arXiv:2401.04143

    cs.CV

    RHOBIN Challenge: Reconstruction of Human Object Interaction

    Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

    Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 tables, 7 figures. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

  11. arXiv:2312.11666

    cs.CV cs.GR

    HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles

    Authors: Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges, Michael J. Black, Justus Thies

    Abstract: We present HAAR, a new strand-based generative model for 3D human hairstyles. Specifically, based on textual inputs, HAAR produces 3D hairstyles that could be used as production-level assets in modern computer graphics engines. Current AI-based generative models take advantage of powerful 2D priors to reconstruct 3D content in the form of point clouds, meshes, or volumetric functions. However, by…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: For more results please refer to the project page https://haar.is.tue.mpg.de/

  12. arXiv:2312.08558

    cs.CV

    G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving

    Authors: M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel, Nikola Popovic, Christian Vater, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: Understanding the decision-making process of drivers is one of the keys to ensuring road safety. While the driver intent and the resulting ego-motion trajectory are valuable in developing driver-assistance systems, existing methods mostly focus on the motions of other vehicles. In contrast, we focus on inferring the ego trajectory of a driver's vehicle using their gaze data. For this purpose, we f…

    Submitted 13 December, 2023; originally announced December 2023.

  13. arXiv:2311.18448

    cs.CV

    HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

    Authors: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges

    Abstract: Since humans interact with diverse objects every day, the holistic 3D capture of these interactions is important to understand and model human behaviour. However, most existing methods for hand-object reconstruction from RGB either assume pre-scanned object templates or heavily rely on limited 3D hand-object data, restricting their ability to scale and generalize to more unconstrained interaction…

    Submitted 30 November, 2023; originally announced November 2023.

  14. arXiv:2311.17944

    cs.CV

    LALM: Long-Term Action Anticipation with Language Models

    Authors: Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: Understanding human activity is a crucial yet intricate task in egocentric vision, a field that focuses on capturing visual perspectives from the camera wearer's viewpoint. While traditional methods heavily rely on representation learning trained on extensive video data, there exists a significant limitation: obtaining effective video representations proves challenging due to the inherent complexi…

    Submitted 28 November, 2023; originally announced November 2023.

  15. arXiv:2311.16854

    cs.CV

    A Unified Approach for Text- and Image-guided 4D Scene Generation

    Authors: Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

    Abstract: Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion guidance remains largely unexplored. We propose Dream-in-4D, which features a novel two-stage approach for text-to-4D synthesis, leveraging (1) 3D and 2D diffusion gui…

    Submitted 7 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project page: https://research.nvidia.com/labs/nxp/dream-in-4d/

  16. arXiv:2311.15855

    cs.CV

    SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

    Authors: Hsuan-I Ho, Jie Song, Otmar Hilliges

    Abstract: A long-standing goal of 3D human reconstruction is to create lifelike and fully detailed 3D humans from single-view images. The main challenge lies in inferring unknown body shapes, appearances, and clothing details in areas not visible in the images. To address this, we propose SiTH, a novel pipeline that uniquely integrates an image-conditioned diffusion model into a 3D mesh reconstruction workf…

    Submitted 30 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 23 pages, 23 figures, CVPR 2024

  17. arXiv:2311.05599

    cs.RO cs.AI

    SynH2R: Synthesizing Hand-Object Motions for Learning Human-to-Robot Handovers

    Authors: Sammy Christen, Lan Feng, Wei Yang, Yu-Wei Chao, Otmar Hilliges, Jie Song

    Abstract: Vision-based human-to-robot handover is an important and challenging task in human-robot interaction. Recent work has attempted to train robot policies by interacting with dynamic virtual humans in simulated environments, where the policies can later be transferred to the real world. However, a major bottleneck is the reliance on human motion capture data, which is expensive to acquire and difficu…

    Submitted 9 November, 2023; originally announced November 2023.

  18. FLARE: Fast Learning of Animatable and Relightable Mesh Avatars

    Authors: Shrisha Bharadwaj, Yufeng Zheng, Otmar Hilliges, Michael J. Black, Victoria Fernandez-Abrevaya

    Abstract: Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems. While 3D meshes enable efficient processing and are highly portable, they lack realism in terms of shape and appearance. Neural representations, on the other hand, are realistic but lack compatibility and are sl…

    Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 15 pages, Accepted: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 2023

    Journal ref: Volume 42, article number 204, year 2023

  19. arXiv:2310.17347

    cs.CV

    CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

    Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

    Abstract: While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffus…

    Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at ICLR 2024

    Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

  20. arXiv:2310.13768

    cs.CV

    PACE: Human and Camera Motion Estimation from in-the-wild Videos

    Authors: Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, Umar Iqbal

    Abstract: We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 3DV 2024. Project page: https://nvlabs.github.io/PACE/

  21. arXiv:2309.16859

    cs.CV cs.AI cs.LG

    Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis

    Authors: Marcel C. Bühler, Kripasindhu Sarkar, Tanmay Shah, Gengyan Li, Daoye Wang, Leonhard Helminger, Sergio Orts-Escolano, Dmitry Lagun, Otmar Hilliges, Thabo Beeler, Abhimitra Meka

    Abstract: NeRFs have enabled highly realistic synthesis of human faces including complex appearance and reflectance effects of hair and skin. These methods typically require a large number of multi-view input images, making the process hardware intensive and cumbersome, limiting applicability to unconstrained settings. We propose a novel volumetric human face prior that enables the synthesis of ultra high-r…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

  22. arXiv:2309.07907

    cs.RO cs.CV cs.LG

    Physically Plausible Full-Body Hand-Object Interaction Synthesis

    Authors: Jona Braun, Sammy Christen, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

    Abstract: We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. I…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Project page at https://eth-ait.github.io/phys-fullbody-grasp

  23. arXiv:2309.03891

    cs.RO cs.CV cs.LG

    ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation

    Authors: Hui Zhang, Sammy Christen, Zicong Fan, Luocheng Zheng, Jemin Hwangbo, Jie Song, Otmar Hilliges

    Abstract: We present ArtiGrasp, a novel method to synthesize bi-manual hand-object interactions that include grasping and articulation. This task is challenging due to the diversity of the global wrist motions and the precise finger control that are necessary to articulate objects. ArtiGrasp leverages reinforcement learning and physics simulations to train a policy that controls the global and local hand po…

    Submitted 3 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: 3DV-2024 camera ready. Project page: https://eth-ait.github.io/artigrasp/

  24. arXiv:2308.16894

    cs.CV

    EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild

    Authors: Manuel Kaufmann, Jie Song, Chen Guo, Kaiyue Shen, Tianjian Jiang, Chengcheng Tang, Juan Zarate, Otmar Hilliges

    Abstract: We present EMDB, the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild. EMDB is a novel dataset that contains high-quality 3D SMPL pose and shape parameters with global body and camera trajectories for in-the-wild videos. We use body-worn, wireless electromagnetic (EM) sensors and a hand-held iPhone to record a total of 58 minutes of motion data, distributed over 81 indoor and…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  25. arXiv:2306.16545

    cs.CV

    Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023

    Authors: Daoji Huang, Otmar Hilliges, Luc Van Gool, Xi Wang

    Abstract: We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models. Given an input video with annotated action periods, the LTA task aims to predict possible future actions. We hypothesize that an optimal solution should capture the interdependency between past and future actions, and be able to infer future actions based on the structur…

    Submitted 28 June, 2023; originally announced June 2023.

  26. arXiv:2305.05526

    cs.CV

    EFE: End-to-end Frame-to-Gaze Estimation

    Authors: Haldun Balim, Seonwook Park, Xi Wang, Xucong Zhang, Otmar Hilliges

    Abstract: Despite the recent development of learning-based gaze estimation methods, most methods require one or more eye or face region crops as inputs and produce a gaze direction vector as output. Cropping results in a higher resolution in the eye regions and having fewer confounding factors (such as clothing and hair) is believed to benefit the final model performance. However, this eye/face patch croppi…

    Submitted 9 May, 2023; originally announced May 2023.

  27. arXiv:2305.02312

    cs.CV

    AG3D: Learning to Generate 3D Avatars from 2D Image Collections

    Authors: Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, Andreas Geiger

    Abstract: While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Project Page: https://zj-dong.github.io/AG3D/

  28. arXiv:2305.00121

    cs.CV

    Learning Locally Editable Virtual Humans

    Authors: Hsuan-I Ho, Lixin Xue, Jie Song, Otmar Hilliges

    Abstract: In this paper, we propose a novel hybrid representation and end-to-end trainable network architecture to model fully editable and customizable neural avatars. At the core of our work lies a representation that combines the modeling power of neural fields with the ease of use and inherent 3D consistency of skinned meshes. To this end, we construct a trainable feature codebook to store local geometr…

    Submitted 28 April, 2023; originally announced May 2023.

    Comments: 12+11 pages, CVPR'23, project page https://custom-humans.github.io/

  29. arXiv:2303.17592

    cs.RO cs.CV cs.LG

    Learning Human-to-Robot Handovers from Point Clouds

    Authors: Sammy Christen, Wei Yang, Claudia Pérez-D'Arpino, Otmar Hilliges, Dieter Fox, Yu-Wei Chao

    Abstract: We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction. While research in Embodied AI has made significant progress in training robot agents in simulated environments, interacting with humans remains challenging due to the difficulties of simulating humans. Fortunately, recent research has developed realistic…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023 as highlight. Project page at https://handover-sim2real.github.io

  30. arXiv:2303.17209

    cs.CV

    Human from Blur: Human Pose Tracking from Blurry Images

    Authors: Yiming Zhao, Denys Rozumnyi, Jie Song, Otmar Hilliges, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose a method to estimate 3D human poses from substantially blurred images. The key idea is to tackle the inverse problem of image deblurring by modeling the forward problem with a 3D human model, a texture map, and a sequence of poses to describe human motion. The blurring process is then modeled by a temporal image aggregation step. Using a differentiable renderer, we can solve the inverse…

    Submitted 25 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: typos and minor error fixed

  31. arXiv:2303.15380

    cs.CV

    Hi4D: 4D Instance Segmentation of Close Human Interaction

    Authors: Yifei Yin, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Jie Song, Otmar Hilliges

    Abstract: We propose Hi4D, a method and dataset for the automatic analysis of physically close human-human interaction under prolonged contact. Robustly disentangling several in-contact subjects is a challenging task due to occlusions and complex shapes. Hence, existing multi-view systems typically fuse 3D surfaces of close subjects into a single, connected mesh. To address this issue we leverage i) individ…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Project page: https://yifeiyin04.github.io/Hi4D/

  32. arXiv:2303.09628

    cs.LG cs.RO

    Efficient Learning of High Level Plans from Play

    Authors: Núria Armengol Urpí, Marco Bagatella, Otmar Hilliges, Georg Martius, Stelian Coros

    Abstract: Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction, as well as the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency due to inefficie…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted to the International Conference on Robotics and Automation 2023

  33. arXiv:2303.04805

    cs.CV

    X-Avatar: Expressive Human Avatars

    Authors: Kaiyue Shen, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Julien Valentin, Jie Song, Otmar Hilliges

    Abstract: We present X-Avatar, a novel avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond. Our method models bodies, hands, facial expressions and appearance in a holistic fashion and can be learned from either full 3D scans or RGB-D data. To achieve this, we propose a part-aware learned forward skinning module that can…

    Submitted 9 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: Project page: https://skype-line.github.io/projects/X-Avatar/

  34. arXiv:2302.11566

    cs.CV

    Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

    Authors: Chen Guo, Tianjian Jiang, Xu Chen, Jie Song, Otmar Hilliges

    Abstract: We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. Reconstructing humans that move naturally from monocular in-the-wild videos is difficult. Solving it requires accurately separating humans from arbitrary backgrounds. Moreover, it requires reconstructing detailed 3D surface from short video sequences, making it even more challenging. Despite these challenges,…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Project page: https://moygcc.github.io/vid2avatar/

  35. arXiv:2301.09209

    cs.CV cs.CL

    Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

    Authors: Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang

    Abstract: We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture. It exploits the representational power of language by summarizing the action context. TransFusion leverages pre-trained image captioning and vi…

    Submitted 10 March, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

  36. arXiv:2212.10550

    cs.CV

    InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds

    Authors: Tianjian Jiang, Xu Chen, Jie Song, Otmar Hilliges

    Abstract: In this paper, we take a significant step towards real-world applicability of monocular neural avatar reconstruction by contributing InstantAvatar, a system that can reconstruct human avatars from a monocular video within seconds, and these avatars can be animated and rendered at an interactive rate. To achieve this efficiency we propose a carefully designed and engineered system, that leverages e…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 12 pages

  37. arXiv:2212.09530

    cs.CV

    HARP: Personalized Hand Reconstruction from a Monocular RGB Video

    Authors: Korrawe Karunratanakul, Sergey Prokudin, Otmar Hilliges, Siyu Tang

    Abstract: We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex disp…

    Submitted 3 July, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Project page: https://korrawe.github.io/harp-project/

  38. arXiv:2212.08377

    cs.CV cs.GR

    PointAvatar: Deformable Point-based Head Avatars from Videos

    Authors: Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, Otmar Hilliges

    Abstract: The ability to create realistic, animatable and relightable head avatars from casual video sequences would open up wide ranging applications in communication and entertainment. Current methods either build on explicit 3D morphable meshes (3DMM) or exploit neural implicit representations. The former are limited by fixed topology, while the latter are non-trivial to deform and inefficient to render.…

    Submitted 28 February, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Project page: https://zhengyuf.github.io/PointAvatar/ Code base: https://github.com/zhengyuf/pointavatar

  39. arXiv:2212.07242

    cs.CV

    HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

    Authors: Artur Grigorev, Bernhard Thomaszewski, Michael J. Black, Otmar Hilliges

    Abstract: We propose a method that leverages graph neural networks, multi-level message passing, and unsupervised training to enable real-time prediction of realistic clothing dynamics. Whereas existing methods based on linear blend skinning must be trained for specific garments, our method is agnostic to body shape and applies to tight-fitting garments as well as loose, free-flowing clothing. Our method fu…

    Submitted 16 June, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 16965-16974

  40. arXiv:2212.04823

    cs.CV

    GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields

    Authors: Alessandro Ruzzi, Xiangwei Shi, Xi Wang, Gengyan Li, Shalini De Mello, Hyung Jin Chang, Xucong Zhang, Otmar Hilliges

    Abstract: We propose GazeNeRF, a 3D-aware method for the task of gaze redirection. Existing gaze redirection methods operate on 2D images and struggle to generate 3D consistent results. Instead, we build on the intuition that the face region and eyeballs are separate 3D structures that move in a coordinated yet independent fashion. Our method leverages recent advancements in conditional image-based neural r…

    Submitted 28 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023. Github page: https://github.com/AlessandroRuzzi/GazeNeRF

  41. arXiv:2211.16630

    cs.CV

    DINER: Depth-aware Image-based NEural Radiance fields

    Authors: Malte Prinzler, Otmar Hilliges, Justus Thies

    Abstract: We present Depth-aware Image-based NEural Radiance fields (DINER). Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views. Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. In comparison t…

    Submitted 30 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Website: https://malteprinzler.github.io/projects/diner/diner.html ; Video: https://www.youtube.com/watch?v=iI_fpjY5k8Y&t=1s

  42. arXiv:2211.15601  [pdf, other]

    cs.CV

    Fast-SNARF: A Fast Deformer for Articulated Neural Fields

    Authors: Xu Chen, Tianjian Jiang, Jie Song, Max Rietmann, Andreas Geiger, Michael J. Black, Otmar Hilliges

    Abstract: Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accura…

    Submitted 1 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: github page: https://github.com/xuchen-ethz/fast-snarf

  43. arXiv:2211.07556  [pdf, other]

    cs.LG

    Utilizing Synthetic Data in Supervised Learning for Robust 5-DoF Magnetic Marker Localization

    Authors: Mengfan Wu, Thomas Langerak, Otmar Hilliges, Juan Zarate

    Abstract: Tracking passive magnetic markers plays a vital role in advancing healthcare and robotics, offering the potential to significantly improve the precision and efficiency of systems. This technology is key to developing smarter, more responsive tools and devices, such as enhanced surgical instruments, precise diagnostic tools, and robots with improved environmental interaction capabilities. However,…

    Submitted 25 March, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

  44. Computational Design of Active Kinesthetic Garments

    Authors: Velko Vechev, Ronan Hinchet, Stelian Coros, Bernhard Thomaszewski, Otmar Hilliges

    Abstract: Garments with the ability to provide kinesthetic force-feedback on-demand can augment human capabilities in a non-obtrusive way, enabling numerous applications in VR haptics, motion assistance, and robotic control. However, designing such garments is a complex, and often manual task, particularly when the goal is to resist multiple motions with a single design. In this work, we propose a computati…

    Submitted 14 October, 2022; originally announced October 2022.

  45. arXiv:2209.12660  [pdf, other]

    cs.HC

    MARLUI: Multi-Agent Reinforcement Learning for Adaptive UIs

    Authors: Thomas Langerak, Sammy Christen, Mert Albaba, Christoph Gebhardt, Otmar Hilliges

    Abstract: Adaptive user interfaces (UIs) automatically change an interface to better support users' tasks. Recently, machine learning techniques have enabled the transition to more powerful and complex adaptive UIs. However, a core challenge for adaptive user interfaces is the reliance on high-quality user data that has to be collected offline for each task. We formulate UI adaptation as a multi-agent reinf…

    Submitted 27 October, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

  46. arXiv:2209.02485  [pdf, other]

    cs.CV cs.CL

    Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors

    Authors: Xi Wang, Gen Li, Yen-Ling Kuo, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

    Abstract: We present a method for inferring diverse 3D models of human-object interactions from images. Reasoning about how humans interact with objects in complex scenes from a single 2D image is a challenging task given ambiguities arising from the loss of information through projection. In addition, modeling 3D interactions requires the generalization ability towards diverse object categories and interac…

    Submitted 6 September, 2022; originally announced September 2022.

  47. arXiv:2209.00489  [pdf, other]

    cs.CV

    TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning

    Authors: Andrea Ziani, Zicong Fan, Muhammed Kocabas, Sammy Christen, Otmar Hilliges

    Abstract: We introduce TempCLR, a new time-coherent contrastive learning approach for the structured regression task of 3D hand reconstruction. Unlike previous time-contrastive methods for hand pose estimation, our framework considers temporal consistency in its augmentation scheme, and accounts for the differences of hand poses along the temporal direction. Our data-driven method leverages unlabelled video…

    Submitted 1 September, 2022; originally announced September 2022.

  48. EyeNeRF: A Hybrid Representation for Photorealistic Synthesis, Animation and Relighting of Human Eyes

    Authors: Gengyan Li, Abhimitra Meka, Franziska Müller, Marcel C. Bühler, Otmar Hilliges, Thabo Beeler

    Abstract: A unique challenge in creating high-quality animatable and relightable 3D avatars of people is modeling human eyes. The challenge of synthesizing eyes is multifold as it requires 1) appropriate representations for the various components of the eye and the periocular region for coherent viewpoint synthesis, capable of representing diffuse, refractive and highly reflective surfaces, 2) disentangling…

    Submitted 12 July, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 16 pages, 16 figures, 1 table, to be published in ACM Transactions on Graphics (TOG) (Volume: 41, Issue: 4), 2022

    ACM Class: I.4.5; I.3

  49. arXiv:2205.13528  [pdf, other]

    cs.LG

    SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning

    Authors: Marco Bagatella, Sammy Christen, Otmar Hilliges

    Abstract: Efficient exploration is a crucial challenge in deep reinforcement learning. Several methods, such as behavioral priors, are able to leverage offline data in order to efficiently accelerate reinforcement learning on complex tasks. However, if the task at hand deviates excessively from the demonstrated task, the effectiveness of such methods is limited. In our work, we propose to learn features fro…

    Submitted 31 August, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted by TMLR in August 2022. Project page at https://eth-ait.github.io/sfp/

  50. arXiv:2204.13662  [pdf, other]

    cs.CV

    ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation

    Authors: Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J. Black, Otmar Hilliges

    Abstract: Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines. In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects. To thi…

    Submitted 23 April, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Project page: https://arctic.is.tue.mpg.de