Showing 1–50 of 52 results for author: Rhodin, H

  1. arXiv:2406.00637

    cs.CV cs.AI cs.GR

    Representing Animatable Avatar via Factorized Neural Fields

    Authors: Chunjin Song, Zhijie Wu, Bastian Wandt, Leonid Sigal, Helge Rhodin

    Abstract: For reconstructing high-fidelity human 3D models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles. This paper explores the observation that the per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent equivalent to facilitate frame consistency. Pose adaptive textur…

    Submitted 2 June, 2024; originally announced June 2024.

  2. arXiv:2405.06845

    cs.CV

    CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras

    Authors: James Tang, Shashwat Suri, Daniel Ajisafe, Bastian Wandt, Helge Rhodin

    Abstract: It is now possible to estimate 3D human pose from monocular images with off-the-shelf 3D pose estimators. However, many practical applications require fine-grained absolute pose information for which multi-view cues and camera calibration are necessary. Such multi-view recordings are laborious because they require manual calibration, and are expensive when using dedicated hardware. Our goal is ful…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted to the 18th IEEE International Conference on Automatic Face and Gesture Recognition

  3. arXiv:2401.06116

    cs.CV

    Gaussian Shadow Casting for Neural Characters

    Authors: Luis Bolanos, Shih-Yang Su, Helge Rhodin

    Abstract: Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density prox…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 14 pages, 13 figures

  4. arXiv:2312.00065

    cs.CV

    Unsupervised Keypoints from Pretrained Diffusion Models

    Authors: Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Unsupervised learning of keypoints and landmarks has seen significant progress with the help of modern neural network architectures, but performance is yet to match the supervised counterpart, making their practicability questionable. We leverage the emergent knowledge within text-to-image diffusion models, towards more robust unsupervised keypoints. Our core idea is to find text embeddings that w…

    Submitted 21 May, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

  5. arXiv:2311.04315

    cs.CV

    A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization

    Authors: Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot

    Abstract: Large text-to-image models have revolutionized the ability to generate imagery using natural language. However, particularly unique or personal visual concepts, such as pets and furniture, will not be captured by the original model. This has led to interest in how to personalize a text-to-image model. Despite significant progress, this task remains a formidable challenge, particularly in preservin…

    Submitted 14 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

  6. arXiv:2309.04750

    cs.CV

    Mirror-Aware Neural Humans

    Authors: Daniel Ajisafe, James Tang, Shih-Yang Su, Bastian Wandt, Helge Rhodin

    Abstract: Human motion capture either requires multi-camera systems or is unreliable when using single-view input due to depth ambiguities. Meanwhile, mirrors are readily available in urban environments and form an affordable alternative by recording two views with only a single camera. However, the mirror setting poses the additional challenge of handling occlusions of real and mirror image. Going beyond e…

    Submitted 15 May, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: The 11th International Conference on 3D Vision (3DV 2024). Project website: https://danielajisafe.github.io/mirror-aware-neural-humans/

  7. arXiv:2308.11951

    cs.CV cs.AI cs.GR

    Pose Modulated Avatars from Video

    Authors: Chunjin Song, Bastian Wandt, Helge Rhodin

    Abstract: It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, a challenge remains to model the deformation of cloth and skin in relation to skeleton pose. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation tha…

    Submitted 29 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

  8. arXiv:2304.02013

    cs.CV

    NPC: Neural Point Characters from Video

    Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin

    Abstract: High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical s…

    Submitted 1 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Project website: https://lemonatsu.github.io/npc/

  9. arXiv:2303.17216

    cs.CV

    Few-shot Geometry-Aware Keypoint Localization

    Authors: Xingzhe He, Gaurav Bharaj, David Ferman, Helge Rhodin, Pablo Garrido

    Abstract: Supervised keypoint localization methods rely on large manually labeled image datasets, where objects can deform, articulate, or occlude. However, creating such large keypoint labels is time-consuming and costly, and is often error-prone due to inconsistent labeling. Thus, we desire an approach that can learn keypoint localization with fewer yet consistently annotated images. To this end, we prese…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023

  10. arXiv:2211.05773

    cs.CV

    Scaling Neural Face Synthesis to High FPS and Low Latency by Neural Caching

    Authors: Frank Yu, Sid Fels, Helge Rhodin

    Abstract: Recent neural rendering approaches greatly improve image quality, reaching near photorealism. However, the underlying neural networks have high runtime, precluding telepresence and virtual reality applications that require high resolution at low latency. The sequential dependency of layers in deep networks makes their optimization difficult. We break this dependency by caching information from the…

    Submitted 10 November, 2022; originally announced November 2022.

  11. arXiv:2206.11952

    cs.CV cs.GR

    UNeRF: Time and Memory Conscious U-Shaped Network for Training Neural Radiance Fields

    Authors: Abiramy Kuganesan, Shih-yang Su, James J. Little, Helge Rhodin

    Abstract: Neural Radiance Fields (NeRFs) increase reconstruction detail for novel view synthesis and scene reconstruction, with applications ranging from large static scenes to dynamic human motion. However, the increased resolution and model-free nature of such neural fields come at the cost of high training times and excessive memory requirements. Recent advances improve the inference time by using comple…

    Submitted 23 June, 2022; originally announced June 2022.

  12. arXiv:2205.10636

    cs.CV

    AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints

    Authors: Xingzhe He, Bastian Wandt, Helge Rhodin

    Abstract: Structured representations such as keypoints are widely used in pose transfer, conditional image generation, animation, and 3D reconstruction. However, their supervised learning requires expensive annotation for each target domain. We propose a self-supervised method that learns to disentangle object structure from the appearance with a graph of 2D keypoints linked by straight edges. Both the keyp…

    Submitted 23 March, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

    Journal ref: Advances in Neural Information Processing Systems 2022

  13. arXiv:2205.03448

    cs.CV

    LatentKeypointGAN: Controlling Images via Latent Keypoints -- Extended Abstract

    Authors: Xingzhe He, Bastian Wandt, Helge Rhodin

    Abstract: Generative adversarial networks (GANs) can now generate photo-realistic images. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN internally conditioned on a set of keypoints and associated appearance embeddings providing control of the position and style of the generated objects and their respective parts. A major difficulty…

    Submitted 17 May, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2103.15812

    Journal ref: CVPR Workshop 2022

  14. arXiv:2205.01666

    cs.CV

    DANBO: Disentangled Articulated Neural Body Representations via Graph Neural Networks

    Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin

    Abstract: Deep learning greatly improved the realism of animatable human models by learning geometry and appearance from collections of 3D scans, template meshes, and multi-view imagery. High-resolution models enable photo-realistic avatars but at the cost of requiring studio settings not available to end users. Our goal is to create avatars directly from raw images without relying on expensive studio setup…

    Submitted 11 October, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: ECCV 2022. Project website: https://lemonatsu.github.io/danbo

  15. arXiv:2205.00076

    cs.CV

    A Simple Method to Boost Human Pose Estimation Accuracy by Correcting the Joint Regressor for the Human3.6m Dataset

    Authors: Eric Hedlin, Helge Rhodin, Kwang Moo Yi

    Abstract: Many human pose estimation methods estimate Skinned Multi-Person Linear (SMPL) models and regress the human joints from these SMPL estimates. In this work, we show that the most widely used SMPL-to-joint linear layer (joint regressor) is inaccurate, which may mislead pose evaluation results. To achieve a more accurate joint regressor, we propose a method to create pseudo-ground-truth SMPL poses, w…

    Submitted 29 April, 2022; originally announced May 2022.

  16. arXiv:2202.14019

    cs.CV cs.AI cs.HC cs.LG

    Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment

    Authors: Paritosh Parmar, Amol Gharat, Helge Rhodin

    Abstract: Maintaining proper form while exercising is important for preventing injuries and maximizing muscle mass gains. Detecting errors in workout form naturally requires estimating human's body pose. However, off-the-shelf pose estimators struggle to perform well on the videos recorded in gym scenarios due to factors such as camera angles, occlusion from gym equipment, illumination, and clothing. To agg…

    Submitted 21 October, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

  17. arXiv:2112.12193

    cs.CV

    Improved 2D Keypoint Detection in Out-of-Balance and Fall Situations -- combining input rotations and a kinematic model

    Authors: Michael Zwölfer, Dieter Heinrich, Kurt Schindelwig, Bastian Wandt, Helge Rhodin, Joerg Spoerri, Werner Nachbauer

    Abstract: Injury analysis may be one of the most beneficial applications of deep learning based human pose estimation. To facilitate further research on this topic, we provide an injury specific 2D dataset for alpine skiing, covering in total 533 images. We further propose a post processing routine, that combines rotational information with a simple kinematic model. We could improve detection results in fal…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: extended abstract, 4 pages, 3 figures, 2 tables

  18. arXiv:2112.11593

    cs.CV

    AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation

    Authors: Mohsen Gholami, Bastian Wandt, Helge Rhodin, Rabab Ward, Z. Jane Wang

    Abstract: This paper addresses the problem of cross-dataset generalization of 3D human pose estimation models. Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop. Previous methods have mainly addressed this problem by improving the diversity of the training data. We argue that diversity alone is not sufficient and that the characteristics of the training data need t…

    Submitted 15 March, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

  19. arXiv:2112.07088

    cs.CV

    ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses

    Authors: Bastian Wandt, James J. Little, Helge Rhodin

    Abstract: Human pose estimation from single images is a challenging problem that is typically solved by supervised learning. Unfortunately, labeled training data does not yet exist for many human activities since 3D annotation requires dedicated motion capture systems. Therefore, we propose an unsupervised approach that learns to predict a 3D human pose from a single image while only being trained with 2D p…

    Submitted 13 December, 2021; originally announced December 2021.

  20. arXiv:2112.01036

    cs.CV

    GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation

    Authors: Xingzhe He, Bastian Wandt, Helge Rhodin

    Abstract: Segmenting an image into its parts is a frequent preprocess for high-level vision tasks such as image editing. However, annotating masks for supervised training is expensive. Weakly-supervised and unsupervised methods exist, but they depend on the comparison of pairs of images, such as from multi-views, frames of videos, and image augmentation, which limits their applicability. To address this, we…

    Submitted 8 October, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: CVPR 2022

  21. arXiv:2107.02407

    cs.CV

    NRST: Non-rigid Surface Tracking from Monocular Video

    Authors: Marc Habermann, Weipeng Xu, Helge Rhodin, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt

    Abstract: We propose an efficient method for non-rigid surface tracking from monocular RGB videos. Given a video and a template mesh, our algorithm sequentially registers the template non-rigidly to each frame. We formulate the per-frame registration as an optimization problem that includes a novel texture term specifically tailored towards tracking objects with uniform texture but fine-scale structure, suc…

    Submitted 12 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

  22. arXiv:2105.06599

    cs.CV

    TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation from Video

    Authors: Mohsen Gholami, Ahmad Rezaei, Helge Rhodin, Rabab Ward, Z. Jane Wang

    Abstract: Estimating 3D human poses from video is a challenging problem. The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets. In this work, we address this problem by proposing a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras. The proposed method relies on temporal information and triangulat…

    Submitted 13 May, 2021; originally announced May 2021.

  23. arXiv:2103.15812

    cs.CV

    LatentKeypointGAN: Controlling GANs via Latent Keypoints

    Authors: Xingzhe He, Bastian Wandt, Helge Rhodin

    Abstract: Generative adversarial networks (GANs) have attained photo-realistic quality in image generation. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN which is trained end-to-end on the classical GAN objective with internal conditioning on a set of space keypoints. These keypoints have associated appearance embeddings that respec…

    Submitted 8 June, 2023; v1 submitted 29 March, 2021; originally announced March 2021.

    Journal ref: CRV 2023

  24. arXiv:2102.06199

    cs.CV cs.GR

    A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose

    Authors: Shih-Yang Su, Frank Yu, Michael Zollhoefer, Helge Rhodin

    Abstract: While deep learning reshaped the classical motion capture pipeline with feed-forward networks, generative models are required to recover fine alignment via iterative refinement. Unfortunately, the existing models are usually hand-crafted or learned in controlled conditions, only applicable to limited domains. We propose a method to learn a generative neural body model from unlabelled monocular vid…

    Submitted 28 October, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. Project website: https://lemonatsu.github.io/anerf/

  25. arXiv:2012.13341

    cs.HC cs.CV cs.LG cs.SD eess.AS

    AudioViewer: Learning to Visualize Sounds

    Authors: Chunjin Song, Yuchi Zhang, Willis Peng, Parmis Mohaghegh, Bastian Wandt, Helge Rhodin

    Abstract: A long-standing goal in the field of sensory substitution is to enable sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speec…

    Submitted 10 November, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2206-2216

  26. arXiv:2012.05119

    cs.CV

    Human Detection and Segmentation via Multi-view Consensus

    Authors: Isinsu Katircioglu, Helge Rhodin, Jörg Spörri, Mathieu Salzmann, Pascal Fua

    Abstract: Self-supervised detection and segmentation of foreground objects aims for accuracy without annotated training data. However, existing approaches predominantly rely on restrictive assumptions on appearance and motion. For scenes with dynamic activities and camera motion, we propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during trai…

    Submitted 18 August, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

  27. Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation

    Authors: Sina Honari, Victor Constantin, Helge Rhodin, Mathieu Salzmann, Pascal Fua

    Abstract: In this paper we propose an unsupervised feature extraction method to capture temporal information on monocular videos, where we detect and encode subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative…

    Submitted 25 November, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted in "IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

  28. arXiv:2011.14679

    cs.CV

    CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild

    Authors: Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, Bodo Rosenhahn

    Abstract: Human pose estimation from single images is a challenging problem in computer vision that requires large amounts of labeled training data to be solved accurately. Unfortunately, for many human activities (e.g., outdoor sports) such training data does not exist and is hard or even impossible to acquire with traditional motion capture systems. We propose a self-supervised approach that learns a single…

    Submitted 30 November, 2020; originally announced November 2020.

  29. arXiv:2011.13607

    cs.CV

    PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers

    Authors: Frank Yu, Mathieu Salzmann, Pascal Fua, Helge Rhodin

    Abstract: Local processing is an essential feature of CNNs and other neural network architectures - it is one of the reasons why they work so well on images where relevant information is, to a large extent, local. However, perspective effects stemming from the projection in a conventional camera vary for different global positions in the image. We introduce Perspective Crop Layers (PCLs) - a form of perspec…

    Submitted 15 April, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: CVPR 2021

  30. arXiv:2011.05626

    cs.CV

    Self-supervised Segmentation via Background Inpainting

    Authors: Isinsu Katircioglu, Helge Rhodin, Victor Constantin, Jörg Spörri, Mathieu Salzmann, Pascal Fua

    Abstract: While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving c…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: text overlap with arXiv:1907.08051

  31. arXiv:2011.04844

    cs.CV

    Ellipse Detection and Localization with Applications to Knots in Sawn Lumber Images

    Authors: Shenyi Pan, Shuxian Fan, Samuel W. K. Wong, James V. Zidek, Helge Rhodin

    Abstract: While general object detection has seen tremendous progress, localization of elliptical objects has received little attention in the literature. Our motivating application is the detection of knots in sawn timber images, which is an important problem since the number and types of knots are visual characteristics that adversely affect the quality of sawn timber. We demonstrate how models can be tai…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at WACV 2021

  32. arXiv:2001.08601

    cs.CV

    Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals

    Authors: Siyuan Li, Semih Günel, Mirela Ostrek, Pavan Ramdya, Pascal Fua, Helge Rhodin

    Abstract: Our goal is to capture the pose of neuroscience model organisms, without using any manual supervision, to be able to study how neural circuits orchestrate behaviour. Human pose estimation attains remarkable accuracy when trained on real or simulated datasets consisting of millions of frames. However, for many applications simulated models are unrealistic and real training datasets with comprehensi…

    Submitted 23 January, 2020; originally announced January 2020.

  33. arXiv:1912.10589

    cs.CV cs.GR

    Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

    Authors: Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, Alla Sheffer

    Abstract: Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces. Recent methods address this challenge through the use of largely unstructured neural networks that effectively distill conditional mapping and priors over 3D shape. In this work, we induce structur…

    Submitted 31 January, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

  34. arXiv:1912.08568

    cs.CV cs.RO

    ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture

    Authors: Sena Kiciroglu, Helge Rhodin, Sudipta N. Sinha, Mathieu Salzmann, Pascal Fua

    Abstract: The accuracy of monocular 3D human pose estimation depends on the viewpoint from which the image is captured. While freely moving cameras, such as on drones, provide control over this viewpoint, automatically positioning them at the location which will yield the highest accuracy remains an open problem. This is the problem that we address in this paper. Specifically, given a short video sequence,…

    Submitted 18 June, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Published in CVPR 2020. For associated video, see https://youtu.be/i58Bu-hbZHs

  35. arXiv:1909.02211

    cs.CV

    Gravity as a Reference for Estimating a Person's Height from Video

    Authors: Didier Bieler, Semih Günel, Pascal Fua, Helge Rhodin

    Abstract: Estimating the metric height of a person from monocular imagery without additional assumptions is ill-posed. Existing solutions either require manual calibration of ground plane and camera geometry, special cameras, or reference objects of known size. We focus on motion cues and exploit gravity on earth as an omnipresent reference 'object' to translate acceleration, and subsequently height, measur…

    Submitted 16 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  36. arXiv:1908.11676

    cs.CV

    Motion Capture from Pan-Tilt Cameras with Unknown Orientation

    Authors: Roman Bachmann, Jörg Spörri, Pascal Fua, Helge Rhodin

    Abstract: In sports, such as alpine skiing, coaches would like to know the speed and various biomechanical variables of their athletes and competitors. Existing methods use either body-worn sensors, which are cumbersome to setup, or manual image annotation, which is time consuming. We propose a method for estimating an athlete's global 3D position and articulated pose using multiple cameras. By contrast to…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: International Conference on 3D Vision 2019

  37. arXiv:1907.08051

    cs.CV

    Self-supervised Training of Proposal-based Segmentation via Background Prediction

    Authors: Isinsu Katircioglu, Helge Rhodin, Victor Constantin, Jörg Spörri, Mathieu Salzmann, Pascal Fua

    Abstract: While supervised object detection methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to object detection and segmentation, able to work with monocular images captured with a moving c…

    Submitted 18 July, 2019; originally announced July 2019.

  38. XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

    Authors: Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Mohamed Elgharib, Pascal Fua, Hans-Peter Seidel, Helge Rhodin, Gerard Pons-Moll, Christian Theobalt

    Abstract: We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible jo…

    Submitted 30 April, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: To appear in ACM Transactions on Graphics (SIGGRAPH) 2020

  39. arXiv:1903.05684

    cs.CV

    Neural Scene Decomposition for Multi-Person Motion Capture

    Authors: Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua

    Abstract: Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point to learn from limited labeled data. However, when it comes to 3D motion capture of multiple people, these feat…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: CVPR 2019

  40. arXiv:1805.10355

    cs.CV

    What Face and Body Shapes Can Tell About Height

    Authors: Semih Günel, Helge Rhodin, Pascal Fua

    Abstract: Recovering a person's height from a single image is important for virtual garment fitting, autonomous driving and surveillance, however, it is also very challenging due to the absence of absolute scale information. We tackle the rarely addressed case, where camera parameters and scene geometry is unknown. To nevertheless resolve the inherent scale ambiguity, we infer height from statistics that ar…

    Submitted 25 May, 2018; originally announced May 2018.

  41. arXiv:1804.01110

    cs.CV cs.AI

    Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation

    Authors: Helge Rhodin, Mathieu Salzmann, Pascal Fua

    Abstract: Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. While weakly-supervised methods require less supervision, by utilizing 2D poses or multi-view imagery without annotations, they still need a sufficiently large set of samples with 3D annotations for learning to succeed. In this paper, we propose to overcome this problem by learning a g…

    Submitted 3 April, 2018; originally announced April 2018.

  42. arXiv:1803.05959

    cs.CV

    Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera

    Authors: Weipeng Xu, Avishek Chatterjee, Michael Zollhoefer, Helge Rhodin, Pascal Fua, Hans-Peter Seidel, Christian Theobalt

    Abstract: We propose the first real-time approach for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup, and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard…

    Submitted 23 January, 2019; v1 submitted 15 March, 2018; originally announced March 2018.

    Comments: IEEE TVCG Proc. VR 2019. Webpage: http://gvv.mpi-inf.mpg.de/projects/wxu/Mo2Cap2/

  43. arXiv:1803.04775

    cs.CV

    Learning Monocular 3D Human Pose Estimation from Multi-view Images

    Authors: Helge Rhodin, Jörg Spörri, Isinsu Katircioglu, Victor Constantin, Frédéric Meyer, Erich Müller, Mathieu Salzmann, Pascal Fua

    Abstract: Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views, at…

    Submitted 24 March, 2018; v1 submitted 13 March, 2018; originally announced March 2018.

    Comments: CVPR 2018, Ski-Pose PTZ-Camera Dataset available

  44. arXiv:1708.02136  [pdf, other]

    cs.CV cs.GR

    MonoPerfCap: Human Performance Capture from Monocular Video

    Authors: Weipeng Xu, Avishek Chatterjee, Michael Zollhöfer, Helge Rhodin, Dushyant Mehta, Hans-Peter Seidel, Christian Theobalt

    Abstract: We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and co…

    Submitted 23 February, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

    Comments: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018

  45. VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    Authors: Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, Christian Theobalt

    Abstract: We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not requir…

    Submitted 3 May, 2017; originally announced May 2017.

    Comments: Accepted to SIGGRAPH 2017

  46. arXiv:1701.00142  [pdf, other]

    cs.CV

    EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras (Extended Abstract)

    Authors: Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, Christian Theobalt

    Abstract: Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort by possibly needed marker suits, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. We therefore propose a new method for real-ti…

    Submitted 31 December, 2016; originally announced January 2017.

    Comments: Short version of a SIGGRAPH Asia 2016 paper arXiv:1609.07306, presented at EPIC@ECCV16

  47. arXiv:1611.09813  [pdf, other]

    cs.CV

    Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision

    Authors: Dushyant Mehta, Helge Rhodin, Dan Casas, Pascal Fua, Oleksandr Sotnychenko, Weipeng Xu, Christian Theobalt

    Abstract: We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly available 3D pose data. Using only the existing 3D pose data and 2D pose data, we show state-of-the-art performance on established benchmarks through transfer of learned features, while also generalizi…

    Submitted 4 October, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

    Comments: Accepted at the International Conference on 3D Vision (3DV) 2017

  48. arXiv:1610.06740  [pdf, other]

    cs.CV

    Model-based Outdoor Performance Capture

    Authors: Nadia Robertini, Dan Casas, Helge Rhodin, Hans-Peter Seidel, Christian Theobalt

    Abstract: We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both articulated skeleton tracking and nonrigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages - first, the coarse skeletal p…

    Submitted 21 October, 2016; originally announced October 2016.

    Comments: 3DV 2016

  49. arXiv:1609.07306  [pdf, other]

    cs.CV

    EgoCap: Egocentric Marker-less Motion Capture with Two Fisheye Cameras

    Authors: Helge Rhodin, Christian Richardt, Dan Casas, Eldar Insafutdinov, Mohammad Shafiei, Hans-Peter Seidel, Bernt Schiele, Christian Theobalt

    Abstract: Marker-based and marker-less optical skeletal motion-capture methods use an outside-in arrangement of cameras placed around a scene, with viewpoints converging on the center. They often create discomfort by possibly needed marker suits, and their recording volume is severely restricted and often constrained to indoor scenes with controlled backgrounds. Alternative suit-based systems use several in…

    Submitted 23 September, 2016; originally announced September 2016.

    Comments: SIGGRAPH Asia 2016

  50. arXiv:1607.08659  [pdf, other]

    cs.CV

    General Automatic Human Shape and Motion Capture Using Volumetric Contour Cues

    Authors: Helge Rhodin, Nadia Robertini, Dan Casas, Christian Richardt, Hans-Peter Seidel, Christian Theobalt

    Abstract: Markerless motion capture algorithms require a 3D body with properly personalized skeleton dimension and/or body shape and appearance to successfully track a person. Unfortunately, many tracking methods consider model personalization a different problem and use manual or semi-automatic model initialization, which greatly reduces applicability. In this paper, we propose a fully automatic algorithm…

    Submitted 21 October, 2016; v1 submitted 28 July, 2016; originally announced July 2016.

    Comments: Accepted to ECCV 2016, added additional references