Showing 1–36 of 36 results for author: Pavlakos, G

  1. arXiv:2407.03204  [pdf, other]

    cs.CV

    Expressive Gaussian Human Avatars from Monocular RGB Video

    Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang

    Abstract: Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video; a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduc…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2406.08479  [pdf, other]

    cs.CV

    Real3D: Scaling Up Large Reconstruction Models with Real-World Images

    Authors: Hanwen Jiang, Qixing Huang, Georgios Pavlakos

    Abstract: The default strategy for training single-view Large Reconstruction Models (LRMs) follows the fully supervised route using large-scale datasets of synthetic 3D assets or multi-view captures. Although these resources simplify the training procedure, they are hard to scale up beyond the existing datasets and they are not necessarily representative of the real distribution of object shapes. To address…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://hwjiang1510.github.io/Real3D/

  3. arXiv:2406.03417  [pdf, other]

    cs.CV cs.GR

    CoFie: Learning Compact Neural Surface Representations with Coordinate Fields

    Authors: Hanwen Jiang, Haitao Yang, Georgios Pavlakos, Qixing Huang

    Abstract: This paper introduces CoFie, a novel local geometry-aware neural surface representation. CoFie is motivated by the theoretical analysis of local SDFs with quadratic approximation. We find that local shapes are highly compressive in an aligned coordinate frame defined by the normal and tangent directions of local shapes. Accordingly, we introduce Coordinate Field, which is a composition of coordina…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Project page: https://hwjiang1510.github.io/CoFie/

  4. arXiv:2405.18831  [pdf, other]

    cs.CV cs.LG

    Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks

    Authors: Simranjit Singh, Georgios Pavlakos, Dimitrios Stamoulis

    Abstract: As interest in "reformulating" the 3D Visual Question Answering (VQA) problem in the context of foundation models grows, it is imperative to assess how these new paradigms influence existing closed-vocabulary datasets. In this case study, we evaluate the zero-shot performance of foundational models (GPT-4 Vision and GPT-4) on well-established 3D VQA benchmarks, namely 3D-VQA and ScanQA. We provide…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at 1st Workshop on Multimodalities for 3D Scenes CVPR 2024

  5. arXiv:2404.11987  [pdf, other]

    cs.CV

    MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

    Authors: Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas

    Abstract: We introduce MultiPhys, a method designed for recovering multi-person motion from monocular videos. Our focus lies in capturing coherent spatial placement between pairs of individuals across varying degrees of engagement. MultiPhys, being physically aware, exhibits robustness to jittering and occlusions, and effectively eliminates penetration issues between the two individuals. We devise a pipelin…

    Submitted 18 April, 2024; originally announced April 2024.

  6. arXiv:2404.06507  [pdf, other]

    cs.CV

    Reconstructing Hand-Held Objects in 3D

    Authors: Jane Wu, Georgios Pavlakos, Georgia Gkioxari, Jitendra Malik

    Abstract: Objects manipulated by the hand (i.e., manipulanda) are particularly challenging to reconstruct from in-the-wild RGB images or videos. Not only does the hand occlude much of the object, but also the object is often only visible in a small number of image pixels. At the same time, two strong anchors emerge in this setting: (1) estimated 3D hands help disambiguate the location and scale of the objec…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Project page: https://janehwu.github.io/mcc-ho

  7. arXiv:2403.20309  [pdf, other]

    cs.CV

    InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds

    Authors: Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang

    Abstract: While novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision, it relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM). For instance, the recently developed Gaussian Splatting depends heavily on the accuracy of SfM-derived points and poses. However, SfM processes are time-consuming and often prove unreliable in…

    Submitted 30 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Project Page: https://instantsplat.github.io/

  8. arXiv:2312.05251  [pdf, other]

    cs.CV

    Reconstructing Hands in 3D with Transformers

    Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

    Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand recon…

    Submitted 8 December, 2023; originally announced December 2023.

  9. arXiv:2311.16099  [pdf, other]

    cs.CV cs.GR

    GART: Gaussian Articulated Template Models

    Authors: Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis

    Abstract: We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnabl…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, code available at https://www.cis.upenn.edu/~leijh/projects/gart/

  10. arXiv:2306.09337  [pdf, other]

    cs.CV

    Generative Proxemics: A Prior for 3D Social Interaction from Images

    Authors: Lea Müller, Vickie Ye, Georgios Pavlakos, Michael Black, Angjoo Kanazawa

    Abstract: Social interaction is a fundamental aspect of human behavior and communication. The way individuals position themselves in relation to others, also known as proxemics, conveys social cues and affects the dynamics of social interaction. Reconstructing such interaction from images presents challenges because of mutual occlusion and the limited availability of large training datasets. To address this…

    Submitted 12 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Project website: muelea.github.io/buddi

  11. arXiv:2305.20091  [pdf, other]

    cs.CV

    Humans in 4D: Reconstructing and Tracking Humans with Transformers

    Authors: Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik

    Abstract: We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images. To analyze video, we use 3D reconstruction…

    Submitted 31 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: In ICCV 2023. Project Webpage: https://shubham-goel.github.io/4dhumans/

  12. arXiv:2304.14396  [pdf, other]

    cs.CV

    Learning Articulated Shape with Keypoint Pseudo-labels from Web Images

    Authors: Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas

    Abstract: This paper shows that it is possible to learn models for monocular 3D reconstruction of articulated objects (e.g., horses, cows, sheep), using as few as 50-150 images labeled with 2D keypoints. Our proposed approach involves training category-specific keypoint estimators, generating 2D keypoint pseudo-labels on unlabeled web images, and using both the labeled and self-labeled sets to train 3D reco…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 (project page: https://statho.github.io/projects/animals3d/index.html)

  13. arXiv:2304.01199  [pdf, other]

    cs.CV

    On the Benefits of 3D Pose and Tracking for Human Action Recognition

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik

    Abstract: In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, first we show the benefits of using 3D pose to infer actions, and stud…

    Submitted 7 August, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: CVPR2023 (project page: https://brjathu.github.io/LART)

  14. arXiv:2302.12827  [pdf, other]

    cs.CV

    Decoupling Human and Camera Motion from Videos in the Wild

    Authors: Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

    Abstract: We propose a method to reconstruct global human trajectories from videos in the wild. Our optimization method decouples the camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model the camera motion; methods that rely on the background pixels to infer 3D human motion usually require a full scene reconstruction, which is often n…

    Submitted 20 March, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Project site: https://vye16.github.io/slahmr. CVPR 2023

  15. arXiv:2207.14279  [pdf, other]

    cs.CV

    The One Where They Reconstructed 3D Humans and Environments in TV Shows

    Authors: Georgios Pavlakos, Ethan Weber, Matthew Tancik, Angjoo Kanazawa

    Abstract: TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data for many applications. However, the majority of the existing work focuses on 2D recognition tasks. In this paper, we make the observation that there is a certain persistence in TV shows, i.e., repetition of the environments and the humans, which makes possible the 3D…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project page: http://ethanweber.me/sitcoms3D/

  16. Semantic keypoint-based pose estimation from single RGB frames

    Authors: Karl Schmeckpeper, Philip R. Osteen, Yufu Wang, Georgios Pavlakos, Kenneth Chaney, Wyatt Jordan, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper presents an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior investigators, we are agnostic to whether the object is textured or textureless, as the convnet learns the optimal representation from the available training-…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: https://sites.google.com/view/rcta-object-keypoints-dataset/home. arXiv admin note: substantial text overlap with arXiv:1703.04670

    Journal ref: Field Robotics, 2, 147-171, 2022

  17. arXiv:2112.04477  [pdf, other]

    cs.CV

    Tracking People by Predicting 3D Appearance, Location & Pose

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this paper, we present an approach for tracking people in monocular videos, by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust way. This lifting includes information about the 3D pose of the person, his or her location in the 3D space, and the 3D appearance. As we track a person, we collect 3D observations over time in a tr…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: Project Page: https://brjathu.github.io/PHALP/

  18. arXiv:2111.07868  [pdf, other]

    cs.CV

    Tracking People with 3D Representations

    Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik

    Abstract: We present a novel approach for tracking multiple people in video. Unlike past approaches which employ 2D representations, we focus on using 3D representations of people, located in three-dimensional space. To this end, we develop a method, Human Mesh and Appearance Recovery (HMAR) which in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture m…

    Submitted 15 November, 2021; originally announced November 2021.

  19. arXiv:2108.11944  [pdf, other]

    cs.CV

    Probabilistic Modeling for Human Mesh Recovery

    Authors: Nikos Kolotouros, Georgios Pavlakos, Dinesh Jayaraman, Kostas Daniilidis

    Abstract: This paper focuses on the problem of 3D human reconstruction from 2D evidence. Although this is an inherently ambiguous problem, the majority of recent works avoid the uncertainty modeling and typically regress a single estimate for a given input. In contrast to that, in this work, we propose to embrace the reconstruction ambiguity and we recast the problem as learning a mapping from the input to…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: ICCV 2021. Project page: https://www.seas.upenn.edu/~nkolot/projects/prohmr

  20. arXiv:2012.09843  [pdf, other]

    cs.CV

    Human Mesh Recovery from Multiple Shots

    Authors: Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

    Abstract: Videos from edited media like movies are a useful, yet under-explored source of information. The rich variety of appearance and interactions between humans depicted over a large temporal context in these films could be a valuable source of data. However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncatio…

    Submitted 17 December, 2020; originally announced December 2020.

  21. arXiv:2012.05698  [pdf, other]

    cs.CV

    Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction

    Authors: Agelos Kratimenos, Georgios Pavlakos, Petros Maragos

    Abstract: Independent Sign Language Recognition is a complex visual recognition problem that combines several challenging tasks of Computer Vision due to the necessity to exploit and fuse information from hand gestures, body features and facial expressions. While many state-of-the-art works have managed to deeply elaborate on these features independently, to the best of our knowledge, no work has adequately…

    Submitted 24 November, 2020; originally announced December 2020.

    Comments: Submitted to ICASSP 2021

  22. arXiv:2008.09062  [pdf, other]

    cs.CV cs.GR

    Monocular Expressive Body Regression through Body-Driven Attention

    Authors: Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

    Abstract: To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV'20. Project page: http://expose.is.tue.mpg.de

  23. arXiv:2006.08586  [pdf, other]

    cs.CV

    Coherent Reconstruction of Multiple Humans from a Single Image

    Authors: Wen Jiang, Nikos Kolotouros, Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis

    Abstract: In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene.…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. Project Page: https://jiangwenpl.github.io/multiperson/

  24. arXiv:2002.12349  [pdf, other]

    cs.RO

    Technical Report: Reactive Semantic Planning in Unexplored Semantic Environments Using Deep Perceptual Feedback

    Authors: Vasileios Vasilopoulos, Georgios Pavlakos, Sean L. Bowman, J. Diego Caporale, Kostas Daniilidis, George J. Pappas, Daniel E. Koditschek

    Abstract: This paper presents a reactive planning system that enriches the topological representation of an environment with a tightly integrated semantic representation, achieved by incorporating and exploiting advances in deep perceptual learning and probabilistic semantic reasoning. Our architecture combines object detection with semantic SLAM, affording robust, reactive logical as well as geometric plan…

    Submitted 4 May, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Technical Report accompanying the paper "Reactive Semantic Planning in Unexplored Semantic Environments Using Deep Perceptual Feedback" (12 pages, 8 figures) - Using definitions and equations from arxiv:2002.08946

  25. arXiv:2002.08946  [pdf, other]

    cs.RO

    Reactive Navigation in Partially Familiar Planar Environments Using Semantic Perceptual Feedback

    Authors: Vasileios Vasilopoulos, Georgios Pavlakos, Karl Schmeckpeper, Kostas Daniilidis, Daniel E. Koditschek

    Abstract: This paper solves the planar navigation problem by recourse to an online reactive scheme that exploits recent advances in SLAM and visual object recognition to recast prior geometric knowledge in terms of an offline catalogue of familiar objects. The resulting vector field planner guarantees convergence to an arbitrarily specified goal, avoiding collisions along the way with fixed but arbitrarily…

    Submitted 18 August, 2021; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Preprint of paper in the International Journal of Robotics Research (76 pages, 23 figures) - Includes results used in arXiv:2002.12349

  26. arXiv:1910.11322  [pdf, other]

    cs.CV

    TexturePose: Supervising Human Mesh Estimation with Texture Consistency

    Authors: Georgios Pavlakos, Nikos Kolotouros, Kostas Daniilidis

    Abstract: This work addresses the problem of model-based human pose estimation. Recent approaches have made significant progress towards regressing the parameters of parametric human body models directly from images. Because of the absence of images with 3D shape ground truth, relevant approaches rely on 2D annotations or sophisticated architecture designs. In this work, we advocate that there are more cues…

    Submitted 24 October, 2019; originally announced October 2019.

  27. arXiv:1909.12828  [pdf, other]

    cs.CV

    Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop

    Authors: Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, Kostas Daniilidis

    Abstract: Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, that use a deep network to directly estimate the model parameters from p…

    Submitted 27 September, 2019; originally announced September 2019.

    Comments: To appear at ICCV 2019. Project page: https://seas.upenn.edu/~nkolot/projects/spin

  28. arXiv:1905.03244  [pdf, other]

    cs.CV

    Convolutional Mesh Regression for Single-Image Human Shape Reconstruction

    Authors: Nikos Kolotouros, Georgios Pavlakos, Kostas Daniilidis

    Abstract: This paper addresses the problem of 3D human pose and shape estimation from a single image. Previous approaches consider a parametric model of the human body, SMPL, and attempt to regress the model parameters that give rise to a mesh consistent with image evidence. This parameter regression has been a very challenging task, with model-based approaches underperforming compared to nonparametric solu…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: To appear at CVPR 2019 (Oral Presentation). Project page: https://www.seas.upenn.edu/~nkolot/projects/cmr/

  29. arXiv:1904.05866  [pdf, other]

    cs.CV

    Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

    Authors: Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black

    Abstract: To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019

  30. arXiv:1805.04095  [pdf, other]

    cs.CV

    Ordinal Depth Supervision for 3D Human Pose Estimation

    Authors: Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis

    Abstract: Our ability to train end-to-end systems for 3D human pose estimation from single images is currently constrained by the limited availability of 3D annotations for natural images. Most datasets are captured using Motion Capture (MoCap) systems in a studio setting and it is difficult to reach the variability of 2D human pose datasets, like MPII or LSP. To alleviate the need for accurate 3D ground tr…

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: CVPR 2018 Camera Ready

  31. arXiv:1805.04092  [pdf, other]

    cs.CV

    Learning to Estimate 3D Human Pose and Shape from a Single Color Image

    Authors: Georgios Pavlakos, Luyang Zhu, Xiaowei Zhou, Kostas Daniilidis

    Abstract: This work addresses the problem of estimating the full body 3D human pose and shape from a single color image. This is a task where iterative optimization-based solutions have typically prevailed, while Convolutional Networks (ConvNets) have suffered because of the lack of training data and their low resolution 3D predictions. Our work aims to bridge this gap and proposes an efficient and effectiv…

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: CVPR 2018 Camera Ready

  32. arXiv:1804.06112  [pdf, other]

    cs.CV cs.RO

    Human Motion Capture Using a Drone

    Authors: Xiaowei Zhou, Sikang Liu, Georgios Pavlakos, Vijay Kumar, Kostas Daniilidis

    Abstract: Current motion capture (MoCap) systems generally require markers and multiple calibrated cameras, which can be used only in constrained environments. In this work we introduce a drone-based system for 3D human MoCap. The system only needs an autonomously flying drone with an on-board RGB camera and is usable in various indoor and outdoor environments. A reconstruction algorithm is developed to rec…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: In International Conference on Robotics and Automation (ICRA) 2018

  33. arXiv:1704.04793  [pdf, other]

    cs.CV

    Harvesting Multiple Views for Marker-less 3D Human Pose Annotations

    Authors: Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: Recent advances with Convolutional Networks (ConvNets) have shifted the bottleneck for many computer vision tasks to annotated data collection. In this paper, we present a geometry-driven approach to automatically collect annotations for human pose prediction tasks. Starting from a generic ConvNet for 2D human pose, and assuming a multi-view setup, we describe an automatic way to collect accurate…

    Submitted 16 April, 2017; originally announced April 2017.

    Comments: CVPR 2017 Camera Ready

  34. arXiv:1703.04670  [pdf, other]

    cs.CV cs.RO

    6-DoF Object Pose from Semantic Keypoints

    Authors: Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper presents a novel approach to estimating the continuous six degree of freedom (6-DoF) pose (3D translation and rotation) of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior work, we are agnostic to whether the object is textured or textureless, as the convnet learns the o…

    Submitted 14 March, 2017; originally announced March 2017.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2017

  35. arXiv:1701.02354  [pdf, other]

    cs.CV

    MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior

    Authors: Xiaowei Zhou, Menglong Zhu, Georgios Pavlakos, Spyridon Leonardos, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: Recovering 3D full-body human pose is a challenging problem with many applications. It has been successfully addressed by motion capture systems with body worn markers and multiple cameras. In this paper, we address the more challenging case of not only using a single camera but also not leveraging markers: going directly from 2D appearance to 3D geometry. Deep learning approaches have shown remar…

    Submitted 9 March, 2018; v1 submitted 9 January, 2017; originally announced January 2017.

    Comments: Accepted by PAMI. Extended version of the following paper: Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video. X Zhou, M Zhu, S Leonardos, K Derpanis, K Daniilidis. CVPR 2016. arXiv admin note: substantial text overlap with arXiv:1511.09439

  36. arXiv:1611.07828  [pdf, other]

    cs.CV

    Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

    Authors: Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper addresses the challenge of 3D human pose estimation from a single color image. Despite the general success of the end-to-end learning paradigm, top performing approaches employ a two-step solution consisting of a Convolutional Network (ConvNet) for 2D joint localization and a subsequent optimization step to recover 3D pose. In this paper, we identify the representation of 3D pose as a c…

    Submitted 26 July, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: CVPR 2017 Camera Ready. Project Page: https://www.seas.upenn.edu/~pavlakos/projects/volumetric/