Showing 1–42 of 42 results for author: Sheikh, Y

  1. arXiv:2405.02508  [pdf, other]

    cs.CV cs.GR

    Rasterized Edge Gradients: Handling Discontinuities Differentiably

    Authors: Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih

    Abstract: Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for…

    Submitted 16 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2401.05334  [pdf, other]

    cs.CV cs.GR

    URHand: Universal Relightable Hands

    Authors: Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhöfer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito

    Abstract: Existing photorealistic relightable hand models require extensive identity-specific observations in different views, poses, and illuminations, and face challenges in generalizing to natural illuminations and novel identities. To bridge this gap, we present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities. Our model allows f…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Project Page https://frozenburning.github.io/projects/urhand/

  3. arXiv:2302.04866  [pdf, other]

    cs.CV cs.GR

    RelightableHands: Efficient Neural Relighting of Articulated Hand Models

    Authors: Shun Iwase, Shunsuke Saito, Tomas Simon, Stephen Lombardi, Timur Bagautdinov, Rohan Joshi, Fabian Prada, Takaaki Shiratori, Yaser Sheikh, Jason Saragih

    Abstract: We present the first neural relighting approach for rendering high-fidelity personalized hands that can be animated in real-time under novel illumination. Our approach adopts a teacher-student framework, where the teacher learns appearance under a single point light from images captured in a light-stage, allowing us to synthesize hands in arbitrary illuminations but with heavy compute. Using image…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: 8 pages, 16 figures, Website: https://sh8.io/#/relightable_hands

  4. arXiv:2207.11243  [pdf, other]

    cs.CV cs.GR

    Multiface: A Dataset for Neural Face Rendering

    Authors: Cheng-hsin Wuu, Ningyuan Zheng, Scott Ardisson, Rohan Bali, Danielle Belko, Eric Brockmeyer, Lucas Evans, Timothy Godisart, Hyowon Ha, Xuhua Huang, Alexander Hypes, Taylor Koska, Steven Krenn, Stephen Lombardi, Xiaomin Luo, Kevyn McPhail, Laura Millerschoen, Michal Perdoch, Mark Pitts, Alexander Richard, Jason Saragih, Junko Saragih, Takaaki Shiratori, Tomas Simon, Matt Stewart, et al. (6 additional authors not shown)

    Abstract: Photorealistic avatars of human faces have come a long way in recent years, yet research along this area is limited by a lack of publicly available, high-quality datasets covering both, dense multi-view camera captures, and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reali…

    Submitted 26 June, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

  5. arXiv:2207.09774  [pdf, other]

    cs.CV

    Drivable Volumetric Avatars using Texel-Aligned Features

    Authors: Edoardo Remelli, Timur Bagautdinov, Shunsuke Saito, Tomas Simon, Chenglei Wu, Shih-En Wei, Kaiwen Guo, Zhe Cao, Fabian Prada, Jason Saragih, Yaser Sheikh

    Abstract: Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics…

    Submitted 20 July, 2022; originally announced July 2022.

    Journal ref: SIGGRAPH 2022 Conference Proceedings

  6. Dressing Avatars: Deep Photorealistic Appearance for Physically Simulated Clothing

    Authors: Donglai Xiang, Timur Bagautdinov, Tuur Stuyck, Fabian Prada, Javier Romero, Weipeng Xu, Shunsuke Saito, Jingfan Guo, Breannan Smith, Takaaki Shiratori, Yaser Sheikh, Jessica Hodgins, Chenglei Wu

    Abstract: Despite recent progress in developing animatable full-body avatars, realistic modeling of clothing - one of the core aspects of human self-expression - remains an open challenge. State-of-the-art physical simulation methods can generate realistically behaving clothing geometry at interactive rates. Modeling photorealistic appearance, however, usually requires physically-based rendering which is to…

    Submitted 19 September, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: SIGGRAPH Asia 2022 (ACM ToG) camera ready. The supplementary video can be found on https://research.facebook.com/publications/dressing-avatars-deep-photorealistic-appearance-for-physically-simulated-clothing/

  7. arXiv:2206.03373  [pdf, other]

    cs.CV

    Garment Avatars: Realistic Cloth Driving using Pattern Registration

    Authors: Oshri Halimi, Fabian Prada, Tuur Stuyck, Donglai Xiang, Timur Bagautdinov, He Wen, Ron Kimmel, Takaaki Shiratori, Chenglei Wu, Yaser Sheikh

    Abstract: Virtual telepresence is the future of online communication. Clothing is an essential part of a person's identity and self-expression. Yet, ground truth data of registered clothes is currently unavailable in the required resolution and accuracy for training telepresence models for realistic cloth animation. Here, we propose an end-to-end pipeline for building drivable representations for clothing.…

    Submitted 7 June, 2022; originally announced June 2022.

  8. arXiv:2105.10441  [pdf, other]

    cs.CV cs.AI cs.GR

    Driving-Signal Aware Full-Body Avatars

    Authors: Timur Bagautdinov, Chenglei Wu, Tomas Simon, Fabian Prada, Takaaki Shiratori, Shih-En Wei, Weipeng Xu, Yaser Sheikh, Jason Saragih

    Abstract: We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and genera…

    Submitted 25 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

  9. arXiv:2104.08223  [pdf, other]

    cs.CV

    MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

    Authors: Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh

    Abstract: This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approa…

    Submitted 20 May, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: updated link to github repository and supplemental video

  10. arXiv:2104.04638  [pdf, other]

    cs.CV

    Pixel Codec Avatars

    Authors: Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, Yaser Sheikh

    Abstract: Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state of the art reconstruction performance while being computationally efficient and adaptive to th…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 Oral

  11. arXiv:2103.15876  [pdf, other]

    cs.CV eess.IV

    High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

    Authors: Lele Chen, Chen Cao, Fernando De la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh

    Abstract: 3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. Best 3D photo-realistic AR/VR avatars driven by video, that can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behav…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: The paper is accepted to CVPR 2021

  12. arXiv:2103.11357  [pdf, other]

    stat.ME cs.AI cs.LG stat.ML

    Deep ROC Analysis and AUC as Balanced Average Accuracy to Improve Model Selection, Understanding and Interpretation

    Authors: André M. Carrington, Douglas G. Manuel, Paul W. Fieguth, Tim Ramsay, Venet Osmani, Bernhard Wernly, Carol Bennett, Steven Hawken, Matthew McInnes, Olivia Magwood, Yusuf Sheikh, Andreas Holzinger

    Abstract: Optimal performance is critical for decision-making tasks from medicine to autonomous driving, however common performance measures may be too general or too specific. For binary classifiers, diagnostic tests or prognosis at a timepoint, measures such as the area under the receiver operating characteristic curve, or the area under the precision recall curve, are too general because they include unr…

    Submitted 21 March, 2021; originally announced March 2021.

    Comments: 14 pages, 6 Figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), currently under review

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2022

  13. arXiv:2103.01954  [pdf, other]

    cs.GR cs.CV

    Mixture of Volumetric Primitives for Efficient Neural Rendering

    Authors: Stephen Lombardi, Tomas Simon, Gabriel Schwartz, Michael Zollhoefer, Yaser Sheikh, Jason Saragih

    Abstract: Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representa…

    Submitted 6 May, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 13 pages; SIGGRAPH 2021

  14. Supervision by Registration and Triangulation for Landmark Detection

    Authors: Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu

    Abstract: We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from massive amounts of unlabeled data freely available and not be limited by the quality and quantity of manual human annotations. To utiliz…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

  15. arXiv:2008.11789  [pdf, other]

    cs.CV

    Expressive Telepresence via Modular Codec Avatars

    Authors: Hang Chu, Shugao Ma, Fernando De la Torre, Sanja Fidler, Yaser Sheikh

    Abstract: VR telepresence consists of interacting with another human in a virtual space represented by an avatar. Today most avatars are cartoon-like, but soon the technology will allow video-realistic ones. This paper aims in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in the VR headset. MCA extends traditional Codec Avatars (CA)…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  16. arXiv:2008.05023  [pdf, other]

    cs.CV

    Audio- and Gaze-driven Facial Animation of Codec Avatars

    Authors: Alexander Richard, Colin Lea, Shugao Ma, Juergen Gall, Fernando de la Torre, Yaser Sheikh

    Abstract: Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality), and are almost indistinguishable from video. In this paper we describe the first approach to animate these parametric models in real-time which could be deployed on commodity virtual reality hardware using audio and/or eye trackin…

    Submitted 11 August, 2020; originally announced August 2020.

  17. arXiv:2007.12806  [pdf, other]

    cs.CV

    Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild

    Authors: Minh Vo, Yaser Sheikh, Srinivasa G. Narasimhan

    Abstract: Bundle adjustment jointly optimizes camera intrinsics and extrinsics and 3D point triangulation to reconstruct a static scene. The triangulation constraint, however, is invalid for moving points captured in multiple unsynchronized videos and bundle adjustment is not designed to estimate the temporal alignment between cameras. We present a spatiotemporal bundle adjustment framework that jointly opt…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: Accepted to IEEE TPAMI

  18. arXiv:2006.04325  [pdf, other]

    cs.CV

    Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

    Authors: Yi Zhou, Chenglei Wu, Zimo Li, Chen Cao, Yuting Ye, Jason Saragih, Hao Li, Yaser Sheikh

    Abstract: Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and are not applicable to more general meshes,…

    Submitted 21 October, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: 12 pages

  19. arXiv:2005.13532  [pdf, other]

    cs.CV cs.GR

    4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

    Authors: Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, Srinivasa Narasimhan

    Abstract: We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by hand-held multiple cameras. Key to our approach is the use of self-supervised neural networks specific to the scene to compose static and dynamic aspects of an event. Though captured from discrete viewpoints, this model enables us to move around the space-time of the event continuously. This…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Project Page - http://www.cs.cmu.edu/~aayushb/Open4D/

  20. arXiv:1910.02181  [pdf, other]

    cs.CV cs.AI

    To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

    Authors: Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh

    Abstract: Non verbal behaviours such as gestures, facial expressions, body posture, and para-linguistic cues have been shown to complement or clarify verbal messages. Hence to improve telepresence, in form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars not only requires to model intrapersonal dynamics between an avatar's speech a…

    Submitted 4 October, 2019; originally announced October 2019.

  21. arXiv:1909.13423  [pdf, other]

    cs.CV cs.LG

    Single-Network Whole-Body Pose Estimation

    Authors: Gines Hidalgo, Yaadhav Raaj, Haroon Idrees, Donglai Xiang, Hanbyul Joo, Tomas Simon, Yaser Sheikh

    Abstract: We present the first single-network approach for 2D whole-body pose estimation, which entails simultaneous localization of body, face, hands, and feet keypoints. Due to the bottom-up formulation, our method maintains constant real-time performance regardless of the number of people in the image. The network is trained in a single stage using multi-task learning, through an improved architecture wh…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  22. Neural Volumes: Learning Dynamic Renderable Volumes from Images

    Authors: Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh

    Abstract: Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to SIGGRAPH 2019

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2019) 38, 4, Article 65

  23. arXiv:1906.04728  [pdf, other]

    cs.CV

    Shapes and Context: In-the-Wild Image Synthesis & Manipulation

    Authors: Aayush Bansal, Yaser Sheikh, Deva Ramanan

    Abstract: We introduce a data-driven approach for interactively synthesizing in-the-wild images from semantic label maps. Our approach is dramatically different from recent work in this space, in that we make use of no learning. Instead, our approach uses simple but classic tools for matching scene context, shapes, and parts to a stored library of exemplars. Though simple, this approach has several notable…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Project Page: http://www.cs.cmu.edu/~aayushb/OpenShapes/

    Journal ref: CVPR 2019

  24. arXiv:1906.04158  [pdf, other]

    cs.CV cs.AI

    Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in A Triadic Interaction

    Authors: Hanbyul Joo, Tomas Simon, Mina Cikara, Yaser Sheikh

    Abstract: We present a new research task and a dataset to understand human social interactions via computational methods, to ultimately endow machines with the ability to encode and decode a broad channel of social signals humans use. This research direction is essential to make a machine that genuinely communicates with humans, which we call Social Artificial Intelligence. We first formulate the "social si…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: CVPR 2019

  25. arXiv:1904.10037  [pdf, other]

    cs.CV cs.LG

    LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds

    Authors: Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabás Póczos, Yaser Sheikh

    Abstract: We present LBS-AE; a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. As input, we take a sequence of point clouds to be registered as well as an artist-rigged mesh, i.e. a template mesh equipped with a linear-blend skinning (LBS) deformation space parameterized by a skeleton hierarchy. As output, we learn an LBS-based autoencoder that produces registered…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: In the Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

  26. arXiv:1812.08008  [pdf, other]

    cs.CV

    OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Authors: Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh

    Abstract: Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the…

    Submitted 30 May, 2019; v1 submitted 18 December, 2018; originally announced December 2018.

    Comments: Journal version of arXiv:1611.08050, with better accuracy and faster speed, release a new foot keypoint dataset: https://cmu-perceptual-computing-lab.github.io/foot_keypoint_dataset/

  27. arXiv:1812.01783   

    cs.CV

    Capture Dense: Markerless Motion Capture Meets Dense Pose Estimation

    Authors: Xiu Li, Yebin Liu, Hanbyul Joo, Qionghai Dai, Yaser Sheikh

    Abstract: We present a method to combine markerless motion capture and dense pose feature estimation into a single framework. We demonstrate that dense pose information can help for multiview/single-view motion capture, and multiview motion capture can help the collection of a high-quality dataset for training the dense pose detector. Specifically, we first introduce a novel markerless motion capture method…

    Submitted 10 December, 2018; v1 submitted 4 December, 2018; originally announced December 2018.

    Comments: Withdraw due to incomplete experiment

  28. arXiv:1812.01598  [pdf, other]

    cs.CV cs.GR

    Monocular Total Capture: Posing Face, Body, and Hands in the Wild

    Authors: Donglai Xiang, Hanbyul Joo, Yaser Sheikh

    Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image s…

    Submitted 4 December, 2018; originally announced December 2018.

    Comments: 17 pages, 16 figures

  29. arXiv:1811.11975  [pdf, other]

    cs.CV

    Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields

    Authors: Yaadhav Raaj, Haroon Idrees, Gines Hidalgo, Yaser Sheikh

    Abstract: We present an online approach to efficiently and simultaneously detect and track the 2D pose of multiple people in a video sequence. We build upon Part Affinity Field (PAF) representation designed for static images, and propose an architecture that can encode and predict Spatio-Temporal Affinity Fields (STAF) across a video sequence. In particular, we propose a novel temporal topology cross-linked…

    Submitted 12 June, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

  30. arXiv:1808.05174  [pdf, other]

    cs.CV cs.GR cs.LG

    Recycle-GAN: Unsupervised Video Retargeting

    Authors: Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh

    Abstract: We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver's speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert's style. Our approach combines both spatial and temporal information along with adv…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: ECCV 2018; Please refer to project webpage for videos - http://www.cs.cmu.edu/~aayushb/Recycle-GAN

  31. Deep Appearance Models for Face Rendering

    Authors: Stephen Lombardi, Jason Saragih, Tomas Simon, Yaser Sheikh

    Abstract: We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: Accepted to SIGGRAPH 2018

    Journal ref: ACM Transactions on Graphics (SIGGRAPH 2018) 37, 4, Article 68

  32. arXiv:1807.00966  [pdf, other]

    cs.CV

    Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

    Authors: Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh

    Abstract: In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video. Our key observation is that the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow. Interestingly, the coherency of optical flow is a source of supervision that does not require manua…

    Submitted 4 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: Minor modifications to the CVPR 2018 version (add missing references)

  33. Self-supervised Multi-view Person Association and Its Applications

    Authors: Minh Vo, Ersin Yumer, Kalyan Sunkavalli, Sunil Hadap, Yaser Sheikh, Srinivasa Narasimhan

    Abstract: Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework…

    Submitted 18 April, 2020; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Accepted to IEEE TPAMI

  34. arXiv:1804.06510  [pdf, other]

    cs.CV

    Structure from Recurrent Motion: From Rigidity to Recurrency

    Authors: Xiu Li, Hongdong Li, Hanbyul Joo, Yebin Liu, Yaser Sheikh

    Abstract: This paper proposes a new method for Non-Rigid Structure-from-Motion (NRSfM) from a long monocular video sequence observing a non-rigid object performing recurrent and possibly repetitive dynamic action. Departing from the traditional idea of using linear low-order or low-rank shape model for the task of NRSfM, our method exploits the property of shape recurrency (i.e., many deforming shapes tend t…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: To appear in CVPR 2018

  35. arXiv:1801.01615  [pdf, other]

    cs.CV

    Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies

    Authors: Hanbyul Joo, Tomas Simon, Yaser Sheikh

    Abstract: We present a unified deformation model for the markerless capture of multiple scales of human movement, including facial expressions, body motion, and hand gestures. An initial model is generated by locally stitching together models of the individual parts of the human body, which we refer to as the "Frankenstein" model. This model enables the full expression of part movements, including face and…

    Submitted 4 January, 2018; originally announced January 2018.

  36. arXiv:1708.05349  [pdf, other]

    cs.CV cs.GR cs.LG

    PixelNN: Example-based Image Synthesis

    Authors: Aayush Bansal, Yaser Sheikh, Deva Ramanan

    Abstract: We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode co…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: Project Page: http://www.cs.cmu.edu/~aayushb/pixelNN/

  37. arXiv:1704.07809  [pdf, other]

    cs.CV

    Hand Keypoint Detection in Single Images using Multiview Bootstrapping

    Authors: Tomas Simon, Hanbyul Joo, Iain Matthews, Yaser Sheikh

    Abstract: We present an approach that uses a multi-camera system to train fine-grained detectors for keypoints that are prone to occlusion, such as the joints of a hand. We call this procedure multiview bootstrapping: first, an initial keypoint detector is used to produce noisy labels in multiple views of the hand. The noisy detections are then triangulated in 3D using multiview geometry or marked as outlie…

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: CVPR 2017

  38. arXiv:1612.03153  [pdf, other]

    cs.CV

    Panoptic Studio: A Massively Multiview System for Social Interaction Capture

    Authors: Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh

    Abstract: We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime…

    Submitted 9 December, 2016; originally announced December 2016.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  39. arXiv:1611.08050  [pdf, other]

    cs.CV

    Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Authors: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh

    Abstract: We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance…

    Submitted 13 April, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: Accepted as CVPR 2017 Oral. Video result: https://youtu.be/pW6nZXeWlGM

  40. arXiv:1604.03130  [pdf]

    cs.CY

    Video Analysis for Body-worn Cameras in Law Enforcement

    Authors: Jason J. Corso, Alexandre Alahi, Kristen Grauman, Gregory D. Hager, Louis-Philippe Morency, Harpreet Sawhney, Yaser Sheikh

    Abstract: The social conventions and expectations around the appropriate use of imaging and video have been transformed by the availability of video cameras in our pockets. The impact on law enforcement can easily be seen by watching the nightly news; more and more arrests, interventions, or even routine stops are being caught on cell phones or surveillance video, with both positive and negative consequences…

    Submitted 7 May, 2018; v1 submitted 11 April, 2016; originally announced April 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 9 pages

  41. arXiv:1603.08152  [pdf, other]

    cs.CV

    How useful is photo-realistic rendering for visual learning?

    Authors: Yair Movshovitz-Attias, Takeo Kanade, Yaser Sheikh

    Abstract: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive. With the advent of rich 3D repositories, photo-realistic rendering systems offer the opportunity to provide nearly limitless data. Yet, their primary value for visual learning may be the quality of the data they can provide rather than t…

    Submitted 7 September, 2016; v1 submitted 26 March, 2016; originally announced March 2016.

    Comments: Published in GMDL 2016 In conjunction with ECCV 2016

  42. arXiv:1602.00134  [pdf, other]

    cs.CV

    Convolutional Pose Machines

    Authors: Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh

    Abstract: Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies bet…

    Submitted 11 April, 2016; v1 submitted 30 January, 2016; originally announced February 2016.

    Comments: camera ready