
Showing 1–20 of 20 results for author: Keskin, C

  1. arXiv:2403.18080  [pdf, other]

    cs.CV

    EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation

    Authors: Chenhongyi Yang, Anastasia Tkach, Shreyas Hampali, Linguang Zhang, Elliot J. Crowley, Cem Keskin

    Abstract: We present EgoPoseFormer, a simple yet effective transformer-based model for stereo egocentric human pose estimation. The main challenge in egocentric pose estimation is overcoming joint invisibility, which is caused by self-occlusion or a limited field of view (FOV) of head-mounted cameras. Our approach overcomes this challenge by incorporating a two-stage pose estimation paradigm: in the first s…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Tech Report

  2. arXiv:2311.18809  [pdf, other]

    cs.CV cs.RO

    FoundPose: Unseen Object Pose Estimation with Foundation Features

    Authors: Evin Pınar Örnek, Yann Labbé, Bugra Tekin, Lingni Ma, Cem Keskin, Christian Forster, Tomas Hodan

    Abstract: We propose FoundPose, a method for 6D pose estimation of unseen rigid objects from a single RGB image. The method assumes that 3D models of the objects are available but does not require any object-specific training. This is achieved by building upon DINOv2, a recent vision foundation model with impressive generalization capabilities. An online pose estimation stage is supported by a minimal objec…

    Submitted 30 November, 2023; originally announced November 2023.

  3. arXiv:2304.12301  [pdf, other]

    cs.CV

    AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

    Authors: Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, Luan Tran, Cem Keskin

    Abstract: We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. To obtain high-quality 3D hand pos…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page: https://assemblyhands.github.io/

  4. arXiv:2211.16193  [pdf, other]

    cs.CV

    In-Hand 3D Object Scanning from an RGB Sequence

    Authors: Shreyas Hampali, Tomas Hodan, Luan Tran, Lingni Ma, Cem Keskin, Vincent Lepetit

    Abstract: We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object; however, by contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the…

    Submitted 22 June, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  5. UmeTrack: Unified multi-view end-to-end hand tracking for VR

    Authors: Shangchen Han, Po-chen Wu, Yubo Zhang, Beibei Liu, Linguang Zhang, Zheng Wang, Weiguang Si, Peizhao Zhang, Yujun Cai, Tomas Hodan, Randi Cabezas, Luan Tran, Muzaffer Akbay, Tsz-Ho Yu, Cem Keskin, Robert Wang

    Abstract: Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world space) 3D pose or relying on multiple stages such as generating heatmaps and kinematic optimization to obtain 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: SIGGRAPH Asia 2022 Conference Papers, 8 pages

  6. arXiv:2210.09887  [pdf, other]

    cs.CV cs.LG

    MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos

    Authors: Mathias Parger, Chengcheng Tang, Thomas Neff, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger

    Abstract: Convolutional neural network inference on video input is computationally expensive and requires high memory bandwidth. Recently, DeltaCNN managed to reduce the cost by only processing pixels with significant updates over the previous frame. However, DeltaCNN relies on static camera input. Moving cameras add new challenges in how to fuse newly unveiled image regions with already processed regions e…

    Submitted 14 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  7. arXiv:2208.00113  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Neural Correspondence Field for Object Pose Estimation

    Authors: Lin Huang, Tomas Hodan, Lingni Ma, Linguang Zhang, Luan Tran, Christopher Twigg, Po-Chen Wu, Junsong Yuan, Cem Keskin, Robert Wang

    Abstract: We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum. The move from pixels to 3D points, which is inspired by recent PIFu-…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: Accepted to ECCV 2022

  8. arXiv:2203.03996  [pdf, other]

    cs.CV cs.LG

    DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos

    Authors: Mathias Parger, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger

    Abstract: Convolutional neural network inference on video data requires powerful hardware for real-time processing. Given the inherent coherence across consecutive frames, large parts of a video typically change little. By skipping identical image regions and truncating insignificant pixel updates, computational redundancy can in theory be reduced significantly. However, these theoretical savings have been…

    Submitted 2 September, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  9. arXiv:2112.13709  [pdf, other]

    cs.CV

    Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training

    Authors: Qi Feng, Kun He, He Wen, Cem Keskin, Yuting Ye

    Abstract: Pose estimation of the human body and hands is a fundamental problem in computer vision, and learning-based solutions require a large amount of annotated data. In this work, we improve the efficiency of the data annotation process for 3D pose estimation problems with Active Learning (AL) in a multi-view setting. AL selects examples with the highest value to annotate under limited annotation budget…

    Submitted 17 January, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: IEEE WACV 2023 algorithms track. Code: https://github.com/facebookresearch/multi_view_active_learning

  10. arXiv:2109.05591  [pdf, other]

    cs.CV

    Multiresolution Deep Implicit Functions for 3D Shape Representation

    Authors: Zhang Chen, Yinda Zhang, Kyle Genova, Sean Fanello, Sofien Bouaziz, Christian Haene, Ruofei Du, Cem Keskin, Thomas Funkhouser, Danhang Tang

    Abstract: We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail and also achieve better accuracy. For shape completion, we propose late…

    Submitted 16 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 8 pages of main paper, 10 pages of supplementary. Accepted by ICCV'21

  11. arXiv:2103.15573  [pdf, other]

    cs.CV

    HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences

    Authors: Feitong Tan, Danhang Tang, Mingsong Dou, Kaiwen Guo, Rohit Pandey, Cem Keskin, Ruofei Du, Deqing Sun, Sofien Bouaziz, Sean Fanello, Ping Tan, Yinda Zhang

    Abstract: In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a fe…

    Submitted 29 March, 2021; originally announced March 2021.

  12. arXiv:2102.04464  [pdf]

    cs.ET physics.optics

    Free-space optical neural network based on thermal atomic nonlinearity

    Authors: Albert Ryou, James Whitehead, Maksym Zhelyeznyakov, Paul Anderson, Cem Keskin, Michal Bajcsy, Arka Majumdar

    Abstract: As artificial neural networks (ANNs) continue to make strides in wide-ranging and diverse fields of technology, the search for more efficient hardware implementations beyond conventional electronics is gaining traction. In particular, optical implementations potentially offer extraordinary gains in terms of speed and reduced energy consumption due to intrinsic parallelism of free-space optics. At…

    Submitted 8 February, 2021; originally announced February 2021.

  13. arXiv:2010.03533  [pdf, other]

    cs.LG cs.CV

    Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

    Authors: Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

    Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Thro…

    Submitted 15 March, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Published in AAAI 2022. Code can be found at https://github.com/google-research/rigl/tree/master/rigl/rigl_tf2

    MSC Class: 68T07

  14. arXiv:2005.08877  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep Implicit Volume Compression

    Authors: Danhang Tang, Saurabh Singh, Philip A. Chou, Christian Haene, Mingsong Dou, Sean Fanello, Jonathan Taylor, Philip Davidson, Onur G. Guleryuz, Yinda Zhang, Shahram Izadi, Andrea Tagliasacchi, Sofien Bouaziz, Cem Keskin

    Abstract: We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper bo…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: Danhang Tang and Saurabh Singh have equal contribution

  15. arXiv:2002.03933  [pdf, other]

    cs.CV

    RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation

    Authors: Hossam Isack, Christian Haene, Cem Keskin, Sofien Bouaziz, Yuri Boykov, Shahram Izadi, Sameh Khamis

    Abstract: We propose a novel efficient and lightweight model for human pose estimation from a single image. Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods. To this end, we explicitly incorporate part-based structural and geometric priors in a hierarchical prediction framework. At the coarsest resolutio…

    Submitted 10 February, 2020; originally announced February 2020.

  16. arXiv:1905.12162  [pdf, other]

    cs.CV

    Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

    Authors: Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello

    Abstract: Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD came…

    Submitted 28 May, 2019; originally announced May 2019.

  17. arXiv:1811.05029  [pdf, other]

    cs.CV

    LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

    Authors: Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, Adarsh Kowdle, Christoph Rhemann, Dan B Goldman, Cem Keskin, Steve Seitz, Shahram Izadi, Sean Fanello

    Abstract: Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach to augmen…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: To be presented at SIGGRAPH Asia 2018. The supplementary video is available at: http://youtu.be/Md3tdAKoLGU

  18. arXiv:1810.13118  [pdf, other]

    cs.LG stat.ML

    SplineNets: Continuous Neural Decision Graphs

    Authors: Cem Keskin, Shahram Izadi

    Abstract: We present SplineNets, a practical and novel approach for using conditioning in convolutional neural networks (CNNs). SplineNets are continuous generalizations of neural decision graphs, and they can dramatically reduce runtime complexity and computation costs of CNNs, while maintaining or even increasing accuracy. Functions of SplineNets are both dynamic (i.e., conditioned on the input) and hiera…

    Submitted 31 October, 2018; originally announced October 2018.

    Comments: Accepted to NIPS 2018

  19. arXiv:1807.09534  [pdf, other]

    cs.CV cs.LG

    Conditional Information Gain Networks

    Authors: Ufuk Can Biçici, Cem Keskin, Lale Akarun

    Abstract: Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these largely parametrized deep models in limited resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarc…

    Submitted 25 July, 2018; originally announced July 2018.

    Comments: ICPR 2018 Paper

  20. arXiv:1603.05772  [pdf, other]

    cs.CV

    Learning to Navigate the Energy Landscape

    Authors: Julien Valentin, Angela Dai, Matthias Nießner, Pushmeet Kohli, Philip Torr, Shahram Izadi, Cem Keskin

    Abstract: In this paper, we present a novel and efficient architecture for addressing computer vision problems that use 'Analysis by Synthesis'. Analysis by synthesis involves the minimization of the reconstruction error which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Conv…

    Submitted 18 March, 2016; originally announced March 2016.