
Showing 1–12 of 12 results for author: Khirodkar, R

  1. arXiv:2404.14199  [pdf, other]

    cs.CV

    Generalizable Neural Human Renderer

    Authors: Mana Masuda, Jinhyung Park, Shun Iwase, Rawal Khirodkar, Kris Kitani

    Abstract: While recent advancements in animatable human rendering have achieved remarkable results, they require test-time optimization for each subject which can be a significant limitation for real-world applications. To address this, we tackle the challenging task of learning a Generalizable Neural Human Renderer (GNH), a novel method for rendering animatable humans from monocular video without any test-…

    Submitted 22 April, 2024; originally announced April 2024.

  2. arXiv:2403.06862  [pdf, other]

    cs.CV cs.GR cs.RO

    Real-Time Simulated Avatar from Head-Mounted Sensors

    Authors: Zhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Jing Huang, Kris Kitani, Weipeng Xu

    Abstract: We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion,…

    Submitted 24 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 Highlight. Website: https://www.zhengyiluo.com/SimXR/

  3. arXiv:2401.15616  [pdf, other]

    cs.CV

    Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras

    Authors: Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani

    Abstract: We tackle the task of multi-view, multi-person 3D human pose estimation from a limited number of uncalibrated depth cameras. Recently, many approaches have been proposed for 3D human pose estimation from multi-view RGB cameras. However, these works (1) assume the number of RGB camera views is large enough for 3D reconstruction, (2) the cameras are calibrated, and (3) rely on ground truth 3D poses…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 17 pages including appendix

  4. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  5. arXiv:2305.16487  [pdf, other]

    cs.CV cs.AI

    EgoHumans: An Egocentric 3D Multi-Human Benchmark

    Authors: Rawal Khirodkar, Aayush Bansal, Lingni Ma, Richard Newcombe, Minh Vo, Kris Kitani

    Abstract: We present EgoHumans, a new multi-view multi-human video benchmark to advance the state-of-the-art of egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks either capture single subject or indoor-only scenarios, which limit the generalization of computer vision algorithms for real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocen…

    Submitted 18 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to ICCV 2023 (Oral)

  6. arXiv:2210.05387  [pdf, other]

    cs.CV cs.LG

    Sequential Ensembling for Semantic Segmentation

    Authors: Rawal Khirodkar, Brandon Smith, Siddhartha Chandra, Amit Agrawal, Antonio Criminisi

    Abstract: Ensemble approaches for deep-learning-based semantic segmentation remain insufficiently explored despite the proliferation of competitive benchmarks and downstream applications. In this work, we explore and benchmark the popular ensembling approach of combining predictions of multiple, independently-trained, state-of-the-art models at test time on popular datasets. Furthermore, we propose a novel…

    Submitted 8 October, 2022; originally announced October 2022.

  7. arXiv:2203.14360  [pdf, other]

    cs.CV

    Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

    Authors: Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, Kris Kitani

    Abstract: Kalman filter (KF) based methods for multi-object tracking (MOT) make an assumption that objects move linearly. While this assumption is acceptable for very short periods of occlusion, linear estimates of motion for prolonged time can be highly inaccurate. Moreover, when there is no measurement available to update Kalman filter parameters, the standard convention is to trust the priori state estim…

    Submitted 15 March, 2023; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2023. 8 pages + 10 pages of appendix. Renamed OOS as Observation-centric Re-Update (ORU)

  8. arXiv:2203.13349  [pdf, other]

    cs.CV cs.AI

    Occluded Human Mesh Recovery

    Authors: Rawal Khirodkar, Shashank Tripathi, Kris Kitani

    Abstract: Top-down methods for monocular human mesh recovery have two stages: (1) detect human bounding boxes; (2) treat each bounding box as an independent single-human mesh recovery task. Unfortunately, the single-human assumption does not hold in images with multi-human occlusion and crowding. Consequently, top-down methods have difficulties in recovering accurate 3D human meshes under severe person-pers…

    Submitted 24 March, 2022; originally announced March 2022.

  9. arXiv:2104.00633  [pdf, other]

    cs.CV

    RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering

    Authors: Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani

    Abstract: We present RePOSE, a fast iterative refinement method for 6D object pose estimation. Prior methods perform refinement by feeding zoomed-in input and rendered RGB images into a CNN and directly regressing an update of a refined pose. Their runtime is slow due to the computational cost of CNN, which is especially prominent in multiple-object pose refinement. To overcome this problem, RePOSE leverage…

    Submitted 19 August, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: ICCV 2021

  10. arXiv:2101.11223  [pdf, other]

    cs.CV

    Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation

    Authors: Rawal Khirodkar, Visesh Chari, Amit Agrawal, Ambrish Tyagi

    Abstract: A key assumption of top-down human pose estimation approaches is their expectation of having a single person/instance present in the input bounding box. This often leads to failures in crowded scenes with occlusions. We propose a novel solution to overcome the limitations of this fundamental assumption. Our Multi-Instance Pose Network (MIPNet) allows for predicting multiple 2D pose instances withi…

    Submitted 27 October, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: ICCV 2021

  11. arXiv:1812.00491  [pdf, other]

    cs.CV

    Adversarial Domain Randomization

    Authors: Rawal Khirodkar, Kris M. Kitani

    Abstract: Domain Randomization (DR) is known to require a significant amount of training data for good performance. We argue that this is due to DR's strategy of random data generation using a uniform distribution over simulation parameters; as a result, DR often generates samples which are uninformative for the learner. In this work, we theoretically analyze DR using ideas from multi-source domain adaptati…

    Submitted 29 August, 2021; v1 submitted 2 December, 2018; originally announced December 2018.

  12. arXiv:1811.05939  [pdf, other]

    cs.CV

    Domain Randomization for Scene-Specific Car Detection and Pose Estimation

    Authors: Rawal Khirodkar, Donghyun Yoo, Kris M. Kitani

    Abstract: We address the issue of domain gap when making use of synthetic data to train a scene-specific object detector and pose estimator. While previous works have shown that the constraints of learning a scene-specific model can be leveraged to create geometrically and photometrically consistent synthetic data, care must be taken to design synthetic content which is as close as possible to the real-worl…

    Submitted 14 November, 2018; originally announced November 2018.