Showing 1–21 of 21 results for author: Ramakrishnan, S K

  1. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  2. arXiv:2307.08763  [pdf, other]

    cs.CV

    Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

    Authors: Kumar Ashutosh, Santhosh Kumar Ramakrishnan, Triantafyllos Afouras, Kristen Grauman

    Abstract: Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state -- such as the steps of a recipe or a DIY fix-it task. Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a predefined sequ…

    Submitted 29 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  3. arXiv:2306.15850  [pdf, other]

    cs.CV

    SpotEM: Efficient Video Search for Episodic Memory

    Authors: Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve effici…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Published in ICML 2023

  4. arXiv:2306.09324  [pdf, other]

    cs.CV

    Single-Stage Visual Query Localization in Egocentric Videos

    Authors: Hanwen Jiang, Santhosh Kumar Ramakrishnan, Kristen Grauman

    Abstract: Visual Query Localization on long-form egocentric videos requires spatio-temporal search and localization of visually specified objects and is vital to build episodic memory systems. Prior work develops complex multi-stage pipelines that leverage well-established object detection and tracking methods to perform VQL. However, each stage is independently trained and the complexity of the pipeline re…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Winner of Ego4D VQ2D challenge 2023

  5. A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

    Authors: Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, et al. (22 additional authors not shown)

    Abstract: Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through th…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: To appear in Neural Networks

  6. arXiv:2301.00746  [pdf, other]

    cs.CV

    NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

    Authors: Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window output…

    Submitted 25 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: 13 pages, 7 figures, appearing in CVPR 2023

  7. arXiv:2210.05633  [pdf, other]

    cs.CV

    Habitat-Matterport 3D Semantics Dataset

    Authors: Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot

    Abstract: We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object instance annotations across 216 3D spaces and 3,100 rooms within those spaces. The scale, quality, and diversity of object annotations far exceed those of prior…

    Submitted 12 October, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 15 Pages, 11 Figures, 6 Tables

  8. arXiv:2207.11365  [pdf, other]

    cs.CV

    EgoEnv: Human-centric environment representations from egocentric video

    Authors: Tushar Nagarajan, Santhosh Kumar Ramakrishnan, Ruta Desai, James Hillis, Kristen Grauman

    Abstract: First-person video highlights a camera-wearer's activities in the context of their persistent environment. However, current video understanding approaches reason over visual features from short video clips that are detached from the underlying physical space and capture only what is immediately visible. To facilitate human-centric environment understanding, we present an approach that links egocen…

    Submitted 9 November, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Published in NeurIPS 2023 (Oral)

  9. arXiv:2204.03729  [pdf]

    cond-mat.supr-con

    Martensitic transformation in V_3Si single crystal: ^51V NMR evidence for coexistence of cubic and tetragonal phases

    Authors: A. A. Gapud, S. K. Ramakrishnan, E. L. Green, A. P. Reyes

    Abstract: The Martensitic transformation (MT) in A15 binary-alloy superconductor V_3Si, though studied extensively, has not yet been conclusively linked with a transition to superconductivity. Previous NMR studies have mainly been on powder samples and with little emphasis on temperature dependence during the transformation. Here we study a high-quality single crystal, where quadrupolar splitting of NMR spe…

    Submitted 7 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Revised manuscript submitted 3 June 2022 to Physica C

  10. arXiv:2202.02440  [pdf, other]

    cs.CV cs.AI cs.LG

    Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

    Authors: Ziad Al-Halah, Santhosh K. Ramakrishnan, Kristen Grauman

    Abstract: In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of interactions are needed for the model to generalize well. Moreover, this process is repeated whenever there is a change in the task type or the goal modality.…

    Submitted 28 April, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: CVPR 2022. Project page: https://vision.cs.utexas.edu/projects/zsel/

  11. arXiv:2201.10029  [pdf, other]

    cs.CV cs.AI

    PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

    Authors: Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

    Abstract: State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?' for an object and `how to navigate to (x, y)?'. Our key insight is that…

    Submitted 17 June, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 8 pages + supplementary. Accepted in CVPR 2022

  12. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  13. arXiv:2109.08238  [pdf, other]

    cs.CV cs.AI

    Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

    Authors: Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra

    Abstract: We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces. HM3D surpasses existing datasets available for academic research in te…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 21 pages, 14 figures

  14. arXiv:2102.02337  [pdf, other]

    cs.CV

    Environment Predictive Coding for Embodied Agents

    Authors: Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

    Abstract: We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of images gathered by an agent as it moves about in 3D environments. We learn these representations via a zone prediction task, where we intelligently mask out porti…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: 9 pages, 6 figures, appendix

  15. arXiv:2008.09622  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO cs.SD

    Learning to Set Waypoints for Audio-Visual Navigation

    Authors: Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

    Abstract: In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navig…

    Submitted 11 February, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: Accepted to ICLR 2021

  16. arXiv:2008.09285  [pdf, other]

    cs.CV

    Occupancy Anticipation for Efficient Exploration and Navigation

    Authors: Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman

    Abstract: State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. In doing so, the agent builds its spatial awarene…

    Submitted 25 August, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV 2020. 19 pages, 6 figures, appendix at end

  17. arXiv:2001.02192  [pdf, other]

    cs.CV cs.AI

    An Exploration of Embodied Visual Exploration

    Authors: Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

    Abstract: Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? Despite the progress thus far, many basic questions pertinent to this problem remain unanswered: (i) What does it mean for an agent to explore its environment well? (i…

    Submitted 20 August, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: 30 main + 21 appendix pages, 23 figures

  18. Emergence of Exploratory Look-Around Behaviors through Active Observation Completion

    Authors: Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

    Abstract: Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself. We address the problem of learning to look around: how can an agent learn to acquire informative visual observations? We propose a reinforcement learning solution, where the agent is rewarded for reduc…

    Submitted 26 June, 2019; originally announced June 2019.

    Comments: Main paper 7 figures, supplementary 6 figures. Published in Science Robotics 2019

  19. arXiv:1807.11010  [pdf, other]

    cs.CV

    Sidekick Policy Learning for Active Visual Exploration

    Authors: Santhosh K. Ramakrishnan, Kristen Grauman

    Abstract: We consider an active visual exploration scenario, where an agent must intelligently select its camera motions to efficiently reconstruct the full environment from only a limited set of narrow field-of-view glimpses. While the agent has full observability of the environment during training, it has only partial observability once deployed, being constrained by what portions it has seen and what cam…

    Submitted 29 July, 2018; originally announced July 2018.

    Comments: 26 pages, 13 figures, to appear in ECCV 2018

  20. arXiv:1706.02331  [pdf, other]

    cs.CV

    CoMaL Tracking: Tracking Points at the Object Boundaries

    Authors: Santhosh K. Ramakrishnan, Swarna Kamlam Ravindran, Anurag Mittal

    Abstract: Traditional point tracking algorithms such as the KLT use local 2D information aggregation for feature detection and tracking, due to which their performance degrades at the object boundaries that separate multiple objects. Recently, CoMaL Features have been proposed that handle such a case. However, they proposed a simple tracking framework where the points are re-detected in each frame and match…

    Submitted 7 June, 2017; originally announced June 2017.

    Comments: 10 pages, 10 figures, to appear in 1st Joint BMTT-PETS Workshop on Tracking and Surveillance, CVPR 2017

  21. arXiv:1704.02516  [pdf, other]

    cs.CV

    An Empirical Evaluation of Visual Question Answering for Novel Objects

    Authors: Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal

    Abstract: We study the problem of answering questions about images in the harder setting, where the test questions and corresponding images contain novel objects, which were not queried about in the training data. Such setting is inevitable in real world -- owing to the heavy tailed distribution of the visual categories, there would be some objects which would not be annotated in the train set. We show that th…

    Submitted 8 April, 2017; originally announced April 2017.

    Comments: 11 pages, 4 figures, accepted in CVPR 2017 (poster)