Showing 1–10 of 10 results for author: Herrasti, A

  1. arXiv:2406.20083  [pdf, other]

    cs.RO cs.CV

    PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

    Authors: Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs

    Abstract: We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of mil…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2312.09067  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    Holodeck: Language Guided Generation of 3D Embodied AI Environments

    Authors: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

    Abstract: 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs…

    Submitted 22 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Published in CVPR 2024, 21 pages, 27 figures, 2 tables

  3. arXiv:2312.02976  [pdf, other]

    cs.RO cs.AI cs.CV

    Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

    Authors: Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

    Abstract: Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: First six authors contributed equally. Project page: https://spoc-robot.github.io/

  4. arXiv:2211.09960  [pdf, other]

    cs.CV cs.AI

    Ask4Help: Learning to Leverage an Expert for Embodied Tasks

    Authors: Kunal Pratap Singh, Luca Weihs, Alvaro Herrasti, Jonghyun Choi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing, applications. In this paper, we ask: can we bridge this gap by enabling agents to ask for assistance from an expert such as a human being? To this end, we propose the Ask4Help…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS, 2022

  5. arXiv:2206.06994  [pdf, other]

    cs.AI cs.CV cs.RO

    ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

    Authors: Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, an…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: ProcTHOR website: https://procthor.allenai.org

  6. arXiv:2112.00800  [pdf, other]

    cs.CL cs.AI

    Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

    Authors: Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi

    Abstract: Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challeng…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: In EMNLP 2021

  7. arXiv:2104.11213  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    ManipulaTHOR: A Framework for Visual Object Manipulation

    Authors: Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several chal…

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 -- (Oral presentation)

  8. arXiv:2004.06799  [pdf, other]

    cs.CV cs.RO

    RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

    Authors: Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

    Abstract: Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI.…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  9. arXiv:1912.08195  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Generalizable Visual Representations via Interactive Gameplay

    Authors: Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

    Abstract: A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization. Comparatively little is known regarding the impact of embodied gameplay upon artificial agents. While recent work has p…

    Submitted 25 February, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

    Comments: Replaced with version accepted to ICLR'21

  10. arXiv:1712.05474  [pdf, other]

    cs.CV cs.AI cs.LG

    AI2-THOR: An Interactive 3D Environment for Visual AI

    Authors: Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi

    Abstract: We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning,…

    Submitted 26 August, 2022; v1 submitted 14 December, 2017; originally announced December 2017.