Showing 1–29 of 29 results for author: Ehsani, K

  1. arXiv:2406.20083  [pdf, other]

    cs.RO cs.CV

    PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

    Authors: Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs

    Abstract: We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of mil…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18915  [pdf, other]

    cs.RO cs.CV

    Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

    Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

    Abstract: Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited t…

    Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://robot-ma.github.io/

  3. arXiv:2312.09337  [pdf, other]

    cs.CV cs.AI cs.RO

    Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

    Authors: Minyoung Hwang, Luca Weihs, Chanwoo Park, Kimin Lee, Aniruddha Kembhavi, Kiana Ehsani

    Abstract: Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences in complex environments. We use multi-objective reinforcement learning to train a single policy adaptable to a…

    Submitted 14 December, 2023; originally announced December 2023.

  4. arXiv:2312.06639  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Harmonic Mobile Manipulation

    Authors: Ruihan Yang, Yejin Kim, Aniruddha Kembhavi, Xiaolong Wang, Kiana Ehsani

    Abstract: Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots are still impotent in many household tasks requiring coordinated behaviors such as opening doors. The factorization of navigation and manipulation, while effective for some tasks, fails in scenarios requiring coordinated actions. To address this challenge, we…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: More results are on our project site: https://rchalyang.github.io/HarmonicMM/

  5. arXiv:2312.02976  [pdf, other]

    cs.RO cs.AI cs.CV

    Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

    Authors: Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

    Abstract: Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: First six authors contributed equally. Project page: https://spoc-robot.github.io/

  6. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  7. arXiv:2307.05663  [pdf, other]

    cs.CV cs.AI

    Objaverse-XL: A Universe of 10M+ 3D Objects

    Authors: Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi

    Abstract: Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects…

    Submitted 11 July, 2023; originally announced July 2023.

  8. arXiv:2212.08051  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Objaverse: A Universe of Annotated 3D Objects

    Authors: Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi

    Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Website: objaverse.allenai.org

  9. arXiv:2212.04819  [pdf, other]

    cs.RO cs.AI cs.CV

    Phone2Proc: Bringing Robust Robots Into Our Chaotic World

    Authors: Matt Deitke, Rose Hendrix, Luca Weihs, Ali Farhadi, Kiana Ehsani, Aniruddha Kembhavi

    Abstract: Training embodied agents in simulation has become mainstream for the embodied AI community. However, these agents often struggle when deployed in the physical world due to their inability to generalize to real-world environments. In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan and conditional procedural generation to create a distribution of training scenes that are…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: https://allenai.org/project/phone2proc

  10. arXiv:2210.06849  [pdf, other]

    cs.CV

    Retrospectives on the Embodied AI Workshop

    Authors: Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, et al. (14 additional authors not shown)

    Abstract: We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of…

    Submitted 4 December, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  11. arXiv:2207.08997  [pdf, other]

    cs.CV

    Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery

    Authors: Neil Nie, Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song

    Abstract: We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions. Our key insight is that 3D interaction and perception should be considered in conjunction to construct 3D articulated CAD models, especially for categories not seen during training. By selecting informative interactions, SfA…

    Submitted 7 April, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

  12. arXiv:2206.06994  [pdf, other]

    cs.AI cs.CV cs.RO

    ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

    Authors: Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, an…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: ProcTHOR website: https://procthor.allenai.org

  13. arXiv:2203.17251  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Continuous Scene Representations for Embodied AI

    Authors: Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song, Roozbeh Mottaghi

    Abstract: We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings. Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation. Our key insight i…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  14. arXiv:2203.08141  [pdf, other]

    cs.CV cs.LG cs.RO

    Object Manipulation via Visual Target Localization

    Authors: Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them. Training agents to manipulate objects poses many challenges. These include occlusion of the target object by the agent's arm, noisy object detection and localization, and the target frequently going out of view as the agent moves around in the scene. We propose Manipulation via Visual O…

    Submitted 15 March, 2022; originally announced March 2022.

  15. arXiv:2112.12612  [pdf, other]

    cs.RO cs.CV

    Towards Disturbance-Free Visual Mobile Manipulation

    Authors: Tianwei Ni, Kiana Ehsani, Luca Weihs, Jordi Salvador

    Abstract: Deep reinforcement learning has shown promising results on an abundance of robotic tasks in simulation, including visual navigation and manipulation. Prior work generally aims to build embodied agents that solve their assigned tasks as quickly as possible, while largely ignoring the problems caused by collision with objects during interaction. This lack of prioritization is understandable: there i…

    Submitted 21 October, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: WACV 2023

  16. arXiv:2105.01047  [pdf, other]

    cs.CV

    Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery

    Authors: Samir Yitzhak Gadre, Kiana Ehsani, Shuran Song

    Abstract: People often use physical intuition when manipulating articulated objects, irrespective of object semantics. Motivated by this observation, we identify an important embodied task where an agent must play with objects to recover their parts. To this end, we introduce Act the Part (AtP) to learn how to interact with articulated objects to discover and segment their pieces. By coupling action selecti…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: 16 pages, 16 figures

  17. arXiv:2104.11213  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    ManipulaTHOR: A Framework for Visual Object Manipulation

    Authors: Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi

    Abstract: The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several chal…

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 -- (Oral presentation)

  18. arXiv:2103.14005  [pdf, other]

    cs.CV cs.LG

    Contrasting Contrastive Self-Supervised Representation Learning Pipelines

    Authors: Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, Roozbeh Mottaghi

    Abstract: In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training methods and datasets influence performance on downstream tasks. In this paper, we analyze contrastive approaches as one of the most successful and po…

    Submitted 18 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  19. arXiv:2012.00172  [pdf, other]

    cs.LG cs.CV

    Deconstructing the Structure of Sparse Neural Networks

    Authors: Maxwell Van Gelder, Mitchell Wortsman, Kiana Ehsani

    Abstract: Although sparse neural networks have been studied extensively, the focus has been primarily on accuracy. In this work, we focus instead on network structure, and analyze three popular algorithms. We first measure performance when structure persists and weights are reset to a different random initialization, thereby extending experiments in Deconstructing Lottery Tickets (Zhou et al., 2019). This e…

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: 6 pages, 4 figures, Accepted to ML-Retrospectives, Surveys & Meta-Analyses @ NeurIPS 2020 Workshop

    ACM Class: I.2.6; I.5.1

  20. arXiv:2010.08539  [pdf, other]

    cs.CV

    What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

    Authors: Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi

    Abstract: Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach, where we use human interaction and attention cues to investigate whether we can learn better representations compared to…

    Submitted 6 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at ICLR 2021

  21. arXiv:2003.12045  [pdf, other]

    cs.CV cs.LG cs.RO

    Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

    Authors: Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta

    Abstract: When we humans look at a video of human-object interaction, we can not only infer what is happening but we can even extract actionable information and imitate those interactions. On the other hand, current recognition or geometric approaches lack the physicality of action representation. In this paper, we take a step towards a more physical understanding of actions. We address the problem of infer…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: CVPR 2020 -- (Oral presentation)

  22. arXiv:2003.07990  [pdf, other]

    cs.CV

    Watching the World Go By: Representation Learning from Unlabeled Videos

    Authors: Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi

    Abstract: Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of the same image and a large batch of unrelated images. Networks learn to ignore the augmentation noise and extract semantically meaningful representations. Prior w…

    Submitted 7 May, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  23. arXiv:1912.08195  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Generalizable Visual Representations via Interactive Gameplay

    Authors: Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

    Abstract: A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization. Comparatively little is known regarding the impact of embodied gameplay upon artificial agents. While recent work has p…

    Submitted 25 February, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

    Comments: Replaced with version accepted to ICLR'21

  24. arXiv:1812.00971  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning

    Authors: Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

    Abstract: Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. As we learn a task, we keep learning about it while performing the task. What we learn and how we learn it varies during different stages of learning. Learning how to learn and adapt is a key property that enables us to generalize effortlessly to new settin…

    Submitted 26 March, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

  25. arXiv:1803.10827  [pdf, other]

    cs.CV

    Who Let The Dogs Out? Modeling Dog Behavior From Visual Data

    Authors: Kiana Ehsani, Hessam Bagherinezhad, Joseph Redmon, Roozbeh Mottaghi, Ali Farhadi

    Abstract: We introduce the task of directly modeling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard approach to computer vision; instead we directly model a visually intelligent agent. Our model takes visual information as input and directly predicts the actions of the agent. Toward this end we introduc…

    Submitted 17 May, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR18

  26. arXiv:1712.05474  [pdf, other]

    cs.CV cs.AI cs.LG

    AI2-THOR: An Interactive 3D Environment for Visual AI

    Authors: Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Matt Deitke, Kiana Ehsani, Daniel Gordon, Yuke Zhu, Aniruddha Kembhavi, Abhinav Gupta, Ali Farhadi

    Abstract: We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning,…

    Submitted 26 August, 2022; v1 submitted 14 December, 2017; originally announced December 2017.

  27. arXiv:1703.10239  [pdf, other]

    cs.CV

    SeGAN: Segmenting and Generating the Invisible

    Authors: Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

    Abstract: Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects. Doing so requires knowing which pixels to paint (segmenting the invisible parts of objects) and what…

    Submitted 7 May, 2018; v1 submitted 29 March, 2017; originally announced March 2017.

    Comments: Accepted to CVPR18 as spotlight

  28. arXiv:1411.0132  [pdf, ps, other]

    cs.DM

    A Note on Signed k-Submatching in Graphs

    Authors: S. Akbari, M. Dalirrooyfard, K. Ehsani, R. Sherkati

    Abstract: Let $G$ be a graph of order $n$. For every $v\in V(G)$, let $E_G(v)$ denote the set of all edges incident with $v$. A signed $k$-submatching of $G$ is a function $f:E(G)\longrightarrow \{-1,1\}$, satisfying $f(E_G(v))\leq 1$ for at least $k$ vertices, where $f(S)=\sum_{e\in S}f(e)$, for each $ S\subseteq E(G)$. The maximum of the value of $f(E(G))$, taken over all signed $k$-submatching $f$ of…

    Submitted 1 November, 2014; originally announced November 2014.

    Comments: 4 pages

    MSC Class: 05C70; 05C78

  29. arXiv:1402.0134  [pdf, other]

    cs.DM math.CO

    On the Decision Number of Graphs

    Authors: S. Akbari, M. Dalirrooyfard, S. Davodpoor, K. Ehsani, R. Sherkati

    Abstract: Let $G$ be a graph. A good function is a function $f:V(G)\rightarrow \{-1,1\}$, satisfying $f(N(v))\geq 1$, for each $v\in V(G)$, where $ N(v)=\{u\in V(G)\, |\, uv\in E(G) \} $ and $f(S) = \sum_{u\in S} f(u)$ for every $S \subseteq V(G) $. For every cubic graph $G$ of order $ n, $ we prove that $ γ(G) \leq \frac{5n}{7} $ and show that this inequality is sharp. A function…

    Submitted 11 June, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

    Comments: 17 pages, 7 figures

    MSC Class: 05C05; 05C38; 05CC69; 05C78