Showing 51–100 of 173 results for author: Savarese, S

  1. arXiv:2109.01115  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

    Authors: Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn

    Abstract: We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot's observation space. However, goal images also have a number of drawbacks: t…

    Submitted 31 October, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: Conference on Robot Learning (CoRL) 2021. 24 Pages, 18 Figures

  2. arXiv:2108.06038  [pdf, other]

    cs.RO cs.AI

    Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

    Authors: Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese

    Abstract: We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle diverse human behaviors shown in the demonstrations and be robust when the humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the h…

    Submitted 20 September, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: CoRL 2021

  3. arXiv:2108.03332  [pdf, other]

    cs.RO cs.AI cs.CV

    BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

    Authors: Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

    Abstract: We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse, and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for eac…

    Submitted 6 August, 2021; originally announced August 2021.

  4. arXiv:2108.03298  [pdf, other]

    cs.RO cs.AI cs.LG

    What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

    Authors: Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín

    Abstract: Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learni…

    Submitted 24 September, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: CoRL 2021 (Oral)

  5. arXiv:2108.03272  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

    Authors: Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

    Abstract: Recent research in embodied AI has been boosted by the use of simulation environments to develop and train robot learning approaches. However, the use of simulation has skewed the attention to tasks that only require what robotics simulators can simulate: motion and physical contact. We present iGibson 2.0, an open-source simulation environment that supports the simulation of a more diverse set of…

    Submitted 3 November, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: Accepted at Conference on Robot Learning (CoRL) 2021. Project website: http://svl.stanford.edu/igibson/

  6. arXiv:2106.13935  [pdf, other]

    cs.RO cs.AI cs.LG

    Discovering Generalizable Skills via Automated Generation of Diverse Tasks

    Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover gene…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: RSS 2021

  7. arXiv:2106.08827  [pdf, other]

    cs.CV

    JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection

    Authors: Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi

    Abstract: The availability of large-scale video action understanding datasets has facilitated advances in the interpretation of visual scenes containing people. However, learning to recognise human actions and their social interactions in an unconstrained real-world environment comprising numerous people, with potentially highly unbalanced and long-tailed distributed action labels from a stream of sensory d…

    Submitted 23 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

  8. TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

    Authors: Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi

    Abstract: Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in the humans' interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynam…

    Submitted 27 August, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Journal ref: IEEE/CVF International Conference on Computer Vision, pp. 13390-13400. 2021

  9. arXiv:2103.15793  [pdf, other]

    cs.RO cs.AI

    LASER: Learning a Latent Action Space for Efficient Reinforcement Learning

    Authors: Arthur Allshire, Roberto Martín-Martín, Charles Lin, Shawn Manuel, Silvio Savarese, Animesh Garg

    Abstract: The process of learning a manipulation task depends strongly on the action space used for exploration: posed in the incorrect action space, solving a task with reinforcement learning can be drastically inefficient. Additionally, similar tasks or instances of the same task family impose latent manifold constraints on the most effective action space: the task family can be best solved with actions i…

    Submitted 30 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: Accepted as a conference paper at ICRA 2021. 7 pages, 8 figures

  10. arXiv:2103.00375  [pdf, other]

    cs.RO cs.AI cs.LG

    Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control

    Authors: Chen Wang, Rui Wang, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu

    Abstract: Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data. However, IL methods often fail to generalize to new scene configurations not covered by training data. On the other hand, humans can manipulate objects in varying conditions. Key to such capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their…

    Submitted 16 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: First two authors contributed equally

  11. arXiv:2102.10809  [pdf, other]

    cs.LG

    Local Calibration: Metrics and Recalibration

    Authors: Rachel Luo, Aadyot Bhatnagar, Yu Bai, Shengjia Zhao, Huan Wang, Caiming Xiong, Silvio Savarese, Stefano Ermon, Edward Schmerling, Marco Pavone

    Abstract: Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores should be calibrated, i.e., they should reflect the reliability of the prediction. Confidence scores that minimize standard metrics such as the expected calibration error (ECE) accurately measure the reliability on average across the entire population. However, it is in general impossible t…

    Submitted 18 August, 2022; v1 submitted 22 February, 2021; originally announced February 2021.

  12. Embodied Intelligence via Learning and Evolution

    Authors: Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

    Abstract: The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusi…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Video available at https://youtu.be/MMrIiNavkuY

  13. Towards new servo control algorithms at the TNG telescope

    Authors: P. Schipani, M. Gonzalez, F. Perrotta, S. Savarese, M. Colapietro, A. Ghedina, M. Hernandez Diaz, H. Ventura

    Abstract: The servo control algorithms of the TNG, developed in the nineties, have been working for more than 20 years with no major updates. The original hardware was based on a VME-bus based platform running a real time operating system, a rather popular choice for similar applications at the time. Recently, the obsolescence of the hardware and the lack of spares pushed the observatory towards a complete…

    Submitted 2 January, 2021; originally announced January 2021.

    Comments: Proc. SPIE 11445, Ground-based and Airborne Telescopes VIII, 1144552 (2020)

  14. arXiv:2012.15597  [pdf, other]

    astro-ph.IM

    An Automated Pipeline for the VST Data Log Analysis

    Authors: Salvatore Savarese, Pietro Schipani, Giulio Capasso, Mirko Colapietro, Sergio D'Orsi, Laurent Marty, Francesco Perrotta

    Abstract: The VST Telescope Control Software logs continuously detailed information about the telescope and instrument operations. Commands, telemetries, errors, weather conditions and anything may be relevant for the instrument maintenance and the identification of problem sources is regularly saved. All information are recorded in textual form. These log files are often examined individually by the observ…

    Submitted 4 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: 4 pages, 2 figures, submitted to the proceedings of the Astronomical Data Analysis Software and Systems (ADASS) XXX. This is a replacement to correct a problem with the pdflatex command which prevented to obtain the correct pdf output

  15. arXiv:2012.12722  [pdf]

    astro-ph.IM

    Progress and tests on the Instrument Control Electronics for SOXS

    Authors: M. Colapietro, G. Capasso, S. D'Orsi, P. Schipani, L. Marty, S. Savarese, I. Coretti, S. Campana, R. Claudi, M. Aliverti, A. Baruffolo, S. Ben-Ami, F. Biondi, R. Cosentino, F. D'Alessio, P. D'Avanzo, O. Hershko, H. Kuncarayakti, M. Landoni, M. Munari, G. Pignata, A. Rubin, S. Scuderi, F. Vitali, D. Young, et al. (24 additional authors not shown)

    Abstract: The forthcoming SOXS (Son Of X-Shooter) will be a new spectroscopic facility for the ESO New Technology Telescope in La Silla, focused on transient events and able to cover both the UV-VIS and NIR bands. The instrument passed the Final Design Review in 2018 and is currently in manufacturing and integration phase. This paper is focused on the assembly and testing of the instrument control electroni…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: 10 pages, 12 figures, presented at SPIE

  16. Development status of the SOXS spectrograph for the ESO-NTT telescope

    Authors: P. Schipani, S. Campana, R. Claudi, M. Aliverti, A. Baruffolo, S. Ben-Ami, F. Biondi, G. Capasso, R. Cosentino, F. D'Alessio, P. D'Avanzo, O. Hershko, H. Kuncarayakti, M. Landoni, M. Munari, G. Pignata, A. Rubin, S. Scuderi, F. Vitali, D. Young, J. Achren, J. A. Araiza-Duran, I. Arcavi, A. Brucalassi, R. Bruch, et al. (29 additional authors not shown)

    Abstract: SOXS (Son Of X-Shooter) is a single object spectrograph, characterized by offering a wide simultaneous spectral coverage from U- to H-band, built by an international consortium for the 3.6-m ESO New Technology Telescope at the La Silla Observatory, in the Southern part of the Chilean Atacama Desert. The consortium is focussed on a clear scientific goal: the spectrograph will observe all kind of tr…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: Proc SPIE Volume 11447, Ground-based and Airborne Instrumentation for Astronomy VIII, 1144709,2020

  17. arXiv:2012.06738  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Multi-Arm Manipulation Through Collaborative Teleoperation

    Authors: Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has…

    Submitted 12 December, 2020; originally announced December 2020.

    Comments: First two authors contributed equally

  18. arXiv:2012.06733  [pdf, other]

    cs.RO cs.AI cs.LG

    Human-in-the-Loop Imitation Learning using Remote Teleoperation

    Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: Imitation Learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small…

    Submitted 12 December, 2020; originally announced December 2020.

  19. arXiv:2012.05292  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    Topological Planning with Transformers for Vision-and-Language Navigation

    Authors: Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese

    Abstract: Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan…

    Submitted 9 December, 2020; originally announced December 2020.

  20. arXiv:2012.04060  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search

    Authors: Andrey Kurenkov, Roberto Martín-Martín, Jeff Ichnowski, Ken Goldberg, Silvio Savarese

    Abstract: Searching for objects in indoor organized environments such as homes or offices is part of our everyday activities. When looking for a target object, we jointly reason about the rooms and containers the object is likely to be in; the same type of container will have a different probability of having the target depending on the room it is in. We also combine geometric and semantic information to in…

    Submitted 23 May, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  21. arXiv:2012.02924  [pdf, other]

    cs.AI cs.CV cs.RO

    iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes

    Authors: Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese

    Abstract: We present iGibson 1.0, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains 15 fully interactive home-sized scenes with 108 rooms populated with rigid and articulated objects. The scenes are replicas of real-world homes, with distribution and the layout of objects aligned to those of the real world. iGibson 1.0…

    Submitted 10 August, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Journal ref: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  22. arXiv:2011.08424  [pdf, other]

    cs.RO

    Deep Affordance Foresight: Planning Through What Can Be Done in the Future

    Authors: Danfei Xu, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Planning in realistic environments requires searching in large planning spaces. Affordances are a powerful concept to simplify this search, because they model what actions can be successful in a given situation. However, the classical notion of affordance is not suitable for long horizon planning because it only informs the robot about the immediate outcome of actions instead of what actions are b…

    Submitted 23 June, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: ICRA 2021

  23. arXiv:2011.06698  [pdf, other]

    cs.RO cs.CV cs.LG

    Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

    Authors: Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto

    Abstract: Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a high sample complexity cost and the final result is often brittle, failing unexpectedly if the test environment differs from that of training. We study the effects…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Extended version of CoRL 2020 camera ready. Supplementary released separately

  24. arXiv:2010.13021  [pdf, other]

    cs.RO

    Multimodal Sensor Fusion with Differentiable Filters

    Authors: Michelle A. Lee, Brent Yi, Roberto Martín-Martín, Silvio Savarese, Jeannette Bohg

    Abstract: Leveraging multimodal information with recursive Bayesian filters improves performance and robustness of state estimation, as recursive filters can combine different modalities according to their uncertainties. Prior work has studied how to optimally fuse different sensor modalities with analytical state estimation algorithms. However, deriving the dynamics and measurement models along with their…

    Submitted 23 December, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Published in IROS 2020. Updated sponsors, fixed Kalman gain typo

  25. arXiv:2010.08600  [pdf, other]

    cs.RO cs.AI

    Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning

    Authors: Claudia Pérez-D'Arpino, Can Liu, Patrick Goebel, Roberto Martín-Martín, Silvio Savarese

    Abstract: Navigating fluently around pedestrians is a necessary capability for mobile robots deployed in human environments, such as buildings and homes. While research on social navigation has focused mainly on the scalability with the number of pedestrians in open spaces, typical indoor environments present the additional challenge of constrained spaces such as corridors and doorways that limit maneuverab…

    Submitted 16 November, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

  26. arXiv:2008.09643  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Privacy Preserving Recalibration under Domain Shift

    Authors: Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

    Abstract: Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training dat…

    Submitted 21 August, 2020; originally announced August 2020.

  27. arXiv:2008.07792  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

    Authors: Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese

    Abstract: Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners…

    Submitted 26 March, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: First two authors contributed equally. Access project website at http://svl.stanford.edu/projects/relmogen

  28. arXiv:2008.06073  [pdf, other]

    cs.AI cs.LG cs.RO

    Visuomotor Mechanical Search: Learning to Retrieve Target Objects in Clutter

    Authors: Andrey Kurenkov, Joseph Taglic, Rohun Kulkarni, Marcus Dominguez-Kuhne, Animesh Garg, Roberto Martín-Martín, Silvio Savarese

    Abstract: When searching for objects in cluttered environments, it is often necessary to perform complex interactions in order to move occluding objects out of the way and fully reveal the object of interest and make it graspable. Due to the complexity of the physics involved and the lack of accurate models of the clutter, planning and controlling precise predefined interactions with accurate outcome is ext…

    Submitted 13 August, 2020; originally announced August 2020.

  29. arXiv:2008.03533  [pdf, other]

    cs.CV

    How Trustworthy are Performance Evaluations for Basic Vision Tasks?

    Authors: Tran Thien Dat Nguyen, Hamid Rezatofighi, Ba-Ngu Vo, Ba-Tuong Vo, Silvio Savarese, Ian Reid

    Abstract: This paper examines performance evaluation criteria for basic vision tasks involving sets of objects namely, object detection, instance-level segmentation and multi-object tracking. The rankings of algorithms by an existing criterion can fluctuate with different choices of parameters, e.g. Intersection over Union (IoU) threshold, making their evaluations unreliable. More importantly, there is no m…

    Submitted 22 July, 2022; v1 submitted 8 August, 2020; originally announced August 2020.

    Comments: Tran Thien Dat Nguyen and Hamid Rezatofighi have contributed equally

  30. arXiv:2007.07170  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Goal-Aware Prediction: Learning to Model What Matters

    Authors: Suraj Nair, Silvio Savarese, Chelsea Finn

    Abstract: Learned dynamics models combined with both planning and policy learning algorithms have shown promise in enabling artificial agents to learn to perform many diverse tasks with limited supervision. However, one of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model (future state reconstruction), and that of the downstream p…

    Submitted 10 August, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

  31. arXiv:2007.00350  [pdf, other]

    cs.LG cs.RO stat.ML

    Adaptive Procedural Task Generation for Hard-Exploration Problems

    Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a dire…

    Submitted 18 March, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: ICLR 2021

  32. arXiv:2006.12356  [pdf, other]

    cs.CV

    Generative Sparse Detection Networks for 3D Single-shot Object Detection

    Authors: JunYoung Gwak, Christopher Choy, Silvio Savarese

    Abstract: 3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality. Yet, the sparse nature of the 3D data poses unique challenges to this task. Most notably, the observable surface of the 3D point clouds is disjoint from the center of the instance to ground the bounding box prediction on. To this end, we propose Generative…

    Submitted 22 June, 2020; originally announced June 2020.

  33. arXiv:2003.09224  [pdf, other]

    cs.RO

    Probabilistic Visual Navigation with Bidirectional Image Prediction

    Authors: Noriaki Hirose, Shun Taguchi, Fei Xia, Roberto Martin-Martin, Kosuke Tahara, Masanori Ishigaki, Silvio Savarese

    Abstract: Humans can robustly follow a visual trajectory defined by a sequence of images (i.e. a video) regardless of substantial changes in the environment or the presence of obstacles. We aim at endowing similar visual navigation capabilities to mobile robots solely equipped with a RGB fisheye camera. We propose a novel probabilistic visual navigation system that learns to follow a sequence of images with…

    Submitted 18 February, 2022; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: 14 pages, 9 figures, 4 tables

    Journal ref: IROS 2021

  34. arXiv:2003.06085  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

    Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei

    Abstract: Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world…

    Submitted 23 June, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: RSS 2020; First two authors contributed equally

  35. arXiv:2002.08397  [pdf, other]

    cs.CV cs.RO

    JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset

    Authors: Abhijeet Shenoi, Mihir Patel, JunYoung Gwak, Patrick Goebel, Amir Sadeghian, Hamid Rezatofighi, Roberto Martín-Martín, Silvio Savarese

    Abstract: Robots navigating autonomously need to perceive and track the motion of objects and other agents in its surroundings. This information enables planning and executing robust and safe trajectories. To facilitate these processes, the motion should be perceived in 3D Cartesian space. However, most recent multi-object tracking (MOT) research has focused on tracking people and moving objects in 2D RGB v…

    Submitted 22 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 8 pages, 5 figures, 2 tables; Accepted at IROS 2020

  36. arXiv:1912.11121  [pdf, other]

    cs.CV cs.LG cs.NE cs.RO

    Learning to Navigate Using Mid-Level Visual Priors

    Authors: Alexander Sax, Jeffrey O. Zhang, Bradley Emi, Amir Zamir, Silvio Savarese, Leonidas Guibas, Jitendra Malik

    Abstract: How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)? What are the consequences of not utilizing such visual priors in learning? We study these questions by integrating a generic perceptual skill set (a distance estimator, an edge detector, etc.) within a reinforcement le…

    Submitted 23 December, 2019; originally announced December 2019.

    Comments: In Conference on Robot Learning, 2019. See project website and demos at http://perceptual.actor/

  37. arXiv:1911.05321  [pdf, other]

    cs.RO cs.AI cs.LG

    IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

    Authors: Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox

    Abstract: Learning from offline task demonstrations is a problem of great interest in robotics. For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater…

    Submitted 22 February, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

  38. arXiv:1911.04052  [pdf, other]

    cs.RO cs.HC cs.LG

    Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

    Authors: Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

    Abstract: Large, richly annotated datasets have accelerated progress in fields such as computer vision and natural language processing, but replicating these successes in robotics has been challenging. While prior data collection methodologies such as self-supervision have resulted in large datasets, the data can have poor signal-to-noise ratio. By contrast, previous efforts to collect task demonstrations w…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: Published at IROS 2019

  39. arXiv:1910.14442  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

    Authors: Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

    Abstract: We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed in order to clear a path leading to the goal location. Our benchmark comprises two novel elements: 1)…

    Submitted 9 August, 2021; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: 9 pages, 8 figures. Consider citing a newer version (https://arxiv.longhoe.net/abs/2012.02924) if you are using iGibson

    Journal ref: IEEE Robotics and Automation Letters, Vol. 5, No. 2, April 2020

  40. arXiv:1910.13395  [pdf, other]

    cs.RO cs.CV cs.LG

    Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation

    Authors: Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

    Abstract: The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal. We present Cascaded Variational Inference (CAVIN) Planner, a model-based method that hierarchically generates plans by sampling from latent spaces. To facilitate planning over long time horizons, our method learns latent representations that decouple the…

    Submitted 17 March, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: CoRL 2019

  41. arXiv:1910.11977  [pdf, other]

    cs.RO

    KETO: Learning Keypoint Representations for Tool Manipulation

    Authors: Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: We aim to develop an algorithm for robots to manipulate novel objects as tools for completing different task goals. An efficient and informative representation would facilitate the effectiveness and generalization of such algorithms. For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation. For each task, a set of task-specific keypoints is joi…

    Submitted 29 October, 2019; v1 submitted 25 October, 2019; originally announced October 2019.

  42. JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments

    Authors: Roberto Martín-Martín, Mihir Patel, Hamid Rezatofighi, Abhijeet Shenoi, JunYoung Gwak, Eric Frankel, Amir Sadeghian, Silvio Savarese

    Abstract: We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data including stereo cylindrical 360$^\circ$ RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, audio signal, RGB-D video at 30 fps, 360$^\circ$ spherical image from a fisheye c…

    Submitted 24 April, 2021; v1 submitted 25 October, 2019; originally announced October 2019.

  43. arXiv:1910.11432  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators

    Authors: Chengshu Li, Fei Xia, Roberto Martin-Martin, Silvio Savarese

    Abstract: Most common navigation tasks in human environments require auxiliary arm interactions, e.g. opening doors, pressing buttons and pushing obstacles away. This type of navigation task, which we call Interactive Navigation, requires the use of mobile manipulators: mobile bases with manipulation capabilities. Interactive Navigation tasks are usually long-horizon and composed of heterogeneous phases of…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Conference on Robot Learning (CoRL) 2019

  44. arXiv:1910.10750  [pdf, other]

    cs.CV cs.RO

    6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

    Authors: Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu

    Abstract: We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data. Our method tracks, in real time, novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching. Thes…

    Submitted 23 October, 2019; originally announced October 2019.

  45. arXiv:1910.02527  [pdf, other]

    cs.CV cs.RO

    3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

    Authors: Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese

    Abstract: A comprehensive semantic understanding of a scene is important for many applications - but in what space should diverse semantic information (e.g., objects, scene categories, material types, texture, etc.) be grounded and what should be its structure? Aspiring to have one unified structure that hosts diverse types of semantics, we follow the Scene Graph paradigm in 3D, generating a 3D Scene Graph.…

    Submitted 6 October, 2019; originally announced October 2019.

    Comments: ICCV 2019

  46. arXiv:1910.01751  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Causal Induction from Visual Observations for Goal Directed Tasks

    Authors: Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world. In this work, we propose to endow an artificial agent with the capability of causal reasoning for completing goal-directed tasks. We develop learning-based approaches to inducing causal knowledge in the form of directed acyclic graphs, which can be used to contextuali…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: 13 pages, 6 figures

  47. arXiv:1909.13072  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Regression Planning Networks

    Authors: Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their rea…

    Submitted 28 September, 2019; originally announced September 2019.

    Comments: Accepted at NeurIPS 2019

  48. arXiv:1909.12989  [pdf, other]

    cs.LG cs.RO stat.ME

    SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning

    Authors: Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

    Abstract: We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL). The framework consists of a stack of four layers: Provisioner, Orchestrator, Protocol, and Algorithms. The Provisioner abstracts away the machine hardware and node pools across different cloud providers. The Orchestrator provides a unified interface for scheduling…

    Submitted 11 October, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: Technical report of the SURREAL system. See more details at https://surreal.stanford.edu

  49. arXiv:1909.04121  [pdf, other]

    cs.LG cs.AI stat.ML

    AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers

    Authors: Andrey Kurenkov, Ajay Mandlekar, Roberto Martin-Martin, Silvio Savarese, Animesh Garg

    Abstract: The exploration mechanism used by a Deep Reinforcement Learning (RL) agent plays a key role in determining its sample efficiency. Thus, improving over random exploration is crucial to solve long-horizon tasks with sparse rewards. We propose to leverage an ensemble of partial solutions as teachers that guide the agent's exploration with action suggestions throughout training. While the setup of lea…

    Submitted 12 December, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Conference on Robot Learning (CoRL) 2019

  50. arXiv:1908.09073  [pdf, other]

    cs.CV

    Situational Fusion of Visual Representation for Visual Navigation

    Authors: Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese

    Abstract: A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities. For example, to "go to the nearest chair", the agent might need to identify a chair in a living room using semantics, follow along a hallway using vanishing point cues, and avoid obstacles using depth. Therefore, utilizing the appropriate visual perception abilities…

    Submitted 3 August, 2021; v1 submitted 23 August, 2019; originally announced August 2019.