Showing 51–100 of 177 results for author: Fei-Fei, L

  1. arXiv:2109.07991  [pdf, other]

    cs.RO cs.CV cs.GR cs.LG

    ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations

    Authors: Ruohan Gao, Yen-Yu Chang, Shivani Mall, Li Fei-Fei, Jiajun Wu

    Abstract: Multisensory object-centric perception, reasoning, and interaction have been a key research topic in recent years. However, the progress in these directions is limited by the small set of objects available -- synthetic objects are not realistic enough and are mostly centered around geometry, while real object datasets such as YCB are often practically challenging and unstable to acquire due to int…

    Submitted 7 November, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: In CoRL 2021. Chang and Mall contributed equally to this work. Project page: https://ai.stanford.edu/~rhgao/objectfolder/

  2. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  3. arXiv:2108.06038  [pdf, other]

    cs.RO cs.AI

    Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

    Authors: Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese

    Abstract: We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle diverse human behaviors shown in the demonstrations and be robust when the humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the h…

    Submitted 20 September, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: CoRL 2021

  4. arXiv:2108.03332  [pdf, other]

    cs.RO cs.AI cs.CV

    BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments

    Authors: Sanjana Srivastava, Chengshu Li, Michael Lingelbach, Roberto Martín-Martín, Fei Xia, Kent Vainio, Zheng Lian, Cem Gokmen, Shyamal Buch, C. Karen Liu, Silvio Savarese, Hyowon Gweon, Jiajun Wu, Li Fei-Fei

    Abstract: We introduce BEHAVIOR, a benchmark for embodied AI with 100 activities in simulation, spanning a range of everyday household chores such as cleaning, maintenance, and food preparation. These activities are designed to be realistic, diverse, and complex, aiming to reproduce the challenges that agents must face in the real world. Building such a benchmark poses three fundamental difficulties for eac…

    Submitted 6 August, 2021; originally announced August 2021.

  5. arXiv:2108.03298  [pdf, other]

    cs.RO cs.AI cs.LG

    What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

    Authors: Ajay Mandlekar, Danfei Xu, Josiah Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, Roberto Martín-Martín

    Abstract: Imitating human demonstrations is a promising approach to endow robots with various manipulation capabilities. While recent advances have been made in imitation learning and batch (offline) reinforcement learning, a lack of open-source human datasets and reproducible learning methods make assessing the state of the field difficult. In this paper, we conduct an extensive study of six offline learni…

    Submitted 24 September, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: CoRL 2021 (Oral)

  6. arXiv:2108.03272  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

    Authors: Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

    Abstract: Recent research in embodied AI has been boosted by the use of simulation environments to develop and train robot learning approaches. However, the use of simulation has skewed the attention to tasks that only require what robotics simulators can simulate: motion and physical contact. We present iGibson 2.0, an open-source simulation environment that supports the simulation of a more diverse set of…

    Submitted 3 November, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: Accepted at Conference on Robot Learning (CoRL) 2021. Project website: http://svl.stanford.edu/igibson/

  7. arXiv:2107.09285  [pdf, other]

    cs.CL cs.AI

    Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning

    Authors: Kaylee Burns, Christopher D. Manning, Li Fei-Fei

    Abstract: Although virtual agents are increasingly situated in environments where natural language is the most effective mode of interaction with humans, these exchanges are rarely used as an opportunity for learning. Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding: semantic parsers built on top of fixed object categories a…

    Submitted 20 July, 2021; originally announced July 2021.

    Comments: 17 pages, 10 figures

    ACM Class: I.2.7

  8. arXiv:2107.02331  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

    Authors: Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

    Abstract: Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021. 17 pages, 16 Figures

  9. arXiv:2106.13935  [pdf, other]

    cs.RO cs.AI cs.LG

    Discovering Generalizable Skills via Automated Generation of Diverse Tasks

    Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover gene…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: RSS 2021

  10. arXiv:2106.09678  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

    Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

    Abstract: Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning techniq…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: ICML 2021. Website: https://linxifan.github.io/secant-site/

  11. arXiv:2106.08261  [pdf, other]

    cs.AI cs.CV

    Physion: Evaluating Physical Prediction from Vision in Humans and Machines

    Authors: Daniel M. Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiao-Yu Fish Tung, R. T. Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, Li Fei-Fei, Nancy Kanwisher, Joshua B. Tenenbaum, Daniel L. K. Yamins, Judith E. Fan

    Abstract: While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid an…

    Submitted 20 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 28 pages

    ACM Class: I.2.10; I.4.8; I.5

  12. arXiv:2106.06047  [pdf, other]

    cs.LG cs.CV

    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

    Authors: Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, Daniel Rubin

    Abstract: Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate t…

    Submitted 13 April, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Published as a conference paper at CVPR 2022

  13. arXiv:2104.09052  [pdf, other]

    cs.LG

    Metadata Normalization

    Authors: Mandy Lu, Qingyu Zhao, Jiequan Zhang, Kilian M. Pohl, Li Fei-Fei, Juan Carlos Niebles, Ehsan Adeli

    Abstract: Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metad…

    Submitted 5 May, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021. Project page: https://mml.stanford.edu/MDN/

  14. arXiv:2103.06191  [pdf, other]

    cs.CV

    A Study of Face Obfuscation in ImageNet

    Authors: Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however,…

    Submitted 9 June, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted to ICML 2022

  15. arXiv:2103.04174  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

    Authors: Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei, Chelsea Finn

    Abstract: A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model. However, while existing video prediction models have produced promising results on small datasets, they suffer from severe underfitting when trained on large and diverse datasets. To address this underfitting challenge, we first obser…

    Submitted 19 June, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Equal advising and contribution for last two authors

  16. arXiv:2103.00375  [pdf, other]

    cs.RO cs.AI cs.LG

    Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control

    Authors: Chen Wang, Rui Wang, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu

    Abstract: Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data. However, IL methods often fail to generalize to new scene configurations not covered by training data. On the other hand, humans can manipulate objects in varying conditions. Key to such capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their…

    Submitted 16 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: First two authors contributed equally

  17. Embodied Intelligence via Learning and Evolution

    Authors: Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

    Abstract: The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusi…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Video available at https://youtu.be/MMrIiNavkuY

  18. arXiv:2012.06738  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Multi-Arm Manipulation Through Collaborative Teleoperation

    Authors: Albert Tung, Josiah Wong, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has…

    Submitted 12 December, 2020; originally announced December 2020.

    Comments: First two authors contributed equally

  19. arXiv:2012.06733  [pdf, other]

    cs.RO cs.AI cs.LG

    Human-in-the-Loop Imitation Learning using Remote Teleoperation

    Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: Imitation Learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small…

    Submitted 12 December, 2020; originally announced December 2020.

  20. arXiv:2012.02924  [pdf, other]

    cs.AI cs.CV cs.RO

    iGibson 1.0: a Simulation Environment for Interactive Tasks in Large Realistic Scenes

    Authors: Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese

    Abstract: We present iGibson 1.0, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains 15 fully interactive home-sized scenes with 108 rooms populated with rigid and articulated objects. The scenes are replicas of real-world homes, with distribution and the layout of objects aligned to those of the real world. iGibson 1.0…

    Submitted 10 August, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Journal ref: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  21. arXiv:2011.08424  [pdf, other]

    cs.RO

    Deep Affordance Foresight: Planning Through What Can Be Done in the Future

    Authors: Danfei Xu, Ajay Mandlekar, Roberto Martín-Martín, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Planning in realistic environments requires searching in large planning spaces. Affordances are a powerful concept to simplify this search, because they model what actions can be successful in a given situation. However, the classical notion of affordance is not suitable for long horizon planning because it only informs the robot about the immediate outcome of actions instead of what actions are b…

    Submitted 23 June, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: ICRA 2021

  22. arXiv:2008.02311  [pdf, other]

    cs.HC cs.AI

    Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

    Authors: Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

    Abstract: With the emergence of conversational artificial intelligence (AI) agents, it is important to understand the mechanisms that influence users' experiences of these agents. We study a common tool in the designer's toolkit: conceptual metaphors. Metaphors can present an agent as akin to a wry teenager, a toddler, or an experienced butler. How might a choice of metaphor influence our experience of the…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: CSCW 2020

    Journal ref: PACM HCI Volume 4 CSCW 2, 2020

  23. arXiv:2007.08920  [pdf, other]

    cs.CV cs.LG eess.IV

    Vision-based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity

    Authors: Mandy Lu, Kathleen Poston, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Juan Carlos Niebles, Ehsan Adeli

    Abstract: Parkinson's disease (PD) is a progressive neurological disorder primarily affecting motor function resulting in tremor at rest, rigidity, bradykinesia, and postural instability. The physical severity of PD impairments can be quantified through the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a widely used clinical rating scale. Accurate and quantitative assessmen…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted as a conference paper at MICCAI (Medical Image Computing and Computer Assisted Intervention), Lima, Peru, October 2020. 11 pages, LaTeX

  24. arXiv:2007.00350  [pdf, other]

    cs.LG cs.RO stat.ML

    Adaptive Procedural Task Generation for Hard-Exploration Problems

    Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a dire…

    Submitted 18 March, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: ICLR 2021

  25. arXiv:2006.12373  [pdf, other]

    cs.CV cs.LG

    Learning Physical Graph Representations from Visual Scenes

    Authors: Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins

    Abstract: Convolutional Neural Networks (CNNs) have proved exceptional at learning representations for visual object categorization. However, CNNs do not explicitly encode objects, parts, and their physical properties, which has limited CNNs' success on tasks that require structured understanding of visual scenes. To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which re…

    Submitted 24 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 23 pages; corrected affiliations and acknowledgments

    ACM Class: I.4.8; I.2.6

  26. arXiv:2003.06085  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

    Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei

    Abstract: Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world…

    Submitted 23 June, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: RSS 2020; First two authors contributed equally

  27. Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy

    Authors: Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in th…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: Accepted to FAT* 2020

  28. arXiv:1912.06992  [pdf, other]

    cs.CV

    Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

    Authors: Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

    Abstract: Action recognition has typically treated actions and activities as monolithic events that occur in videos. However, there is evidence from Cognitive Science and Neuroscience that people actively encode activities into consistent hierarchical part structures. However in Computer Vision, few explorations on representations encoding event partonomies have been made. Inspired by evidence that the prot…

    Submitted 15 December, 2019; originally announced December 2019.

  29. arXiv:1912.01119  [pdf, other]

    cs.CV cs.CL

    Deep Bayesian Active Learning for Multiple Correct Outputs

    Authors: Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

    Abstract: Typical active learning strategies are designed for tasks, such as classification, with the assumption that the output space is mutually exclusive. The assumption that these tasks always have exactly one correct answer has resulted in the creation of numerous uncertainty-based measurements, such as entropy and least confidence, which operate over a model's outputs. Unfortunately, many real-world v…

    Submitted 8 December, 2019; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 18 pages, 9 figures

  30. arXiv:1911.05864  [pdf, other]

    cs.RO cs.AI cs.CV

    Motion Reasoning for Goal-Based Imitation Learning

    Authors: De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

    Abstract: We address goal-based imitation learning, where the aim is to output the symbolic goal from a third-person video demonstration. This enables the robot to plan for execution and reproduce the same goal in a completely different environment. The key challenge is that the goal of a video demonstration is often ambiguous at the level of semantic actions. The human demonstrators might unintentionally a…

    Submitted 13 November, 2019; originally announced November 2019.

  31. arXiv:1911.05321  [pdf, other]

    cs.RO cs.AI cs.LG

    IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

    Authors: Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox

    Abstract: Learning from offline task demonstrations is a problem of great interest in robotics. For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater…

    Submitted 22 February, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

  32. arXiv:1911.04052  [pdf, other]

    cs.RO cs.HC cs.LG

    Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity

    Authors: Ajay Mandlekar, Jonathan Booher, Max Spero, Albert Tung, Anchit Gupta, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

    Abstract: Large, richly annotated datasets have accelerated progress in fields such as computer vision and natural language processing, but replicating these successes in robotics has been challenging. While prior data collection methodologies such as self-supervision have resulted in large datasets, the data can have poor signal-to-noise ratio. By contrast, previous efforts to collect task demonstrations w…

    Submitted 10 November, 2019; originally announced November 2019.

    Comments: Published at IROS 2019

  33. arXiv:1910.14442  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Interactive Gibson Benchmark (iGibson 0.5): A Benchmark for Interactive Navigation in Cluttered Environments

    Authors: Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Li Fei-Fei, Roberto Martín-Martín, Silvio Savarese

    Abstract: We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed in order to clear a path leading to the goal location. Our benchmark comprises two novel elements: 1)…

    Submitted 9 August, 2021; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: 9 pages, 8 figures. Consider citing a newer version (https://arxiv.longhoe.net/abs/2012.02924) if you are using iGibson

    Journal ref: IEEE Robotics and Automation Letters, Vol. 5, No. 2, April 2020

  34. arXiv:1910.13395  [pdf, other]

    cs.RO cs.CV cs.LG

    Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation

    Authors: Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei

    Abstract: The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal. We present Cascaded Variational Inference (CAVIN) Planner, a model-based method that hierarchically generates plans by sampling from latent spaces. To facilitate planning over long time horizons, our method learns latent representations that decouple the…

    Submitted 17 March, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: CoRL 2019

  35. arXiv:1910.11977  [pdf, other]

    cs.RO

    KETO: Learning Keypoint Representations for Tool Manipulation

    Authors: Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

    Abstract: We aim to develop an algorithm for robots to manipulate novel objects as tools for completing different task goals. An efficient and informative representation would facilitate the effectiveness and generalization of such algorithms. For this purpose, we present KETO, a framework of learning keypoint representations of tool-based manipulation. For each task, a set of task-specific keypoints is joi…

    Submitted 29 October, 2019; v1 submitted 25 October, 2019; originally announced October 2019.

  36. arXiv:1910.10750  [pdf, other]

    cs.CV cs.RO

    6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

    Authors: Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu

    Abstract: We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data. Our method tracks in real-time novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching. Thes…

    Submitted 23 October, 2019; originally announced October 2019.

  37. arXiv:1910.03676  [pdf, other]

    cs.CV cs.LG

    Representation Learning with Statistical Independence to Mitigate Bias

    Authors: Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

    Abstract: Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to the bias of race in gender or face recognition systems. Controlling for all types of biases in the dataset curation stage is cumbersome…

    Submitted 20 November, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: WACV 2021

  38. arXiv:1910.01751  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Causal Induction from Visual Observations for Goal Directed Tasks

    Authors: Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world. In this work, we propose to endow an artificial agent with the capability of causal reasoning for completing goal-directed tasks. We develop learning-based approaches to inducing causal knowledge in the form of directed acyclic graphs, which can be used to contextuali…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: 13 pages, 6 figures

  39. arXiv:1909.13072  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Regression Planning Networks

    Authors: Danfei Xu, Roberto Martín-Martín, De-An Huang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their rea…

    Submitted 28 September, 2019; originally announced September 2019.

    Comments: Accepted at NeurIPS 2019

  40. arXiv:1909.13003  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs

    Authors: Yunbo Wang, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, Joshua B. Tenenbaum

    Abstract: A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the meri…

    Submitted 7 May, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: IJCAI 2020

  41. arXiv:1909.12989  [pdf, other]

    cs.LG cs.RO stat.ME

    SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning

    Authors: Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

    Abstract: We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL). The framework consists of a stack of four layers: Provisioner, Orchestrator, Protocol, and Algorithms. The Provisioner abstracts away the machine hardware and node pools across different cloud providers. The Orchestrator provides a unified interface for scheduling…

    Submitted 11 October, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: Technical report of the SURREAL system. See more details at https://surreal.stanford.edu

  42. arXiv:1908.09073  [pdf, other]

    cs.CV

    Situational Fusion of Visual Representation for Visual Navigation

    Authors: Bokui Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese

    Abstract: A complex visual navigation task puts an agent in different situations which call for a diverse range of visual perception abilities. For example, to "go to the nearest chair", the agent might need to identify a chair in a living room using semantics, follow along a hallway using vanishing point cues, and avoid obstacles using depth. Therefore, utilizing the appropriate visual perception abilities…

    Submitted 3 August, 2021; v1 submitted 23 August, 2019; originally announced August 2019.

  43. arXiv:1908.06769  [pdf, other]

    cs.AI cs.LG cs.RO

    Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

    Authors: De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

    Abstract: We address one-shot imitation learning, where the goal is to execute a previously unseen task based on a single demonstration. While there has been exciting progress in this direction, most of the approaches still require a few hundred tasks for meta-training, which limits the scalability of the approaches. Our main contribution is to formulate one-shot imitation learning as a symbolic planning pr…

    Submitted 4 November, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: IROS 2019

  44. arXiv:1907.13098  [pdf, other]

    cs.RO cs.LG

    Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

    Authors: Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

    Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is non-trivial to manually design a robot controller that combines these modalities which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy…

    Submitted 27 July, 2019; originally announced July 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.10191

  45. arXiv:1907.01172  [pdf, other]

    cs.CV

    Procedure Planning in Instructional Videos

    Authors: Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

    Abstract: In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking. Given the current visual observation of the world and a visual goal, we ask the question "What actions need to be taken in order to achieve the goal?". The key technical challenge is to lear…

    Submitted 13 April, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

    Comments: 14 pages, 7 figures

  46. arXiv:1906.04876  [pdf, other]

    cs.CV

    Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction

    Authors: Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

    Abstract: Scene graph prediction --- classifying the set of objects and predicates in a visual scene --- requires substantial training data. However, most predicates only occur a handful of times making them difficult to learn. We introduce the first scene graph prediction model that supports few-shot learning of predicates. Existing scene graph generation models represent objects using pretrained object de…

    Submitted 5 December, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: 14 pages, 10 figures, preprint

  47. arXiv:1904.11622  [pdf, other]

    cs.CV cs.AI

    Scene Graph Prediction with Limited Labels

    Authors: Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei

    Abstract: Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and using textual knowledge…

    Submitted 30 November, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: ICCV 2019, 10 pages, 9 figures

    Journal ref: International Conference on Computer Vision, 2019

  48. arXiv:1904.01121  [pdf, other]

    cs.CV cs.HC cs.LG

    HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

    Authors: Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein

    Abstract: Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad-hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We constru…

    Submitted 31 October, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: https://hype.stanford.edu

  49. arXiv:1903.11207  [pdf, other]

    cs.CV

    Information Maximizing Visual Question Generation

    Authors: Ranjay Krishna, Michael Bernstein, Li Fei-Fei

    Abstract: Though image-to-sequence generation models have become overwhelmingly popular in human-computer communications, they suffer from strongly favoring safe generic questions ("What is in this picture?"). Generating uninformative but relevant questions is not sufficient or useful. We argue that a good question is one that has a tightly focused purpose --- one that is aimed at expecting a specific type…

    Submitted 26 March, 2019; originally announced March 2019.

    Comments: CVPR 2019

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition, 2019

  50. arXiv:1903.03878  [pdf, other]

    cs.LG cs.CV cs.RO stat.ML

    Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

    Authors: Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

    Abstract: Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The prop…

    Submitted 9 March, 2019; originally announced March 2019.

    Comments: CVPR 2019 paper with supplementary material