Showing 1–17 of 17 results for author: Radosavovic, I

  1. arXiv:2403.12945

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  2. arXiv:2402.19469

    cs.RO cs.CV cs.LG

    Humanoid Locomotion as Next Token Prediction

    Authors: Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

    Abstract: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This gen…

    Submitted 29 February, 2024; originally announced February 2024.

  3. arXiv:2312.05251

    cs.CV

    Reconstructing Hands in 3D with Transformers

    Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

    Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand recon…

    Submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2310.08864

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  5. arXiv:2306.10007

    cs.RO cs.CV cs.LG

    Robot Learning with Sensorimotor Pre-training

    Authors: Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik

    Abstract: We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. Given a sequence of camera images, proprioceptive robot states, and actions, we encode the sequence into tokens, mask out a subset, and train a model to predict the missing content from the rest. We hypothesize that if a robot can…

    Submitted 14 December, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: CoRL 2023; Project page: https://robotic-pretrained-transformer.github.io

  6. arXiv:2303.03381

    cs.RO cs.LG

    Real-World Humanoid Locomotion with Reinforcement Learning

    Authors: Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath

    Abstract: Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based appr…

    Submitted 14 December, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Project page: https://learning-humanoid-locomotion.github.io

  7. arXiv:2211.13225

    cs.CV cs.LG cs.RO

    Learning to Imitate Object Interactions from Internet Videos

    Authors: Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

    Abstract: We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Project page: https://austinapatel.github.io/imitate-video

  8. arXiv:2210.03109

    cs.RO cs.CV cs.LG

    Real-World Robot Learning with Masked Visual Pre-training

    Authors: Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

    Abstract: In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic ta…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: CoRL 2022; Project page: https://tetexiao.com/projects/real-mvp

  9. arXiv:2209.12892

    cs.LG cs.CV

    Learning to Learn with Generative Models of Neural Network Checkpoints

    Authors: William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

    Abstract: We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Code available at https://www.github.com/wpeebles/G.pt . Project page and videos available at https://www.wpeebles.com/Gpt

  10. arXiv:2203.06173

    cs.CV cs.LG cs.RO

    Masked Visual Pre-training for Motor Control

    Authors: Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

    Abstract: This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels. We first train the visual representations by masked modeling of natural images. We then freeze the visual encoder and train neural network controllers on top with reinforcement learning. We do not perform any task-specific fine-tuning of the encoder; the same…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Code and videos at: https://tetexiao.com/projects/mvp

  11. arXiv:2110.07058

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  12. arXiv:2012.09856

    cs.CV

    Reconstructing Hand-Object Interactions in the Wild

    Authors: Zhe Cao, Ilija Radosavovic, Angjoo Kanazawa, Jitendra Malik

    Abstract: In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D…

    Submitted 30 December, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Project page: https://people.eecs.berkeley.edu/~zhecao/rhoi/

  13. arXiv:2004.04650

    cs.RO cs.LG stat.ML

    State-Only Imitation Learning for Dexterous Manipulation

    Authors: Ilija Radosavovic, Xiaolong Wang, Lerrel Pinto, Jitendra Malik

    Abstract: Modern model-free reinforcement learning methods have recently demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge due to the high sample complexity. To address this, current approaches employ expert demonstrations in the form of state-action pairs, which are difficult to obtain for real-world settings such as learning fr…

    Submitted 29 December, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: IROS 2021

  14. arXiv:2003.13678

    cs.CV cs.LG

    Designing Network Design Spaces

    Authors: Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár

    Abstract: In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elev…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  15. arXiv:1905.13214

    cs.CV cs.LG

    On Network Design Spaces for Visual Recognition

    Authors: Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

    Abstract: Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying sta…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: tech report

  16. arXiv:1904.08918

    cs.CV

    Attentive Single-Tasking of Multiple Tasks

    Authors: Kevis-Kokitsi Maninis, Ilija Radosavovic, Iasonas Kokkinos

    Abstract: In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as "single-tasking multiple tasks". The network thus modifies its behaviour through task-dependent feature adaptation, or task attention. This gives the network the ability to accentuate the features that are adapted to…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 Camera Ready

  17. arXiv:1712.04440

    cs.CV

    Data Distillation: Towards Omni-Supervised Learning

    Authors: Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He

    Abstract: We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lower-bounded by performance on existing labeled datasets, offering the potential to surpass state-of-the-art fully supervised methods. To exploit the omni-supervised setting, we pro…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: tech report