
BAKU: An Efficient Transformer for Multi-Task Policy Learning

Authors: Siddhant Haldar, Zhuoran Peng, Lerrel Pinto

Abstract: Training generalist agents capable of solving diverse tasks is challenging, often requiring large datasets of expert demonstrations. This is particularly problematic in robotics, where each data point requires physical execution of actions in the real world. Thus, there is a pressing need for architectures that can effectively leverage the available training data. In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. BAKU builds upon recent advancements in offline imitation learning and meticulously combines observation trunks, action chunking, multi-sensory observations, and action heads to substantially improve upon prior work. Our experiments on 129 simulated tasks across LIBERO, the Meta-World suite, and the DeepMind Control suite show an overall 18% absolute improvement over RT-1 and MT-ACT, with a 36% improvement on the harder LIBERO benchmark. On 30 real-world manipulation tasks, given an average of just 17 demonstrations per task, BAKU achieves a 91% success rate. Videos of the robot are best viewed at https://baku-robot.github.io/.

Submitted 11 June, 2024; originally announced June 2024.
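Action chunking, one of the ingredients listed above, means the policy predicts a chunk of several future actions at every step, so each time step receives overlapping predictions from several past chunks. The following is a minimal sketch of one common way to combine them, assuming an ACT-style exponentially weighted temporal ensemble over scalar (1-D) actions; the function name `ensemble_action` and the weighting constant `m` are hypothetical and not taken from the BAKU code.

```python
import math
from typing import List, Sequence


def ensemble_action(chunks: Sequence[Sequence[float]], t: int, m: float = 0.1) -> float:
    """Combine every chunk prediction that covers time step t.

    chunks[s] holds the future actions predicted at step s, so chunks[s][j]
    is that step's prediction for step s + j. Predictions are weighted by
    exp(-m * i), where i = 0 is the oldest contributing chunk (an ACT-style
    convention, assumed here), and then normalised into a weighted average.
    """
    preds: List[float] = []
    for s in range(min(t + 1, len(chunks))):
        offset = t - s
        if offset < len(chunks[s]):
            preds.append(chunks[s][offset])
    if not preds:
        raise ValueError(f"no chunk covers step {t}")
    weights = [math.exp(-m * i) for i in range(len(preds))]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)
```

With `m = 0` this reduces to a plain average of all overlapping predictions; a positive `m` trusts older chunks more, which smooths the executed trajectory at the cost of reacting more slowly to new observations.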

arXiv:2406.04318

Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction

Authors: Chen-Yu Yen, Raghav Singhal, Umang Sharma, Rajesh Ranganath, Sumit Chopra, Lerrel Pinto

Abstract: Magnetic Resonance (MR) imaging, despite its proven diagnostic utility, remains an inaccessible imaging modality for disease surveillance at the population level. A major factor rendering MR inaccessible is lengthy scan times. An MR scanner collects measurements associated with the underlying anatomy in the Fourier space, also known as the k-space. Creating a high-fidelity image requires collecting large quantities of such measurements, increasing the scan time. Traditionally, to accelerate an MR scan, image reconstruction from under-sampled k-space data is the method of choice. However, recent works show the feasibility of bypassing image reconstruction and learning to detect disease directly from a sparser, learned subset of the k-space measurements. In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. On 6 out of 8 pathology classification tasks spanning Knee, Brain, and Prostate MR scans, ASMR reaches within 2% of the performance of a fully sampled classifier while using only 8% of the k-space, and it outperforms prior state-of-the-art work in k-space sampling such as EMRT, LOUPE, and DPS.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024. Project website at https://adaptive-sampling-mr.github.io

arXiv:2403.18836

Triangulated structure for Bondarenko's Categories

Authors: Germán Benitez, Gustavo Costa, Lucas Q. Pinto

Abstract: V. Bondarenko and Y. Drozd give a description of all indecomposable objects in a category of representations of posets, nowadays known as Bondarenko's category. This category was essential for V. Bekkert and H. Merklen to classify all indecomposable objects of the derived category of gentle algebras. In view of this connection with the derived category, which possesses a triangulated structure, and of the fact, shown in this paper, that Bondarenko's category is not an abelian category, it is reasonable to contemplate the existence of a triangulated structure for Bondarenko's category. In this paper we introduce a triangulated category structure on a quotient of Bondarenko's category, which allows the techniques of triangulated categories to be used to study representations of posets.

Submitted 14 February, 2024; originally announced March 2024.

Comments: 12 pages

MSC Class: 16G20; 18G80

arXiv:2403.08439

Characterisation of Anti-Arrhythmic Drug Effects on Cardiac Electrophysiology using Physics-Informed Neural Networks

Authors: Ching-En Chiu, Arieh Levy Pinto, Rasheda A Chowdhury, Kim Christensen, Marta Varela

Abstract: The ability to accurately infer cardiac electrophysiological (EP) properties is key to improving arrhythmia diagnosis and treatment. In this work, we developed a physics-informed neural networks (PINNs) framework to predict how different myocardial EP parameters are modulated by anti-arrhythmic drugs. Using $\textit{in vitro}$ optical mapping images and the 3-channel Fenton-Karma model, we estimated the changes in ionic channel conductance caused by these drugs. Our framework successfully characterised the action of drugs HMR1556, nifedipine and lidocaine (respectively, blockade of $I_{K}$, $I_{Ca}$, and $I_{Na}$ currents) by estimating that they decreased the respective channel conductance by $31.8\pm2.7\%$ $(p=8.2 \times 10^{-5})$, $80.9\pm21.6\%$ $(p=0.02)$, and $8.6\pm0.5\%$ $(p=0.03)$, leaving the conductance of other channels unchanged. For carbenoxolone, whose main action is the blockade of intercellular gap junctions, PINNs also successfully predicted no significant changes $(p>0.09)$ in all ionic conductances. Our results are an important step towards the deployment of PINNs for model parameter estimation from experimental data, bringing this framework closer to clinical or laboratory image analysis and to the personalisation of mathematical models.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted for publication in the 21st IEEE International Symposium on Biomedical Imaging 2024

arXiv:2403.07870

OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation

Authors: Aadhithya Iyer, Zhuoran Peng, Yinlong Dai, Irmak Guzey, Siddhant Haldar, Soumith Chintala, Lerrel Pinto

Abstract: Open-sourced, user-friendly tools form the bedrock of scientific advancement across disciplines. The widespread adoption of data-driven learning has led to remarkable progress in multi-fingered dexterity, bimanual manipulation, and applications ranging from logistics to home robotics. However, existing data collection platforms are often proprietary, costly, or tailored to specific robotic morphologies. We present OPEN TEACH, a new teleoperation system leveraging VR headsets to immerse users in mixed reality for intuitive robot control. Built on the affordable Meta Quest 3, which costs $500, OPEN TEACH enables real-time control of various robots, including multi-fingered hands and bimanual arms, through an easy-to-use app. Using natural hand gestures and movements, users can manipulate robots at up to 90Hz with smooth visual feedback and interface widgets offering closeup environment views. We demonstrate the versatility of OPEN TEACH across 38 tasks on different robots. A comprehensive user study indicates significant improvement in teleoperation capability over the AnyTeleop framework. Further experiments exhibit that the collected data is compatible with policy learning on 10 dexterous and contact-rich manipulation tasks. Currently supporting Franka, xArm, Jaco, and Allegro platforms, OPEN TEACH is fully open-sourced to promote broader adoption. Videos are available at https://open-teach.github.io/.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.03181

Behavior Generation with Latent Actions

Authors: Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

Abstract: Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions: continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models called Behavior Transformers (BeT) addresses this by discretizing actions using k-means clustering to capture different modes. However, k-means struggles to scale to high-dimensional action spaces or long sequences and lacks gradient information, and thus BeT suffers in modeling long-range actions. In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. VQ-BeT augments BeT by tokenizing continuous actions with a hierarchical vector quantization module. Across seven environments including simulated manipulation, autonomous driving, and robotics, VQ-BeT improves on state-of-the-art models such as BeT and Diffusion Policies. Importantly, we demonstrate VQ-BeT's improved ability to capture behavior modes while accelerating inference speed 5x over Diffusion Policies. Videos and code can be found at https://sjlee.cc/vq-bet

Submitted 28 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Github repo: https://github.com/jayLEE0301/vq_bet_official
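To make "tokenizing continuous actions with a hierarchical vector quantization module" concrete, the following is a minimal sketch of residual (coarse-to-fine) vector quantization, in which each layer quantizes whatever the previous layers left unexplained. It assumes fixed, hand-picked codebooks; in VQ-BeT the codebooks are learned, and the names `residual_quantize` and `decode` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def _nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook vector closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)))


def residual_quantize(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Map a continuous action to one discrete code per layer (coarse first)."""
    residual = list(x)
    codes: List[int] = []
    for book in codebooks:
        k = _nearest(book, residual)
        codes.append(k)
        # The next, finer layer only has to encode what this layer missed.
        residual = [r - c for r, c in zip(residual, book[k])]
    return codes


def decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct an action as the sum of the selected code vectors."""
    out = [0.0] * len(codebooks[0][0])
    for k, book in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, book[k])]
    return out
```

For example, with a coarse codebook `[[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]` and a fine codebook `[[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]]`, the action `[1.1, 1.0]` maps to codes `[1, 1]` and decodes back to approximately `[1.1, 1.0]`. A behavior transformer can then predict such discrete codes in place of raw continuous vectors.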

arXiv:2402.10211

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Authors: Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto

Abstract: Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

Submitted 15 February, 2024; originally announced February 2024.
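The idea of stacking state-space models into a temporal hierarchy can be illustrated with a toy sketch. It assumes a scalar linear recurrence h_t = a * h_{t-1} + b * u_t as a stand-in for the learned structured SSM layers HiSS would use. A low-level SSM runs over fixed-size chunks of the raw sensor stream, and a high-level SSM runs over the much shorter sequence of per-chunk summaries. The names `linear_ssm` and `hierarchical_ssm`, and the values of `chunk_len`, `a`, and `b`, are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def linear_ssm(inputs: Sequence[float], a: float, b: float) -> List[float]:
    """Scalar linear state-space recurrence h_t = a * h_{t-1} + b * u_t, starting at h = 0."""
    h = 0.0
    outputs: List[float] = []
    for u in inputs:
        h = a * h + b * u
        outputs.append(h)
    return outputs


def hierarchical_ssm(signal: Sequence[float], chunk_len: int,
                     low: Tuple[float, float] = (0.9, 0.1),
                     high: Tuple[float, float] = (0.5, 1.0)) -> List[float]:
    """Two-level hierarchy: a low-level SSM per chunk, then a high-level SSM over chunks."""
    # Low level: run the SSM independently on each full chunk and keep its final
    # state as that chunk's summary (a trailing partial chunk is dropped).
    summaries = [
        linear_ssm(signal[i:i + chunk_len], *low)[-1]
        for i in range(0, len(signal) - chunk_len + 1, chunk_len)
    ]
    # High level: a second SSM over the chunk summaries, one output per chunk.
    return linear_ssm(summaries, *high)
```

The high-level model sees a sequence that is `chunk_len` times shorter than the raw stream, which is one way a hierarchy can capture slow trends while the low level handles fast, local sensor dynamics.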

arXiv:2401.12202

OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

Authors: Peiqi Liu, Yaswanth Orru, Jay Vakil, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

Abstract: Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose applications of robotics still lag behind, even though they rely on these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers an integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, representing a new state-of-the-art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. In cleaner, uncluttered environments, OK-Robot's performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments and code are available on our website: https://ok-robot.github.io

Submitted 29 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: Github repo: https://github.com/ok-robot/ok-robot

arXiv:2401.09252

doi 10.1145/3519021

3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

Authors: Thiago Lopes Trugillo da Silveira, Paulo Gamarra Lessa Pinto, Jeffri Erwin Murrugarra Llerena, Claudio Rosito Jung

Abstract: This paper provides a comprehensive survey of pioneering and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured with omnidirectional optics. We first revisit the basic concepts of the spherical camera model and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360$^\circ$, spherical, or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting recent advances in learning-based solutions suited for spherical data. Classical stereo matching is then revisited in the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extended to multiple-view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Published in ACM Computing Surveys

Journal ref: ACM Comput. Surv. 55, 4, Article 68, 2023

arXiv:2312.17261

Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing

Authors: Juliano Pinto, Georg Hess, Yuxuan Xia, Henk Wymeersch, Lennart Svensson

Abstract: Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning (DL) based algorithms are a possible avenue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state of the art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.07540

diff History for Neural Language Agents

Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

Abstract: Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which, coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as minimal needs for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes. We open-source our code and data at https://diffhistory.github.io.

Submitted 11 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: ICML 2024 version
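The core operation behind diff history is a Unix-style diff between consecutive text observations. The following is a minimal sketch of that step, using Python's standard `difflib.unified_diff` as a stand-in for the Unix `diff` command. The function name `diff_history` and the history layout (the first observation kept verbatim, every later one replaced by a diff) are assumptions for illustration, not the authors' exact prompt format.

```python
import difflib
from typing import List, Sequence


def diff_history(observations: Sequence[str]) -> List[str]:
    """Keep the first observation verbatim and replace each later one with a
    unified diff against its predecessor, so unchanged lines are not repeated."""
    if not observations:
        return []
    history = [observations[0]]
    for prev, curr in zip(observations, observations[1:]):
        lines = list(difflib.unified_diff(
            prev.splitlines(), curr.splitlines(), lineterm="", n=0))
        # When the texts differ, the first two lines are the '---' / '+++' file
        # headers; drop them and keep the hunk header plus the +/- lines.
        history.append("\n".join(lines[2:]))
    return history
```

For two observations `"HP: 10\nGold: 0\nYou see a door."` and `"HP: 10\nGold: 5\nYou see a door."`, the second history entry is just `"@@ -2 +2 @@\n-Gold: 0\n+Gold: 5"`, and an unchanged observation becomes an empty string. Setting `n=0` removes context lines, which keeps the prompt focused on what changed.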

arXiv:2311.16098

On Bringing Robots Home

Authors: Nur Muhammad Mahi Shafiullah, Anant Rai, Haritheja Etukuru, Yiqian Liu, Ishan Misra, Soumith Chintala, Lerrel Pinto

Abstract: Throughout history, we have successfully integrated various machines into our homes. Dishwashers, laundry machines, stand mixers, and robot vacuums are a few recent examples. However, these machines excel at performing only a single task effectively. The concept of a "generalist machine" in homes, a domestic assistant that can adapt and learn from our needs while remaining cost-effective, has long been a goal in robotics that has been steadily pursued for decades. In this work, we initiate a large-scale effort towards this goal by introducing Dobb-E, an affordable yet versatile general-purpose system for learning robotic manipulation within household settings. Dobb-E can learn a new task with only five minutes of a user showing it how to do it, thanks to a demonstration collection tool ("The Stick") we built out of cheap parts and iPhones. We use the Stick to collect 13 hours of data in 22 homes of New York City, and train Home Pretrained Representations (HPR). Then, in a novel home environment, with five minutes of demonstrations and fifteen minutes of adapting the HPR model, we show that Dobb-E can reliably solve the task on the Stretch, a mobile robot readily available on the market. Across roughly 30 days of experimentation in homes of New York City and surrounding areas, we test our system in 10 homes, with a total of 109 tasks in different environments, and finally achieve a success rate of 81%. Beyond success percentages, our experiments reveal a plethora of unique challenges absent or ignored in lab robotics. These range from effects of strong shadows to variable demonstration quality by non-expert users. With the hope of accelerating research on home robots, and eventually seeing robot butlers in every home, we open-source the Dobb-E software stack and models, our data, and our hardware designs at https://dobb-e.com

Submitted 27 November, 2023; originally announced November 2023.

Comments: Project website and videos are available at https://dobb-e.com, technical documentation for getting started is available at https://docs.dobb-e.com, and code is released at https://github.com/notmahi/dobb-e

arXiv:2310.08864

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.08573

PolyTask: Learning Unified Policies through Behavior Distillation

Authors: Siddhant Haldar, Lerrel Pinto

Abstract: Unified models capable of solving a wide variety of tasks have gained traction in vision and NLP due to their ability to share regularities and structures across tasks, which improves individual task performance and reduces computational footprint. However, the impact of such models remains limited in embodied learning problems, which present unique challenges due to interactivity, sample inefficiency, and sequential task presentation. In this work, we present PolyTask, a novel method for learning a single unified model that can solve various embodied tasks through a 'learn then distill' mechanism. In the 'learn' step, PolyTask leverages a few demonstrations for each task to train task-specific policies. Then, in the 'distill' step, task-specific policies are distilled into a single policy using a new distillation method called Behavior Distillation. Given a unified policy, individual task behavior can be extracted through conditioning variables. PolyTask is designed to be conceptually simple: it leverages well-established algorithms in RL to enable interactivity and a handful of expert demonstrations to allow for sample efficiency, and it requires no interactive access to tasks during distillation, which enables lifelong learning. Experiments across three simulated environment suites and a real-robot suite show that PolyTask outperforms prior state-of-the-art approaches in multi-task and lifelong learning settings by significant margins.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.08174

Mapping Water on the Moon and Mars using a Muon Tomograph

Authors: Olin Lyod Pinto, Jörg Miikael Tiit

Abstract: The search for water on the Lunar and Martian surfaces is a fundamental aspect of space exploration, contributing to the understanding of the history and evolution of these celestial bodies. However, the current understanding of the distribution, concentration, origin, and migration of water on these surfaces is limited. Moreover, there is a need for more detailed data on these aspects of Lunar and Martian water. The natural flux of cosmic-ray muons, capable of penetrating the planetary surface, offers a method to study the water-ice content, composition, and density of these surfaces. In this paper, the author presents a novel approach to address these knowledge gaps by employing cosmic-ray muon detectors and backscattered radiation. The study describes a cutting-edge muon tracking system developed by GScan and highlights the results of preliminary simulations conducted using GEANT4. These findings suggest that muon tomography could be a potential tool for investigating water-ice content on the Lunar and Martian surfaces, pointing to new avenues for space science exploration.

Submitted 12 October, 2023; originally announced October 2023.

Comments: The paper has been submitted to the JOURNAL OF ADVANCED INSTRUMENTATION IN SCIENCE

arXiv:2309.12300

See to Touch: Learning Tactile Dexterity through Visual Incentives

Authors: Irmak Guzey, Yinlong Dai, Ben Evans, Soumith Chintala, Lerrel Pinto

Abstract: Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function using these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. This performance is 108% higher than that of policies using tactile and vision-based rewards and 135% higher than that of policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.

Submitted 21 September, 2023; originally announced September 2023.
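
Illustrative sketch (not from the paper): the optimal-transport matching step in TAVI scores a robot trajectory by how cheaply its visual embeddings can be transported onto the demonstration's. The version below assumes per-frame embeddings are already computed, uses a cosine cost, uniform marginals, and entropic (Sinkhorn) regularization; `eps`, the iteration count, and all function names are hypothetical choices, not the authors' code.

```python
import numpy as np


def cosine_cost(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between rows of x (T x d) and y (T' x d)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn_plan(cost: np.ndarray, eps: float = 0.05, iters: int = 200) -> np.ndarray:
    """Entropic OT plan between two uniform distributions via Sinkhorn iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on robot frames (assumption)
    b = np.full(m, 1.0 / m)  # uniform mass on demo frames (assumption)
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_rewards(robot_emb: np.ndarray, demo_emb: np.ndarray, eps: float = 0.05) -> np.ndarray:
    """Per-timestep reward: negative transport cost assigned to each robot frame."""
    cost = cosine_cost(robot_emb, demo_emb)
    plan = sinkhorn_plan(cost, eps)
    return -(plan * cost).sum(axis=1)
```

A trajectory whose embeddings closely track the demonstration receives rewards near zero, while unrelated behavior is penalized; an online RL learner would then maximize the summed reward.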

arXiv:2307.12505 [pdf]

Optimizing parameter search for community detection in time evolving networks of complex systems

Authors: Italo Ivo Lima Dias Pinto, Javier Omar Garcia, Kanika Bansal

Abstract: Network representations have been effectively employed to analyze complex systems across various areas and applications, leading to the development of network science as a core tool to study systems with multiple components and complex interactions. There is a growing interest in understanding the temporal dynamics of complex networks to decode the underlying dynamic processes through the temporal changes in network structure. Community detection algorithms, which are specialized clustering algorithms, have been instrumental in studying these temporal changes. They work by grouping nodes into communities based on the structure and intensity of network connections over time, aiming to maximize the modularity of the network partition. However, the performance of these algorithms is highly influenced by the selection of resolution parameters of the modularity function used, which dictate the scale of the represented network, both in size of communities and the temporal resolution of dynamic structure. The selection of these parameters has often been subjective and heavily reliant on the characteristics of the data used to create the network structure. Here, we introduce a method to objectively determine the values of the resolution parameters based on the elements of self-organization. We propose two key approaches: (1) minimization of the biases in spatial scale network characterization and (2) maximization of temporal scale-freeness. We demonstrate the effectiveness of these approaches using benchmark network structures as well as real-world datasets. To implement our method, we also provide an automated parameter selection software package that can be applied to a wide range of complex systems.

Submitted 23 July, 2023; originally announced July 2023.

Comments: 28 pages, 7 figures
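
Illustrative sketch (not from the paper): the role of a resolution parameter is easiest to see in the standard single-layer modularity, Q = (1/2m) Σ_ij [A_ij − γ k_i k_j / 2m] δ(c_i, c_j). The function below computes it for a given partition; the paper's temporal (multilayer) modularity also involves an inter-layer coupling parameter, which this sketch omits.

```python
import numpy as np


def modularity(adj, labels, gamma: float = 1.0) -> float:
    """Newman-Girvan modularity of a partition with resolution parameter gamma.

    Q = (1 / 2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j)
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)          # k_i
    two_m = degrees.sum()              # 2m for an undirected graph
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    null_model = gamma * np.outer(degrees, degrees) / two_m
    return float(((adj - null_model) * same).sum() / two_m)
```

For two disconnected triangles labelled as two communities, Q = 1 − γ/2, whereas putting every node in its own community gives Q = −γ/6; larger γ therefore shifts the optimum toward finer partitions, which is why the choice of γ matters.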

arXiv:2306.12554 [pdf, other]

Improving Long-Horizon Imitation Through Instruction Prediction

Authors: Joey Hejna, Pieter Abbeel, Lerrel Pinto

Abstract: Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.

Submitted 21 June, 2023; originally announced June 2023.

Comments: Published at AAAI 2023
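
Illustrative formulation (not from the paper): an instruction prediction loss used as auxiliary supervision can be written as a behavior-cloning term plus a weighted language-modeling term over the tokens w_1, ..., w_K of the instruction describing a trajectory segment o_{1:T}. The weight λ and the exact conditioning are assumptions.

```latex
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log \pi_\theta\!\left(a_t \mid o_{\le t}\right)
\;-\; \lambda \sum_{k=1}^{K} \log p_\theta\!\left(w_k \mid w_{<k},\, o_{1:T}\right)
```

Setting λ = 0 recovers plain behavior cloning; a positive λ forces the shared representation to also encode the higher-level plan that the instruction describes.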

arXiv:2306.00942 [pdf, other]

Train Offline, Test Online: A Real Robot Learning Benchmark

Authors: Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta

Abstract: Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data. We take on these challenges via a new benchmark: Train Offline, Test Online (TOTO). TOTO provides remote users with access to shared robotic hardware for evaluating methods on common tasks and an open-source dataset of these tasks for offline training. Its manipulation task suite requires challenging generalization to unseen objects, positions, and lighting. We present initial results on TOTO comparing five pretrained visual representations and four offline policy learning baselines, remotely contributed by five institutions. The real promise of TOTO, however, lies in the future: we release the benchmark for additional submissions from any user, enabling easy, direct comparison to several methods without the need to obtain hardware or collect data.

Submitted 30 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to ICRA 2023

arXiv:2305.19240 [pdf, other]

NetHack is Hard to Hack

Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

Abstract: Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion. However, these methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.

Submitted 30 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2303.12076 [pdf, other]

Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Authors: Irmak Guzey, Ben Evans, Soumith Chintala, Lerrel Pinto

Abstract: Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that either operate on visual observations or state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity, that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This is necessary to bring high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis on factors critical to T-Dex including the importance of play data, architectures, and representation learning.

Submitted 21 March, 2023; originally announced March 2023.

Comments: Video and code can be accessed here: https://tactile-dexterity.github.io/
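
Illustrative sketch (not from the paper): a non-parametric policy that combines tactile and visual observations can be as simple as a nearest-neighbor lookup over demonstration frames. Below, the distance is a weighted sum of per-modality Euclidean distances and the k nearest actions are averaged with softmax weights; the encoders are assumed to have already produced the embeddings, and `tactile_weight` and `k` are hypothetical knobs.

```python
import numpy as np


def knn_action(query_tac, query_vis, demo_tac, demo_vis, demo_actions,
               k: int = 3, tactile_weight: float = 0.5) -> np.ndarray:
    """Predict an action by softmax-weighted averaging of the k nearest demo frames."""
    d_tac = np.linalg.norm(demo_tac - query_tac, axis=1)  # tactile-embedding distance
    d_vis = np.linalg.norm(demo_vis - query_vis, axis=1)  # visual-embedding distance
    dist = tactile_weight * d_tac + (1.0 - tactile_weight) * d_vis
    idx = np.argsort(dist)[:k]
    # Softmax over negative distances, shifted by the minimum for numerical stability.
    weights = np.exp(-(dist[idx] - dist[idx].min()))
    weights /= weights.sum()
    return weights @ demo_actions[idx]
```

No policy parameters are trained here: adding demonstrations only grows the lookup table, which suits the "handful of demonstrations" regime the abstract describes.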

arXiv:2303.05647 [pdf, other]

doi 10.1088/1402-4896/ad4d27

Electronic states in quantum wires on a Möbius strip

Authors: J. J. L. R. Pinto, J. E. G. Silva, C. A. S. Almeida

Abstract: We study the properties of a two-dimensional non-relativistic electron gas (TDEG) constrained on wires along a Möbius strip. We considered wires around the strip and along the transverse direction, across the width of the strip. For each direction, we investigate how the curvature modifies the electronic states and their corresponding energy spectrum. At the center of the strip, the wires around the surface form quantum rings whose spectrum depends on the strip radius $a$. For wires at the edge of the strip, the inner edge turns into the outer edge. Accordingly, the curvature yields localized states in the middle of the wire. Along the strip width, the effective potential exhibits a parity symmetry breaking leading to the localization of the bound state on one side of the strip.

Submitted 6 June, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: 16 pages, 11 captioned figures. Updated version to match that one published in Physica Scripta

Journal ref: Phys. Scr. 99 (2024) 0659c2
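
Illustrative geometry (assuming the standard embedding of a Möbius strip with radius $a$ and a hypothetical half-width $w$; the paper's parametrization may differ): wires "around the strip" are curves of constant $u$, and transverse wires are curves of constant $\theta$.

```latex
\mathbf{r}(u,\theta) = \Big( \big(a + u\cos\tfrac{\theta}{2}\big)\cos\theta,\;
                             \big(a + u\cos\tfrac{\theta}{2}\big)\sin\theta,\;
                             u\sin\tfrac{\theta}{2} \Big),
\qquad 0 \le \theta < 2\pi,\quad -w \le u \le w,
\qquad \mathbf{r}(u,\theta + 2\pi) = \mathbf{r}(-u,\theta)
```

At $u = 0$ the wire is a circle of radius $a$, matching the quantum rings described above, and the identity $\mathbf{r}(u,\theta+2\pi) = \mathbf{r}(-u,\theta)$ shows why an edge wire starting on the outer edge ($u = w$, radial distance $a + w$) returns on the inner edge after one loop.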

arXiv:2303.01497 [pdf, other]

Teach a Robot to FISH: Versatile Imitation from One Minute of Demonstrations

Authors: Siddhant Haldar, Jyothish Pari, Anant Rai, Lerrel Pinto

Abstract: While imitation learning provides us with an efficient toolkit to train robots, learning skills that are robust to environment variations remains a significant challenge. Current approaches address this challenge by relying either on large amounts of demonstrations that span environment variations or on handcrafted reward functions that require state estimates. Both directions are not scalable to fast imitation. In this work, we present Fast Imitation of Skills from Humans (FISH), a new imitation learning approach that can learn robust visual skills with less than a minute of human demonstrations. Given a weak base-policy trained by offline imitation of demonstrations, FISH computes rewards that correspond to the "match" between the robot's behavior and the demonstrations. These rewards are then used to adaptively update a residual policy that adds on to the base-policy. Across all tasks, FISH requires at most twenty minutes of interactive learning to imitate demonstrations on object configurations that were not seen in the demonstrations. Importantly, FISH is constructed to be versatile, which allows it to be used across robot morphologies (e.g. xArm, Allegro, Stretch) and camera configurations (e.g. third-person, eye-in-hand). Our experimental evaluations on 9 different tasks show that FISH achieves an average success rate of 93%, which is around 3.8x higher than prior state-of-the-art methods.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Code and robot videos are available at https://fast-imitation.github.io/
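
Illustrative sketch (not from the paper): a residual policy "adds on to" a base policy by outputting a correction that is summed with the base action. In this sketch the base policy is treated as fixed, the residual sees both the observation and the base action, and the result is clipped to assumed action limits of [-1, 1]; all of these are hypothetical design choices.

```python
from typing import Callable

import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]


def residual_action(obs: np.ndarray, base_policy: Policy, residual_policy: Policy,
                    low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Final action = base-policy action + learned residual correction, clipped to limits."""
    base = base_policy(obs)
    # The residual is conditioned on the observation and the base action (assumption).
    correction = residual_policy(np.concatenate([obs, base]))
    return np.clip(base + correction, low, high)
```

Only `residual_policy` would be updated online from the trajectory-matching rewards, so a zero residual falls back exactly to the offline-imitation behavior.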

arXiv:2210.10047 [pdf, other]

From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Authors: Zichen Jeff Cui, Yibin Wang, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

Abstract: While large-scale sequence modeling from offline data has led to impressive performance gains in natural language and image generation, directly translating such ideas to robotics has been challenging. One critical reason for this is that uncurated robot demonstration data, i.e. play data, collected from non-expert human demonstrators are often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data without any task labels or reward information. Robot videos are best viewed on our project website: https://play-to-policy.github.io

Submitted 15 December, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Code and data available at: https://play-to-policy.github.io; (fixed metadata author name format)

arXiv:2210.06463 [pdf, other]

Holo-Dex: Teaching Dexterity with Immersive Mixed Reality

Authors: Sridhar Pandian Arunachalam, Irmak Güzey, Soumith Chintala, Lerrel Pinto

Abstract: A fundamental challenge in teaching robots is to provide an effective interface for human teachers to demonstrate useful skills to a robot. This challenge is exacerbated in dexterous manipulation, where teaching high-dimensional, contact-rich behaviors often requires esoteric teleoperation tools. In this work, we present Holo-Dex, a framework for dexterous manipulation that places a teacher in an immersive mixed reality through commodity VR headsets. The high-fidelity hand pose estimator onboard the headset is used to teleoperate the robot and collect demonstrations for a variety of general-purpose dexterous tasks. Given these demonstrations, we use powerful feature learning combined with non-parametric imitation to train dexterous skills. Our experiments on six common dexterous tasks, including in-hand rotation, spinning, and bottle opening, indicate that Holo-Dex can both collect high-quality demonstration data and train skills in a matter of hours. Finally, we find that our trained skills can exhibit generalization on objects not seen in training. Videos of Holo-Dex are available at https://holo-dex.github.io.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Data, code and videos are available at https://holo-dex.github.io

arXiv:2210.05663 [pdf, other]

CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Authors: Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

Abstract: We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such as CLIP, Detic, and Sentence-BERT; and thus uses no direct human supervision. When compared to baselines like Mask-RCNN, our method outperforms on few-shot instance identification or semantic segmentation on the HM3D dataset with only a fraction of the examples. Finally, we show that using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available here: https://mahis.life/clip-fields

Submitted 22 May, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: Code, video, and interactive demonstrations available at https://mahis.life/clip-fields. Accepted for publication at Robotics: Science and Systems 2023 in Daegu, Korea
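
Illustrative sketch (not from the paper): once a field maps spatial locations to semantic embeddings, "semantic search over space" reduces to ranking locations by cosine similarity with a query embedding. Here the field's outputs at sampled points and the query embedding (e.g. from a text encoder) are assumed to be precomputed arrays in the same embedding space; the function name is hypothetical.

```python
import numpy as np


def semantic_search(points: np.ndarray, point_embeddings: np.ndarray,
                    query_embedding: np.ndarray, top_k: int = 1):
    """Return the top_k 3D locations whose embeddings best match the query."""
    emb = point_embeddings / np.linalg.norm(point_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = emb @ q                      # cosine similarity per location
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return points[order], scores[order]
```

A navigation stack could then use the returned location as a goal, which is the kind of scene-memory use the abstract describes.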

arXiv:2210.01116 [pdf, other]

That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation

Authors: Abitha Thankaraj, Lerrel Pinto

Abstract: Learning to produce contact-rich, dynamic behaviors from raw sensory data has been a longstanding challenge in robotics. Prominent approaches primarily focus on using visual or tactile sensing, where unfortunately one fails to capture high-frequency interaction, while the other can be too delicate for large-scale data collection. In this work, we propose a data-centric approach to dynamic manipulation that uses an often ignored source of information: sound. We first collect a dataset of 25k interaction-sound pairs across five dynamic tasks using commodity contact microphones. Then, given this data, we leverage self-supervised learning to accelerate behavior prediction from sound. Our experiments indicate that this self-supervised 'pretraining' is crucial to achieving high performance, with a 34.5% lower MSE than plain supervised learning and a 54.3% lower MSE over visual training. Importantly, we find that when asked to generate desired sound profiles, online rollouts of our models on a UR10 robot can produce dynamic behavior that achieves an average of 11.5% improvement over supervised learning on audio similarity metrics.

Submitted 3 October, 2022; originally announced October 2022.

Comments: Videos and audio data are best seen on our project website: audio-robot-learning.github.io

arXiv:2208.02932 [pdf, other]

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Authors: Yilei Zeng, Jiali Duan, Yang Li, Emilio Ferrara, Lerrel Pinto, C. -C. Jay Kuo, Stefanos Nikolaidis

Abstract: Human-centered AI considers human experiences with AI performance. While abundant research has been helping AI achieve superhuman performance either by fully automatic or weak supervision learning, fewer endeavors are experimenting with how AI can tailor to humans' preferred skill level given fine-grained input. In this work, we guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online via manipulating the task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The result demonstrates the effectiveness of an interactive curriculum for reinforcement learning involving human-in-the-loop. It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level. We believe this research will open new doors for achieving flow and personalized adaptive difficulties.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 6 pages, 7 figures

ACM Class: I.2.6

arXiv:2206.15469 [pdf, other]

Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Authors: Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto

Abstract: Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.

Submitted 20 February, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: Code and robot videos are available on https://rot-robot.github.io/
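
Illustrative sketch (not from the paper): one plausible way to "adaptively combine" trajectory-matching RL with behavior cloning is a Q-filter-style weight, where the BC regularizer is scaled by the fraction of states in which a critic still prefers the BC action over the current policy's action. The rule, the coefficient `alpha`, and the function names are assumptions; the paper's exact weighting may differ.

```python
import numpy as np


def adaptive_bc_weight(q_bc: np.ndarray, q_policy: np.ndarray) -> float:
    """Fraction of states where the BC action is valued above the current policy's action."""
    return float(np.mean(q_bc > q_policy))


def actor_loss(q_policy: np.ndarray, policy_actions: np.ndarray, bc_actions: np.ndarray,
               q_bc: np.ndarray, alpha: float = 2.5) -> float:
    """RL term (maximize Q) plus a BC term whose weight shrinks as the policy overtakes BC."""
    lam = adaptive_bc_weight(q_bc, q_policy)
    bc_term = np.mean(np.sum((policy_actions - bc_actions) ** 2, axis=1))
    return float(-np.mean(q_policy) + alpha * lam * bc_term)
```

Early in training the critic favors the demonstrated behavior and the BC term dominates; as the policy improves on the OT-based reward, the weight decays toward zero without a hand-tuned schedule.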

arXiv:2206.11251 [pdf, other]

Behavior Transformers: Cloning $k$ modes with one stone

Authors: Nur Muhammad Mahi Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, Lerrel Pinto

Abstract: While behavior learning has made impressive progress in recent times, it lags behind computer vision and natural language processing due to its inability to leverage large, human-generated datasets. Human behaviors have wide variance, multiple modes, and human demonstrations typically do not come with reward labels. These properties limit the applicability of current methods in Offline RL and Behavioral Cloning to learn from large, pre-collected datasets. In this work, we present Behavior Transformer (BeT), a new technique to model unlabeled demonstration data with multiple modes. BeT retrofits standard transformer architectures with action discretization coupled with a multi-task action correction inspired by offset prediction in object detection. This allows us to leverage the multi-modal modeling ability of modern transformers to predict multi-modal continuous actions. We experimentally evaluate BeT on a variety of robotic manipulation and self-driving behavior datasets. We show that BeT significantly improves over prior state-of-the-art work on solving demonstrated tasks while capturing the major modes present in the pre-collected datasets. Finally, through an extensive ablation study, we analyze the importance of every crucial component in BeT. Videos of behavior generated by BeT are available at https://notmahi.github.io/bet

Submitted 11 October, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Code and data available at https://github.com/notmahi/bet
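
Illustrative sketch (not from the paper): action discretization with offset correction splits each continuous action into a discrete bin plus a residual offset from that bin's center, so a model can predict a categorical distribution over bins (capturing multiple modes) and still output precise continuous actions. The sketch assumes bins come from plain k-means over the dataset's actions and covers only the encode/decode step, not the transformer.

```python
import numpy as np


def fit_action_bins(actions: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means over continuous actions; cluster centers become discrete action bins."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = actions[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers


def encode(action: np.ndarray, centers: np.ndarray):
    """Split an action into (bin index, offset from that bin's center)."""
    idx = int(np.linalg.norm(centers - action, axis=1).argmin())
    return idx, action - centers[idx]


def decode(idx: int, offset: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Recover the continuous action from a predicted bin and offset."""
    return centers[idx] + offset
```

Decoding an encoded action reproduces it exactly; at inference time the bin would be sampled from the model's categorical output and the offset taken from the head associated with that bin.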

arXiv:2203.13251 [pdf, other]

Dexterous Imitation Made Easy: A Learning-Based Framework for Efficient Dexterous Manipulation

Authors: Sridhar Pandian Arunachalam, Sneha Silwal, Ben Evans, Lerrel Pinto

Abstract: Optimizing behaviors for dexterous manipulation has been a longstanding challenge in robotics, with a variety of methods from model-based control to model-free reinforcement learning having been previously explored in the literature. Perhaps one of the most powerful techniques to learn complex manipulation strategies is imitation learning. However, collecting and learning from demonstrations in dexterous manipulation is quite challenging. The complex, high-dimensional action-space involved with multi-finger control often leads to poor sample efficiency of learning-based methods. In this work, we propose 'Dexterous Imitation Made Easy' (DIME), a new imitation learning framework for dexterous manipulation. DIME only requires a single RGB camera to observe a human operator and teleoperate our robotic hand. Once demonstrations are collected, DIME employs standard imitation learning methods to train dexterous manipulation policies. On both simulation and real robot benchmarks we demonstrate that DIME can be used to solve complex, in-hand manipulation tasks such as 'flipping', 'spinning', and 'rotating' objects with the Allegro hand. Our framework along with pre-collected demonstrations is publicly available at https://nyu-robot-learning.github.io/dime.

Submitted 24 March, 2022; originally announced March 2022.

Comments: The first two authors contributed equally

arXiv:2203.11176 [pdf, other]

One After Another: Learning Incremental Skills for a Changing World

Authors: Nur Muhammad Shafiullah, Lerrel Pinto

Abstract: Reward-free, unsupervised discovery of skills is an attractive alternative to the bottleneck of hand-designing rewards in environments where task supervision is scarce or expensive. However, current skill pre-training methods, like many RL techniques, make a fundamental assumption: stationary environments during training. Traditional methods learn all their skills simultaneously, which makes it difficult for them to both quickly adapt to changes in the environment, and to not forget earlier skills after such adaptation. On the other hand, in an evolving or expanding environment, skill learning must be able to adapt fast to new environment situations while not forgetting previously learned skills. These two conditions make it difficult for classic skill discovery to do well in an evolving environment. In this work, we propose a new framework for skill discovery, where skills are learned one after another in an incremental fashion. This framework allows newly learned skills to adapt to new environment or agent dynamics, while the fixed old skills ensure the agent doesn't forget a learned skill. We demonstrate experimentally that in both evolving and static environments, incremental skills significantly outperform current state-of-the-art skill discovery methods on both skill quality and the ability to solve downstream tasks. Videos for learned skills and code are made public on https://notmahi.github.io/disk

Submitted 21 March, 2022; originally announced March 2022.

Comments: To be published in The International Conference on Learning Representations (ICLR) 2022

arXiv:2203.08098 [pdf, other]

RB2: Robotic Manipulation Benchmarking with a Twist

Authors: Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin Wang, Abitha Thankaraj, Karanbir Chahal, Berk Calli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, Abhinav Gupta

Abstract: Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; and (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. object sets) but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations will provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these local rankings could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus we establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks that are inspired from clinically validated Southampton Hand Assessment Procedures. Our benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning outperform more complicated models (e.g. closed loop, RNN, Offline-RL, etc.) that are preferred by the field. We hope our fellow researchers will use RB2 to improve their research's quality and rigor.

Submitted 30 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: accepted at the NeurIPS 2021 Datasets and Benchmarks Track

arXiv:2203.05549 [pdf, other]

Context is Everything: Implicit Identification for Dynamics Adaptation

Authors: Ben Evans, Abitha Thankaraj, Lerrel Pinto

Abstract: Understanding environment dynamics is necessary for robots to act safely and optimally in the world. In realistic scenarios, dynamics are non-stationary, and the causal variables, such as environment parameters, cannot necessarily be precisely measured or inferred, even during training. We propose Implicit Identification for Dynamics Adaptation (IIDA), a simple method to allow predictive models to adapt to changing environment dynamics. IIDA assumes no access to the true variations in the world and instead implicitly infers properties of the environment from a small amount of contextual data. We demonstrate IIDA's ability to perform well in unseen environments through a suite of simulated experiments on MuJoCo environments and a real robot dynamic sliding task. In general, IIDA significantly reduces model error and results in higher task performance than commonly used methods. Our code and robot videos are at https://bennevans.github.io/iida/

Submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted at ICRA 2022

arXiv:2202.07909 [pdf, other]

Can Deep Learning be Applied to Model-Based Multi-Object Tracking?

Authors: Juliano Pinto, Georg Hess, William Ljungbergh, Yuxuan Xia, Henk Wymeersch, Lennart Svensson

Abstract: Multi-object tracking (MOT) is the problem of tracking the state of an unknown and time-varying number of objects using noisy measurements, with important applications such as autonomous driving, tracking animal behavior, defense systems, and others. In recent years, deep learning (DL) has been increasingly used in MOT for improving tracking performance, but mostly in settings where the measurements are high-dimensional and there are no available models of the measurement likelihood and the object dynamics. The model-based setting, in contrast, has not attracted as much attention, and it is still unclear if DL methods can outperform traditional model-based Bayesian methods, which are the state of the art (SOTA) in this context. In this paper, we propose a Transformer-based DL tracker and evaluate its performance in the model-based setting, comparing it to SOTA model-based Bayesian methods in a variety of tasks. Our results show that the proposed DL method can match the performance of the model-based methods in simple tasks, while outperforming them when the task gets more complicated, either due to an increase in the data association complexity or due to stronger nonlinearities in the models of the environment.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2201.13425 [pdf, other]

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Authors: Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

Abstract: Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks, limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the community. Code and data can be found at https://github.com/denisyarats/exorl.

Submitted 5 April, 2022; v1 submitted 31 January, 2022; originally announced January 2022.
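The relabeling step described in the ExORL abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the ExORL code: the transition layout, the `relabel` helper, and the sparse `reach_origin_reward` task are all hypothetical. It only shows that one reward-free dataset can be given a downstream reward before offline training.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass
class LabeledTransition:
    state: State
    action: Action
    reward: float
    next_state: State


def relabel(
    dataset: List[Tuple[State, Action, State]],
    reward_fn: Callable[[State, Action, State], float],
) -> List[LabeledTransition]:
    """Attach a downstream reward to every reward-free (s, a, s') transition."""
    return [LabeledTransition(s, a, reward_fn(s, a, s2), s2) for s, a, s2 in dataset]


def reach_origin_reward(state: State, action: Action, next_state: State) -> float:
    """Hypothetical sparse task: reward 1 when the 1-D state lands near 0."""
    return 1.0 if abs(next_state[0]) < 0.1 else 0.0


# Transitions gathered by reward-free exploration (toy values).
exploratory = [((1.0,), (-0.5,), (0.5,)), ((0.5,), (-0.5,), (0.0,))]
labeled = relabel(exploratory, reach_origin_reward)
print([t.reward for t in labeled])  # [0.0, 1.0]
```

Swapping in a different `reward_fn` reuses the same exploratory transitions for another task; the subsequent off-policy training step is not shown.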

arXiv:2112.01511 [pdf, other]

The Surprising Effectiveness of Representation Learning for Visual Imitation

Authors: Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto

Abstract: While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task-specific priors, or large, hard-to-train parametric models. One reason such complexities arise is that standard visual imitation frameworks try to solve two coupled problems at once: learning a succinct but good representation from the diverse visual data, while simultaneously learning to associate the demonstrated actions with such representations. Such joint learning creates an interdependence between these two problems, which often results in needing large numbers of demonstrations. To address this challenge, we instead propose to decouple representation learning from behavior learning for visual imitation. First, we learn a visual representation encoder from offline data using standard supervised and self-supervised learning methods. Once the representations are trained, we use non-parametric Locally Weighted Regression to predict the actions. We experimentally show that this simple decoupling improves the performance of visual imitation models on both offline demonstration datasets and real-robot door opening compared to prior work in visual imitation. All of our generated data, code, and robot videos are publicly available at https://jyopari.github.io/VINN/.

Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: The first two authors contributed equally
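A minimal sketch of the non-parametric step in the VINN abstract, assuming embeddings have already been computed by a pretrained encoder: the predicted action is a weighted average of the actions at the k nearest demonstration frames, with softmax weights over negative Euclidean distance. The kernel, the value of `k`, and the name `locally_weighted_action` are illustrative assumptions; the released code at https://jyopari.github.io/VINN/ is the authoritative reference.

```python
import numpy as np


def locally_weighted_action(query_emb, demo_embs, demo_actions, k=3):
    """Predict an action as a distance-weighted average of the k nearest demo frames.

    query_emb:    (d,) embedding of the current observation.
    demo_embs:    (N, d) embeddings of demonstration frames.
    demo_actions: (N, a) actions recorded at those frames.
    """
    dists = np.linalg.norm(demo_embs - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbours get larger weights.
    logits = -dists[nearest]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ demo_actions[nearest]


demo_embs = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
demo_actions = np.array([[1.0], [3.0], [100.0]])
print(locally_weighted_action(np.array([0.5, 0.0]), demo_embs, demo_actions, k=2))
```

Because no parameters are fit at this stage, adding a demonstration only means appending its embedding and action to the arrays.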

arXiv:2111.08084 [pdf, ps, other]

Finding the Minimum Norm and Center Density of Cyclic Lattices via Nonlinear Systems

Authors: William Lima da Silva Pinto, Carina Alves

Abstract: Lattices with a circulant generator matrix represent a subclass of cyclic lattices. This subclass can be described by a basis containing a vector and its circular shifts. In this paper, we present certain conditions under which the norm expression of an arbitrary vector of this type of lattice is substantially simplified, and then investigate some of the lattices obtained under these conditions. We exhibit systems of nonlinear equations whose solutions yield lattices as dense as $D_n$ in odd dimensions. As for even dimensions, we obtain lattices denser than $A_n$ as long as $n \in 2\mathbb{Z} \setminus 4\mathbb{Z}$.

Submitted 5 July, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: preprint, 28 pages, 1 figure

MSC Class: 11H31; 52C17; 15A15; 15A03; 90C30
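For the circulant-lattice paper, a brute-force check can make the objects concrete: the basis rows are a vector and its circular shifts, the minimum norm is the smallest squared length of a nonzero lattice vector, and the center density is $\rho^n / \mathrm{vol}(\Lambda)$, with $\rho$ half the minimum distance and $\mathrm{vol}(\Lambda) = |\det B|$. The sketch below is our own illustration, not the authors' nonlinear-system method; enumerating coefficients in a small box (the hypothetical `box` parameter) only gives an upper bound on the minimum norm.

```python
import itertools

import numpy as np


def circulant_basis(v):
    """Rows are v and its circular shifts: row i is v rotated right by i places."""
    n = len(v)
    return np.array([np.roll(v, i) for i in range(n)], dtype=float)


def min_norm_and_center_density(v, box=2):
    """Brute-force the minimum squared norm over integer coefficients in [-box, box]^n.

    The result is only an upper bound on the true minimum unless `box` is large enough.
    """
    B = circulant_basis(v)
    n = B.shape[0]
    best = np.inf
    for coeffs in itertools.product(range(-box, box + 1), repeat=n):
        if not any(coeffs):
            continue  # skip the zero vector
        x = np.asarray(coeffs, dtype=float) @ B
        best = min(best, float(x @ x))
    volume = abs(np.linalg.det(B))  # covolume; assumes B is full rank
    packing_radius = np.sqrt(best) / 2.0
    return best, packing_radius ** n / volume


print(min_norm_and_center_density((1, 1, 0)))
```

With `v = (1, 0, 0)` the basis is the identity, giving minimum norm 1 and center density 1/8. With `v = (1, 1, 0)` the basis generates $D_3$, and the function returns minimum norm 2 and center density approximately $1/(4\sqrt{2}) \approx 0.1768$.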

arXiv:2110.15191 [pdf, other]

URLB: Unsupervised Reinforcement Learning Benchmark

Authors: Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB, and we propose directions for future research.

Submitted 28 October, 2021; originally announced October 2021.

Comments: Code for the Unsupervised Reinforcement Learning Benchmark is available at https://github.com/rll-research/url_benchmark

arXiv:2108.04619 [pdf, other]

doi 10.1109/LSP.2021.3103488

An Uncertainty-Aware Performance Measure for Multi-Object Tracking

Authors: Juliano Pinto, Yuxuan Xia, Lennart Svensson, Henk Wymeersch

Abstract: Evaluating the performance of multi-object tracking (MOT) methods is not straightforward, and existing performance measures fail to consider all the available uncertainty information in the MOT context. This can lead practitioners to select models which produce uncertainty estimates of lower quality, negatively impacting any downstream systems that rely on them. Additionally, most MOT performance measures have hyperparameters, which makes comparisons of different trackers less straightforward. We propose the use of the negative log-likelihood (NLL) of the multi-object posterior given the set of ground-truth objects as a performance measure. This measure takes into account all available uncertainty information in a sound mathematical manner without hyperparameters. We provide efficient algorithms for approximating the computation of the NLL for several common MOT algorithms, show that in some cases it decomposes and approximates the widely-used GOSPA metric, and provide several illustrative examples highlighting the advantages of the NLL in comparison to other MOT performance measures.

Submitted 9 September, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: Accepted to IEEE Signal Processing Letters 2021

arXiv:2107.09645 [pdf, other]

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

Abstract: We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, which was previously unattained by model-free RL. DrQ-v2 is conceptually simple, easy to implement, and has a significantly smaller computational footprint than prior work, with the majority of tasks taking just 8 hours to train on a single GPU. Finally, we publicly release DrQ-v2's implementation to provide RL practitioners with a strong and computationally efficient baseline.

Submitted 20 July, 2021; originally announced July 2021.
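DrQ-style agents augment pixel observations before they reach the encoder. A common choice is a random shift: pad the frame by replicating border pixels, then crop a window of the original size at a random offset. The sketch below is an integer-pixel NumPy version; the 4-pixel padding and the helper name `random_shift` are assumptions, and DrQ-v2's released implementation may differ in details (for example, it may interpolate rather than crop on the integer grid).

```python
import numpy as np


def random_shift(img, pad=4, rng=None):
    """Pad an (H, W, C) image by replicating edge pixels, then randomly crop back to (H, W)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Offsets are drawn uniformly from 0..2*pad inclusive.
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]


frame = np.arange(84 * 84 * 3).reshape(84, 84, 3)
shifted = random_shift(frame, rng=np.random.default_rng(0))
print(shifted.shape)
```

The output keeps the input shape, so the augmentation can be inserted in front of an existing encoder without other changes.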

arXiv:2107.09046 [pdf, other]

Playful Interactions for Representation Learning

Authors: Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto

Abstract: One of the key challenges in visual imitation learning is collecting large amounts of expert demonstrations for a given task. While collecting human demonstrations is becoming easier with teleoperation methods and low-cost assistive tools, we often still require 100-1000 demonstrations for every task to learn a visual representation and policy. To address this, we turn to an alternate form of data that does not require task-specific demonstrations: play. Play is a fundamental way in which children learn skills, behaviors, and visual representations early in life. Importantly, play data is diverse, task-agnostic, and relatively cheap to obtain. In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks. We collect 2 hours of play data in 19 diverse environments and use self-predictive learning to extract visual representations. Given these representations, we train policies using imitation learning for two downstream tasks: Pushing and Stacking. We demonstrate that our visual representations generalize better than standard behavior cloning and can achieve similar performance with only half the number of required demonstrations. Our representations, which are trained from scratch, compare favorably against ImageNet pretrained representations. Finally, we provide an experimental analysis on the effects of different pretraining modes on downstream task learning.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2107.03856 [pdf, ps, other]

Characterization of biophysical determinants of spatio-temporal calcium dynamics in astrocytes

Authors: Thais Appelt Peres Bartiê, Leonel Teixeira Pinto

Abstract: Most of the functions performed by astrocytes in brain information processing are related to calcium waves. Experimental studies involving calcium waves present discrepant results, leading to gaps in the full understanding of the functions of these cells. Mathematical models help to interpret the experimental results, identifying the chemical mechanisms involved in calcium waves and the limits of experimental methods. The model presented here is diffusion-based and uses receptors and channels as boundary conditions. The computer program was developed to allow the study of complex geometries, with several astrocytes, each of them with several branches. The code structure allows easy adaptation to various experimental situations against which the model can be compared. The code was deposited in the ModelDB repository and will be available under number 266795 after publication. A sensitivity analysis showed the relative significance of the parameters and identified the ideal range of values for each one. We showed that several sets of values can lead to the same calcium signaling dynamics. This encourages questioning the parameter values commonly used in the literature to model calcium signaling in astrocytes, and it suggests better experimental planning. In the final part of the work, the effects produced by the endoplasmic reticulum when located close to the extremities of the branches were evaluated. We conclude that when it is located close to the region of the glutamatergic stimulus, it favors local calcium dynamics. By contrast, when it is located at points away from the stimulated region, it accelerates the global spread of signaling.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 26 pages, 26 figures, research paper with calcium wave in astrocytes: modeling and simulation

MSC Class: 65N06

arXiv:2106.04056 [pdf]

doi 10.4103/jclpca.jclpca_2_21

The use of hyaluronic acid in individuals with cleft lip and palate: Literature review

Authors: Kelly Fernanda Molena, Lidiane de Castro Pinto, Gisele da Silva Dalben

Abstract: Since Resolution 198/2019 of the Brazilian Dental Council, which regulates orofacial harmonization as a dental specialty, and the advent of various uses of facial fillers such as hyaluronic acid (HA), it has become possible to perform both esthetic and functional corrections in individuals. Individuals with cleft lip and palate (CLP) present lip irregularities even after orofacial rehabilitation by an interdisciplinary team involving several corrective surgeries; these irregularities interfere with esthetics and can cause problems with self-esteem and social integration. Thus, facial filling is an innovation that, together with dentistry, contributes to the individual's esthetics and well-being. With patient safety and health in mind, more research is progressively being conducted to make such procedures less invasive. This work conducted a literature review on the use of HA as a facial filler to correct lip scars in patients with CLP. Through a literature and transverse search of the Scientific Electronic Library Online and PubMed databases using specific descriptors, studies from 1990 to 2020 that met the inclusion criteria were selected. It can be concluded that the use of HA as a facial filling material in the correction of lip scars from reparative surgeries related to CLP has been shown to be effective both for correcting facial asymmetry and for improving the quality of life of patients who underwent the procedure.

Submitted 7 June, 2021; originally announced June 2021.

Journal ref: J Cleft Lip Palate Craniofac Anomal 2021;8:143-8

arXiv:2106.00639 [pdf, other]

Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of the COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnosis using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals such as cough, breathing, and speech, along with symptom data, was recorded using a web application over a period of ten months. We investigate the use of statistical descriptors of simple time-frequency features for acoustic signals and binary features for the presence of symptoms. Unlike previous works, we primarily focus on the application of simple linear classifiers like logistic regression and support vector machines for acoustic data, while decision tree models are employed on the symptom data. We show that a multi-modal integration of acoustics and symptoms classifiers achieves an area-under-curve (AUC) of 92.40, a significant improvement over any individual modality. Several ablation experiments are also provided, which highlight the acoustic and symptom dimensions that are important for the task of COVID-19 diagnostics.

Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021
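The abstract above does not specify how the acoustic and symptom classifiers are combined. One simple possibility, shown here purely as an assumption, is late fusion by a weighted average of per-modality probabilities, scored with ROC AUC computed as the fraction of positive-negative pairs ranked correctly. The helper names, the weight `w`, and the toy scores are hypothetical.

```python
from typing import Sequence


def fuse_scores(acoustic: Sequence[float], symptom: Sequence[float], w: float = 0.5):
    """Late fusion: a weighted average of per-modality probabilities."""
    return [w * a + (1 - w) * s for a, s in zip(acoustic, symptom)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
acoustic = [0.9, 0.4, 0.6, 0.1]  # toy acoustic-classifier probabilities
symptom = [0.3, 0.8, 0.1, 0.5]   # toy symptom-classifier probabilities
print(auc(acoustic, labels), auc(symptom, labels), auc(fuse_scores(acoustic, symptom), labels))
```

In this toy example each modality alone reaches an AUC of 0.75 while the fused scores reach 1.0; the numbers are made up and say nothing about the paper's results.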

arXiv:2104.02844 [pdf, other]

GEM: Group Enhanced Model for Learning Dynamical Control Systems

Authors: Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin

Abstract: Learning the dynamics of a physical system wherein an autonomous agent operates is an important task. Often these systems present apparent geometric structures. For instance, the trajectories of a robotic manipulator can be broken down into a collection of its translational and rotational motions, fully characterized by the corresponding Lie groups and Lie algebras. In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning. We hypothesize that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model. To verify this hypothesis, we introduce the Group Enhanced Model (GEM). GEMs significantly outperform conventional transition models on tasks of long-term prediction, planning, and model-based reinforcement learning across a diverse suite of standard continuous-control environments, including Walker, Hopper, Reacher, Half-Cheetah, Inverted Pendulums, Ant, and Humanoid. Furthermore, plugging GEM into existing state-of-the-art systems enhances their performance, which we demonstrate on the PETS system. This work sheds light on a connection between learning of dynamics and Lie group properties, opening new research directions and practical applications. Our code is publicly available at: https://tinyurl.com/GEMMBRL.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 14 pages, 8 figures
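To illustrate what learning "on a Lie algebra vector space" can mean, consider planar rotations: a model predicts an increment in the Lie algebra so(2), which is a single angle, and the exponential map turns it into a rotation that is composed with the current orientation. This SO(2) sketch is our own simplification; GEM itself handles richer groups, and the learned model that would produce `predicted_delta` is hypothetical and omitted.

```python
import numpy as np


def exp_so2(theta: float) -> np.ndarray:
    """Closed-form exponential map so(2) -> SO(2): a rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def step(R: np.ndarray, predicted_delta: float) -> np.ndarray:
    """Advance an orientation by a model-predicted Lie-algebra increment.

    Because the update goes through the exponential map, the result stays an
    exact rotation matrix, unlike a raw additive update R + dR.
    """
    return R @ exp_so2(predicted_delta)


R = np.eye(2)
for delta in (0.3, 0.4):  # stand-ins for a learned model's outputs
    R = step(R, delta)
print(R)
```

A model that regresses the raw next matrix would have to learn orthogonality from data; predicting the algebra element builds it in by construction.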

arXiv:2104.00734 [pdf, other]

Next Generation Multitarget Trackers: Random Finite Set Methods vs Transformer-based Deep Learning

Authors: Juliano Pinto, Georg Hess, William Ljungbergh, Yuxuan Xia, Lennart Svensson, Henk Wymeersch

Abstract: Multitarget Tracking (MTT) is the problem of tracking the states of an unknown number of objects using noisy measurements, with important applications to autonomous driving, surveillance, robotics, and others. In the model-based Bayesian setting, there are conjugate priors that enable us to express the multi-object posterior in closed form, which could theoretically provide Bayes-optimal estimates. However, the number of hypotheses in the posterior grows super-exponentially over time, forcing state-of-the-art methods to resort to approximations to remain tractable, which can impact their performance in complex scenarios. Model-free methods based on deep learning provide an attractive alternative, as they can, in principle, learn the optimal filter from data, but to the best of our knowledge they have never been compared to current state-of-the-art Bayesian filters, especially not in contexts where accurate models are available. In this paper, we propose a high-performing deep-learning method for MTT based on the Transformer architecture and compare it to two state-of-the-art Bayesian filters, in a setting where we assume the correct model is provided. Although this gives an edge to the model-based filters, it also allows us to generate unlimited training data. We show that the proposed model outperforms state-of-the-art Bayesian filters in complex scenarios, while matching their performance in simpler cases, which validates the applicability of deep learning also in the model-based regime. The code for all our implementations is made available at https://github.com/JulianoLagana/MT3.

Submitted 4 June, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: 8 pages, 4 figures
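The hypothesis growth mentioned in the MT3 abstract can be illustrated by counting single-scan data-association hypotheses under simplifying assumptions of our own: a fixed set of n objects, each generating at most one of m measurements, with unassigned measurements treated as clutter. Choosing k object-measurement pairs gives $\binom{n}{k}\binom{m}{k}k!$ hypotheses, and without pruning these counts multiply across scans. The function name is hypothetical, and the sketch ignores object births and deaths.

```python
from math import comb, factorial


def association_hypotheses(n_objects: int, n_measurements: int) -> int:
    """Count single-scan data-association hypotheses.

    Assumes each object yields at most one measurement and each measurement
    comes from at most one object; unassigned measurements are clutter.
    Summing over k, the number of object-measurement pairs, gives
    sum_k C(n, k) * C(m, k) * k!.
    """
    return sum(
        comb(n_objects, k) * comb(n_measurements, k) * factorial(k)
        for k in range(min(n_objects, n_measurements) + 1)
    )


# Without pruning, hypotheses multiply from scan to scan.
per_scan = association_hypotheses(3, 3)
print(per_scan, per_scan ** 5)  # 34 45435424
```

Even three objects and three measurements per scan yield tens of millions of joint hypotheses after five scans, which is why practical Bayesian filters prune or approximate.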

arXiv:2103.16732 [pdf, other]

Simultaneous Navigation and Construction Benchmarking Environments

Authors: Wenyu Han, Chen Feng, Haoran Wu, Alexander Gao, Armand Jordana, Dong Liu, Lerrel Pinto, Ludovic Righetti

Abstract: We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design. In this task, a major robot vision and learning challenge is how to exactly achieve the design without GPS, due to the difficulty caused by the bi-directional coupling of accurate robot localization and navigation together with strategic environment manipulation. However, many existing robot vision and learning tasks, such as visual navigation and robot manipulation, address only one of these two coupled aspects. To stimulate the pursuit of a generic and adaptive solution, we simplify mobile construction into a partially observable Markov decision process (POMDP) in 1/2/3D grid worlds and benchmark the performance of a handcrafted policy with basic localization and planning, and state-of-the-art deep reinforcement learning (RL) methods. Our extensive experiments show that the coupling makes this problem very challenging for those methods, and emphasize the need for novel task-specific solutions.

Submitted 30 March, 2021; originally announced March 2021.

arXiv:2103.09148 [pdf, other]

DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for two-class classification. These recordings were collected via crowdsourcing from multiple countries, through a website application. The challenge features two tracks, one focusing on cough sounds, and the other on using a collection of breath, sustained vowel phonation, and number counting speech recordings. In this paper, we introduce the challenge, provide a detailed description of the task, and present a baseline system for it.

Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: To appear in Proceedings of Interspeech, 2021

arXiv:2102.13192 [pdf, other]

PlaceRAN: Optimal Placement of Virtualized Network Functions in the Next-generation Radio Access Networks

Authors: Fernando Zanferrari Morais, Gabriel Matheus de Almeida, Leizer Pinto, Kleber Vieira Cardoso, Luis M. Contreras, Rodrigo da Rosa Righi, Cristiano Bonato Both

Abstract: The fifth-generation mobile evolution enables several transformations in Next Generation Radio Access Networks (NG-RAN). The RAN protocol stack is split into eight possible disaggregated options combined into three network units, i.e., Central, Distributed, and Radio. Besides that, further advances allow the RAN software to be virtualized on top of general-purpose vendor-neutral hardware, giving rise to the concept of virtualized RAN (vRAN). Initiatives for disaggregated network units achieve full interoperability based on the Open RAN (O-RAN). The combination of NG-RAN and vRAN results in vNG-RAN, enabling the management of disaggregated units and protocols as a set of radio functions. The placement of these functions is challenging since the best decision can depend on multiple constraints, such as the RAN protocol stack split, routing paths of transport networks with restricted bandwidth and latency requirements, different topologies and link capabilities, asymmetric computational resources, etc. This article proposes the first exact model for the placement optimization of radio functions for vNG-RAN planning, named PlaceRAN. The main objective is to minimize the computing resources and maximize the aggregation of radio functions. The PlaceRAN evaluation considered two realistic network topologies. Our results reveal that the PlaceRAN model achieves an optimized high-performance aggregation level, is flexible for RAN deployment in overcoming the network restrictions, and is up to date with the most advanced vNG-RAN design and development.

Submitted 28 March, 2021; v1 submitted 25 February, 2021; originally announced February 2021.
