Search | arXiv e-print repository

Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning

Authors: Dhruv Shah, Michael Equi, Blazej Osinski, Fei Xia, Brian Ichter, Sergey Levine

Abstract: Navigation in unfamiliar environments presents a major challenge for robots: while map** and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy map** and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out… ▽ More Navigation in unfamiliar environments presents a major challenge for robots: while map** and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy map** and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out logically, by leveraging semantics -- e.g., a kitchen often adjoins a living room, an exit sign indicates the way out, and so forth. Language models can provide robots with such knowledge, but directly using language models to instruct a robot how to reach some destination can also be impractical: while language models might produce a narrative about how to reach some goal, because they are not grounded in real-world observations, this narrative might be arbitrarily wrong. Therefore, in this paper we study how the ``semantic guesswork'' produced by language models can be utilized as a guiding heuristic for planning algorithms. Our method, Language Frontier Guide (LFG), uses the language model to bias exploration of novel real-world environments by incorporating the semantic knowledge stored in language models as a search heuristic for planning with either topological or metric maps. We evaluate LFG in challenging real-world environments and simulated benchmarks, outperforming uninformed exploration and other ways of using language models. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: Videos, code, and an interactive Colab notebook that runs in your browser https://sites.google.com/view/lfg-nav/

arXiv:2207.04429 [pdf, other]

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Authors: Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine

Abstract: Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require… ▽ More Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav △ Less

Submitted 26 July, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

Comments: Project page https://sites.google.com/view/lmnav

arXiv:2203.13948 [pdf]

AI-augmented histopathologic review using image analysis to optimize DNA yield and tumor purity from FFPE slides

Authors: Bolesław L. Osinski, Aïcha BenTaieb, Irvin Ho, Ryan D. Jones, Rohan P. Joshi, Andrew Westley, Michael Carlson, Caleb Willis, Luke Schleicher, Brett M. Mahon, Martin C. Stumpe

Abstract: To achieve minimum DNA input and tumor purity requirements for next-generation sequencing (NGS), pathologists visually estimate macrodissection and slide count decisions. Misestimation may cause tissue waste and increased laboratory costs. We developed an AI-augmented smart pathology review system (SmartPath) to empower pathologists with quantitative metrics for determining tissue extraction param… ▽ More To achieve minimum DNA input and tumor purity requirements for next-generation sequencing (NGS), pathologists visually estimate macrodissection and slide count decisions. Misestimation may cause tissue waste and increased laboratory costs. We developed an AI-augmented smart pathology review system (SmartPath) to empower pathologists with quantitative metrics for determining tissue extraction parameters. Using digitized H&E-stained FFPE slides as inputs, SmartPath segments tumors, extracts cell-based features, and suggests macrodissection areas. To predict DNA yield per slide, the extracted features are correlated with known DNA yields. Then, a pathologist-defined target yield divided by the predicted DNA yield/slide gives the number of slides to scrape. Following model development, an internal validation trial was conducted within the Tempus Labs molecular sequencing laboratory. We evaluated our system on 501 clinical colorectal cancer slides, where half received SmartPath-augmented review and half traditional pathologist review. The SmartPath cohort had 25% more DNA yields within a desired target range of 100-2000ng. The SmartPath system recommended fewer slides to scrape for large tissue sections, saving tissue in these cases. Conversely, SmartPath recommended more slides to scrape for samples with scant tissue sections, hel** prevent costly re-extraction due to insufficient extraction yield. A statistical analysis was performed to measure the impact of covariates on the results, offering insights on how to improve future applications of SmartPath. Overall, the study demonstrated that AI-augmented histopathologic review using SmartPath could decrease tissue waste, sequencing time, and laboratory costs by optimizing DNA yields and tumor purity. △ Less

Submitted 7 April, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.10062 [pdf]

Imaging-based histological features are predictive of MET alterations in Non-Small Cell Lung Cancer

Authors: Rohan P. Joshi, Bolesław L. Osinski, Niha Beig, Lingdao Sha, Kshitij Ingale, Martin C. Stumpe

Abstract: MET is a proto-oncogene whose somatic activation in non-small cell lung cancer leads to increased cell growth and tumor progression. The two major classes of MET alterations are gene amplification and exon 14 deletion, both of which are therapeutic targets and detectable using existing molecular assays. However, existing tests are limited by their consumption of valuable tissue, cost and complexit… ▽ More MET is a proto-oncogene whose somatic activation in non-small cell lung cancer leads to increased cell growth and tumor progression. The two major classes of MET alterations are gene amplification and exon 14 deletion, both of which are therapeutic targets and detectable using existing molecular assays. However, existing tests are limited by their consumption of valuable tissue, cost and complexity that prevent widespread use. MET alterations could have an effect on cell morphology, and quantifying these associations could open new avenues for research and development of morphology-based screening tools. Using H&E-stained whole slide images (WSIs), we investigated the association of distinct cell-morphological features with MET amplifications and MET exon 14 deletions. We found that cell shape, color, grayscale intensity and texture-based features from both tumor infiltrating lymphocytes and tumor cells distinguished MET wild-type from MET amplified or MET exon 14 deletion cases. The association of individual cell features with MET alterations suggested a predictive model could distinguish MET wild-type from MET amplification or MET exon 14 deletion. We therefore developed an L1-penalized logistic regression model, achieving a mean Area Under the Receiver Operating Characteristic Curve (ROC-AUC) of 0.77 +/- 0.05sd in cross-validation and 0.77 on an independent holdout test set. A sparse set of 43 features differentiated these classes, which included features similar to what was found in the univariate analysis as well as the percent of tumor cells in the tissue. Our study demonstrates that MET alterations result in a detectable morphological signal in tumor cells and lymphocytes. These results suggest that development of low-cost predictive models based on H&E-stained WSIs may improve screening for MET altered tumors. △ Less

Submitted 29 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 30 pages, 4 figures

arXiv:2111.11229 [pdf, other]

Off-Policy Correction For Multi-Agent Reinforcement Learning

Authors: Michał Zawalski, Błażej Osiński, Henryk Michalewski, Piotr Miłoś

Abstract: Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is i… ▽ More Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded - we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results on some of them. △ Less

Submitted 3 April, 2024; v1 submitted 22 November, 2021; originally announced November 2021.

ACM Class: I.2

arXiv:2109.13602 [pdf, other]

SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies

Authors: Matt Vitelli, Yan Chang, Yawei Ye, Maciej Wołczyk, Błażej Osiński, Moritz Niendorf, Hugo Grimmett, Qiangui Huang, Ashesh Jain, Peter Ondruska

Abstract: In this paper we present the first safe system for full control of self-driving vehicles trained from human demonstrations and deployed in challenging, real-world, urban environments. Current industry-standard solutions use rule-based systems for planning. Although they perform reasonably well in common scenarios, the engineering complexity renders this approach incompatible with human-level perfo… ▽ More In this paper we present the first safe system for full control of self-driving vehicles trained from human demonstrations and deployed in challenging, real-world, urban environments. Current industry-standard solutions use rule-based systems for planning. Although they perform reasonably well in common scenarios, the engineering complexity renders this approach incompatible with human-level performance. On the other hand, the performance of machine-learned (ML) planning solutions can be improved by simply adding more exemplar data. However, ML methods cannot offer safety guarantees and sometimes behave unpredictably. To combat this, our approach uses a simple yet effective rule-based fallback layer that performs sanity checks on an ML planner's decisions (e.g. avoiding collision, assuring physical feasibility). This allows us to leverage ML to handle complex situations while still assuring the safety, reducing ML planner-only collisions by 95%. We train our ML planner on 300 hours of expert driving demonstrations using imitation learning and deploy it along with the fallback layer in downtown San Francisco, where it takes complete control of a real vehicle and navigates a wide variety of challenging urban driving scenarios. △ Less

Submitted 28 September, 2021; originally announced September 2021.

arXiv:2109.13333 [pdf, other]

Urban Driver: Learning to Drive from Real-world Demonstrations Using Policy Gradients

Authors: Oliver Scheel, Luca Bergamini, Maciej Wołczyk, Błażej Osiński, Peter Ondruska

Abstract: In this work we are the first to present an offline policy gradient method for learning imitative policies for complex urban driving from a large corpus of real-world demonstrations. This is achieved by building a differentiable data-driven simulator on top of perception outputs and high-fidelity HD maps of the area. It allows us to synthesize new driving experiences from existing demonstrations u… ▽ More In this work we are the first to present an offline policy gradient method for learning imitative policies for complex urban driving from a large corpus of real-world demonstrations. This is achieved by building a differentiable data-driven simulator on top of perception outputs and high-fidelity HD maps of the area. It allows us to synthesize new driving experiences from existing demonstrations using mid-level representations. Using this simulator we then train a policy network in closed-loop employing policy gradients. We train our proposed method on 100 hours of expert demonstrations on urban roads and show that it learns complex driving policies that generalize well and can perform a variety of driving maneuvers. We demonstrate this in simulation as well as deploy our model to self-driving vehicles in the real-world. Our method outperforms previously demonstrated state-of-the-art for urban driving scenarios -- all this without the need for complex state perturbations or collecting additional on-policy data during training. We make code and data publicly available. △ Less

Submitted 27 September, 2021; originally announced September 2021.

Comments: CoRL 2021

arXiv:2105.12337 [pdf, other]

What data do we need for training an AV motion planner?

Authors: Long Chen, Lukas Platinsky, Stefanie Speichert, Blazej Osinski, Oliver Scheel, Yawei Ye, Hugo Grimmett, Luca del Pero, Peter Ondruska

Abstract: We investigate what grade of sensor data is required for training an imitation-learning-based AV planner on human expert demonstration. Machine-learned planners are very hungry for training data, which is usually collected using vehicles equipped with the same sensors used for autonomous operation. This is costly and non-scalable. If cheaper sensors could be used for collection instead, data avail… ▽ More We investigate what grade of sensor data is required for training an imitation-learning-based AV planner on human expert demonstration. Machine-learned planners are very hungry for training data, which is usually collected using vehicles equipped with the same sensors used for autonomous operation. This is costly and non-scalable. If cheaper sensors could be used for collection instead, data availability would go up, which is crucial in a field where data volume requirements are large and availability is small. We present experiments using up to 1000 hours worth of expert demonstration and find that training with 10x lower-quality data outperforms 1x AV-grade data in terms of planner performance. The important implication of this is that cheaper sensors can indeed be used. This serves to improve data access and democratize the field of imitation-based motion planning. Alongside this, we perform a sensitivity analysis of planner performance as a function of perception range, field-of-view, accuracy, and data volume, and the reason why lower-quality data still provide good planning results. △ Less

Submitted 26 May, 2021; originally announced May 2021.

Comments: Published at 2021 International Conference on Robotics and Automation (ICRA2021)

arXiv:2105.12332 [pdf, other]

SimNet: Learning Reactive Self-driving Simulations from Real-world Observations

Authors: Luca Bergamini, Yawei Ye, Oliver Scheel, Long Chen, Chih Hu, Luca Del Pero, Blazej Osinski, Hugo Grimmett, Peter Ondruska

Abstract: In this work, we present a simple end-to-end trainable machine learning system capable of realistically simulating driving experiences. This can be used for the verification of self-driving system performance without relying on expensive and time-consuming road testing. In particular, we frame the simulation problem as a Markov Process, leveraging deep neural networks to model both state distribut… ▽ More In this work, we present a simple end-to-end trainable machine learning system capable of realistically simulating driving experiences. This can be used for the verification of self-driving system performance without relying on expensive and time-consuming road testing. In particular, we frame the simulation problem as a Markov Process, leveraging deep neural networks to model both state distribution and transition function. These are trainable directly from the existing raw observations without the need for any handcrafting in the form of plant or kinematic models. All that is needed is a dataset of historical traffic episodes. Our formulation allows the system to construct never seen scenes that unfold realistically reacting to the self-driving car's behaviour. We train our system directly from 1,000 hours of driving logs and measure both realism, reactivity of the simulation as the two key properties of the simulation. At the same time, we apply the method to evaluate the performance of a recently proposed state-of-the-art ML planning system trained from human driving logs. We discover this planning system is prone to previously unreported causal confusion issues that are difficult to test by non-reactive simulation. To the best of our knowledge, this is the first work that directly merges highly realistic data-driven simulations with a closed-loop evaluation for self-driving vehicles. We make the data, code, and pre-trained models publicly available to further stimulate simulation development. △ Less

Submitted 26 May, 2021; originally announced May 2021.

Comments: Published at 2021 International Conference on Robotics and Automation (ICRA2021)

arXiv:2012.11329 [pdf, other]

CARLA Real Traffic Scenarios -- novel training ground and benchmark for autonomous driving

Authors: Błażej Osiński, Piotr Miłoś, Adam Jakubowski, Paweł Zięcina, Michał Martyniak, Christopher Galias, Antonia Breuer, Silviu Homoceanu, Henryk Michalewski

Abstract: This work introduces interactive traffic scenarios in the CARLA simulator, which are based on real-world traffic. We concentrate on tactical tasks lasting several seconds, which are especially challenging for current control methods. The CARLA Real Traffic Scenarios (CRTS) is intended to be a training and testing ground for autonomous driving systems. To this end, we open-source the code under a p… ▽ More This work introduces interactive traffic scenarios in the CARLA simulator, which are based on real-world traffic. We concentrate on tactical tasks lasting several seconds, which are especially challenging for current control methods. The CARLA Real Traffic Scenarios (CRTS) is intended to be a training and testing ground for autonomous driving systems. To this end, we open-source the code under a permissive license and present a set of baseline policies. CRTS combines the realism of traffic scenarios and the flexibility of simulation. We use it to train agents using a reinforcement learning algorithm. We show how to obtain competitive polices and evaluate experimentally how observation types and reward schemes affect the training process and the resulting agent's behavior. △ Less

Submitted 19 September, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

arXiv:1911.12905 [pdf, other]

Simulation-based reinforcement learning for real-world autonomous driving

Authors: Błażej Osiński, Adam Jakubowski, Piotr Miłoś, Paweł Zięcina, Christopher Galias, Silviu Homoceanu, Henryk Michalewski

Abstract: We use reinforcement learning in simulation to obtain a driving system controlling a full-size real-world vehicle. The driving policy takes RGB images from a single camera and their semantic segmentation as input. We use mostly synthetic data, with labelled real-world data appearing only in the training of the segmentation network. Using reinforcement learning in simulation and synthetic data is… ▽ More We use reinforcement learning in simulation to obtain a driving system controlling a full-size real-world vehicle. The driving policy takes RGB images from a single camera and their semantic segmentation as input. We use mostly synthetic data, with labelled real-world data appearing only in the training of the segmentation network. Using reinforcement learning in simulation and synthetic data is motivated by lowering costs and engineering effort. In real-world experiments we confirm that we achieved successful sim-to-real policy transfer. Based on the extensive evaluation, we analyze how design decisions about perception, control, and training impact the real-world performance. △ Less

Submitted 3 April, 2024; v1 submitted 28 November, 2019; originally announced November 2019.

arXiv:1903.00374 [pdf, other]

Model-Based Reinforcement Learning for Atari

Authors: Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

Abstract: Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and… ▽ More Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games in low data regime of 100k interactions between the agent and the environment, which corresponds to two hours of real-time play. In most games SimPLe outperforms state-of-the-art model-free algorithms, in some games by over an order of magnitude. △ Less

Submitted 3 April, 2024; v1 submitted 1 March, 2019; originally announced March 2019.

arXiv:1804.00361 [pdf, other]

Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

Authors: Łukasz Kidziński, Sharada Prasanna Mohanty, Carmichael Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, Sergey Plis, Zhibo Chen, Zhizheng Zhang, Jiale Chen, Jun Shi, Zhuobin Zheng, Chun Yuan, Zhihui Lin, Henryk Michalewski, Piotr Miłoś, Błażej Osiński, Andrew Melnik, Malte Schilling, Helge Ritter, Sean Carroll , et al. (4 additional authors not shown)

Abstract: In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient… ▽ More In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward sha**, frame skip**, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. △ Less

Submitted 1 April, 2018; originally announced April 2018.

Comments: 27 pages, 17 figures

Showing 1–13 of 13 results for author: Osiński, B