Information-Theoretic Safe Bayesian Optimization

Authors: Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Abstract: We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown functions and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. The combination of this exploration criterion with a well-known Bayesian optimization acquisition function yields a novel safe Bayesian optimization selection criterion. Our approach is naturally applicable to continuous domains and does not require additional explicit hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we learn about the value of the safe optimum up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.

Submitted 10 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2212.04914

arXiv:2312.04386 [pdf, other]

Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization

Authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Abstract: We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation (UBE), but the over-approximation may result in inefficient exploration. We propose a new UBE whose solution converges to the true posterior variance over values and leads to lower regret in tabular exploration problems. We identify challenges to apply the UBE theory beyond tabular problems and propose a suitable approximation. Based on this approximation, we introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization with minimal changes. Experiments in both online and offline RL demonstrate improved performance compared to other uncertainty estimation methods.

Submitted 13 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.12526

arXiv:2308.06590 [pdf, other]

Value-Distributional Model-Based Reinforcement Learning

Authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Abstract: Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.

Submitted 12 August, 2023; originally announced August 2023.

arXiv:2302.12526 [pdf, other]

Model-Based Uncertainty in Value Functions

Authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Abstract: We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.

Submitted 7 March, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: AISTATS 2023
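
Illustration (not from the paper): the quantity discussed in the abstract above, the variance over values induced by a distribution over MDPs, can be approximated by brute-force sampling. The sketch below is a plain Monte Carlo estimate, not the uncertainty Bellman equation the authors propose. It assumes a tabular MDP under a fixed policy (so the dynamics reduce to a transition matrix), known rewards, and an independent Dirichlet posterior per state; all function names are hypothetical.

```python
import random


def sample_transition(counts, rng):
    """Draw one transition matrix, one Dirichlet sample per row (assumed posterior)."""
    matrix = []
    for row in counts:
        gammas = [rng.gammavariate(c, 1.0) for c in row]
        total = sum(gammas)
        matrix.append([g / total for g in gammas])
    return matrix


def evaluate_policy(P, rewards, gamma=0.9, iters=200):
    """Fixed-point iteration V <- r + gamma * P V for a fixed policy."""
    n = len(rewards)
    V = [0.0] * n
    for _ in range(iters):
        V = [rewards[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def value_mean_and_variance(counts, rewards, n_samples=200, seed=0):
    """Monte Carlo mean and variance of V(s) across sampled MDPs."""
    rng = random.Random(seed)
    n = len(rewards)
    samples = [evaluate_policy(sample_transition(counts, rng), rewards)
               for _ in range(n_samples)]
    mean = [sum(v[s] for v in samples) / n_samples for s in range(n)]
    var = [sum((v[s] - mean[s]) ** 2 for v in samples) / n_samples
           for s in range(n)]
    return mean, var


rewards = [0.0, 1.0]
# Dirichlet parameters: few observed transitions vs. many.
_, var_few = value_mean_and_variance([[1, 1], [1, 1]], rewards)
_, var_many = value_mean_and_variance([[100, 100], [100, 100]], rewards)
```

With more observed transitions the posterior over MDPs concentrates, so the sampled value variance should shrink. Sampling like this needs one policy evaluation per sampled MDP, which is the cost that a Bellman-style recursion for the variance aims to avoid.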

arXiv:2212.04914 [pdf, other]

Information-Theoretic Safe Exploration with Gaussian Processes

Authors: Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Abstract: We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown constraint and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Our approach is naturally applicable to continuous domains and does not require additional hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we explore by learning about the constraint up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.

Submitted 9 December, 2022; originally announced December 2022.

Comments: Submitted to NeurIPS 2022
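
Illustration (not from the paper): the "common approach" mentioned in the abstract above, a Gaussian process prior on the constraint with evaluations allowed only where the constraint is safe with high probability, can be sketched as a lower-confidence-bound test on the GP posterior. This is the generic baseline idea evaluated on a discretized candidate grid, not the authors' information-theoretic criterion. The zero-mean prior, the RBF kernel and its lengthscale, the safety threshold of 0, and the confidence multiplier `beta` are all assumptions chosen for the example.

```python
import math


def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel with unit prior variance (assumed)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_obs)] for i, a in enumerate(x_obs)]
    k_star = [rbf(a, x_query) for a in x_obs]
    alpha = solve(K, y_obs)
    weights = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_query, x_query) - sum(k * w for k, w in zip(k_star, weights))
    return mean, math.sqrt(max(var, 0.0))


def safe_set(x_obs, y_obs, candidates, beta=2.0):
    """Candidates whose lower confidence bound satisfies the constraint f(x) >= 0."""
    safe = []
    for x in candidates:
        mean, std = gp_posterior(x_obs, y_obs, x)
        if mean - beta * std >= 0.0:
            safe.append(x)
    return safe


x_obs = [0.0, 0.2]          # previously evaluated (safe) parameters
y_obs = [1.0, 0.8]          # observed constraint values
candidates = [i / 10 for i in range(-10, 21)]
safe = safe_set(x_obs, y_obs, candidates)
```

Candidates near the observations are classified as safe, while points far from any data fall back to the prior and are excluded. The fixed grid is exactly the kind of discretization the abstract identifies as a limitation of existing methods.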

arXiv:1909.05150 [pdf, other]

doi 10.1109/LRA.2020.2964159

Online Trajectory Generation with Distributed Model Predictive Control for Multi-Robot Motion Planning

Authors: Carlos E. Luis, Marijan Vukosavljev, Angela P. Schoellig

Abstract: We present a distributed model predictive control (DMPC) algorithm to generate trajectories in real-time for multiple robots. We adopted the on-demand collision avoidance method presented in previous work to efficiently compute non-colliding trajectories in transition tasks. An event-triggered replanning strategy is proposed to account for disturbances. Our simulation results show that the proposed collision avoidance method can reduce, on average, around 50% of the travel time required to complete a multi-agent point-to-point transition when compared to the well-studied Buffered Voronoi Cells (BVC) approach. Additionally, it shows a higher success rate in transition tasks with a high density of agents, with more than 90% success rate with 30 palm-sized quadrotor agents in an 18 m^3 arena. The approach was experimentally validated with a swarm of up to 20 drones flying in close proximity.

Submitted 24 January, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: 8 pages, 8 figures

Journal ref: IEEE Robotics and Automation Letters, vol. 5, iss. 2, pp. 604-611, 2020

arXiv:1810.03572 [pdf, other]

Fast and In Sync: Periodic Swarm Patterns for Quadrotors

Authors: Xintong Du, Carlos E. Luis, Marijan Vukosavljev, Angela P. Schoellig

Abstract: This paper aims to design quadrotor swarm performances, where the swarm acts as an integrated, coordinated unit embodying moving and deforming objects. We divide the task of creating a choreography into three basic steps: designing swarm motion primitives, transitioning between those movements, and synchronizing the motion of the drones. The result is a flexible framework for designing choreographies comprised of a wide variety of motions. The motion primitives can be intuitively designed using few parameters, providing a rich library for choreography design. Moreover, we combine and adapt existing goal assignment and trajectory generation algorithms to maximize the smoothness of the transitions between motion primitives. Finally, we propose a correction algorithm to compensate for motion delays and synchronize the motion of the drones to a desired periodic motion pattern. The proposed methodology was validated experimentally by generating and executing choreographies on a swarm of 25 quadrotors.

Submitted 2 May, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

Comments: This work was accepted to ICRA 2019. It is a finalist nominated for the Best Paper Award on Multi-Robot Systems and the Best Paper Award on Uncrewed Aerial Vehicles

arXiv:1809.04230 [pdf, other]

doi 10.1109/LRA.2018.2890572

Trajectory Generation for Multiagent Point-To-Point Transitions via Distributed Model Predictive Control

Authors: Carlos E. Luis, Angela P. Schoellig

Abstract: This paper introduces a novel algorithm for multiagent offline trajectory generation based on distributed model predictive control. Central to the algorithm's scalability and success is the development of an on-demand collision avoidance strategy. By predicting future states and sharing this information with their neighbors, the agents are able to detect and avoid collisions while moving toward their goals. The proposed algorithm can be implemented in a distributed fashion and reduces the computation time by more than 85% compared to previous optimization approaches based on sequential convex programming, while only having a small impact on the optimality of the plans. The approach was validated both through extensive simulations and experimentally with teams of up to 25 quadrotors flying in confined indoor spaces.

Submitted 15 January, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

Comments: 8 pages, 7 figures

Journal ref: IEEE Robotics and Automation Letters, vol. 4, iss. 2, pp. 375-382, 2019

Showing 1–8 of 8 results for author: Luis, C E