-
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
Authors:
Lukas Fluri,
Leon Lang,
Alessandro Abate,
Patrick Forré,
David Krueger,
Joar Skalse
Abstract:
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main so…
▽ More
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the training distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. Our theoretical results highlight the importance of develo** new ways to measure the quality of learned reward models.
△ Less
Submitted 22 June, 2024;
originally announced June 2024.
-
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
Authors:
Luckeciano C. Melo,
Panagiotis Tigas,
Alessandro Abate,
Yarin Gal
Abstract:
Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the furth…
▽ More
Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and unleash the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet such expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels in two popular human preference datasets and exceeds previous stochastic Bayesian acquisition policies.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Stochastic Omega-Regular Verification and Control with Supermartingales
Authors:
Alessandro Abate,
Mirco Giacobbe,
Diptarko Roy
Abstract:
We present for the first time a supermartingale certificate for $ω$-regular specifications. We leverage the Robbins & Siegmund convergence theorem to characterize supermartingale certificates for the almost-sure acceptance of Streett conditions on general stochastic processes, which we call Streett supermartingales. This enables effective verification and control of discrete-time stochastic dynami…
▽ More
We present for the first time a supermartingale certificate for $ω$-regular specifications. We leverage the Robbins & Siegmund convergence theorem to characterize supermartingale certificates for the almost-sure acceptance of Streett conditions on general stochastic processes, which we call Streett supermartingales. This enables effective verification and control of discrete-time stochastic dynamical models with infinite state space under $ω$-regular and linear temporal logic specifications. Our result generalises reachability, safety, reach-avoid, persistence and recurrence specifications; our contribution applies to discrete-time stochastic dynamical models and probabilistic programs with discrete and continuous state spaces and distributions, and carries over to deterministic models and programs. We provide a synthesis algorithm for control policies and Streett supermartingales as proof certificates for $ω$-regular objectives, which is sound and complete for supermartingales and control policies with polynomial templates and any stochastic dynamical model whose post-expectation is expressible as a polynomial. We additionally provide an optimisation of our algorithm that reduces the problem to satisfiability modulo theories, under the assumption that templates and post-expectation are in piecewise linear form. We have built a prototype and have demonstrated the efficacy of our approach on several exemplar $ω$-regular verification and control synthesis problems.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Bisimulation Learning
Authors:
Alessandro Abate,
Mirco Giacobbe,
Yannik Schnitzer
Abstract:
We introduce a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. Our novel technique computes stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. Our procedure learns a candidate state classifi…
▽ More
We introduce a data-driven approach to computing finite bisimulations for state transition systems with very large, possibly infinite state space. Our novel technique computes stutter-insensitive bisimulations of deterministic systems, which we characterize as the problem of learning a state classifier together with a ranking function for each class. Our procedure learns a candidate state classifier and candidate ranking functions from a finite dataset of sample states; then, it checks whether these generalise to the entire state space using satisfiability modulo theory solving. Upon the affirmative answer, the procedure concludes that the classifier constitutes a valid stutter-insensitive bisimulation of the system. Upon a negative answer, the solver produces a counterexample state for which the classifier violates the claim, adds it to the dataset, and repeats learning and checking in a counterexample-guided inductive synthesis loop until a valid bisimulation is found. We demonstrate on a range of benchmarks from reactive verification and software model checking that our method yields faster verification results than alternative state-of-the-art tools in practice. Our method produces succinct abstractions that enable an effective verification of linear temporal logic without next operator, and are interpretable for system diagnostics.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Data-driven memory-dependent abstractions of dynamical systems via a Cantor-Kantorovich metric
Authors:
Adrien Banse,
Licio Romao,
Alessandro Abate,
Raphaël M. Jungers
Abstract:
Abstractions of dynamical systems enable their verification and the design of feedback controllers using simpler, usually discrete, models. In this paper, we propose a data-driven abstraction mechanism based on a novel metric between Markov models. Our approach is based purely on observing output labels of the underlying dynamics, thus opening the road for a fully data-driven approach to construct…
▽ More
Abstractions of dynamical systems enable their verification and the design of feedback controllers using simpler, usually discrete, models. In this paper, we propose a data-driven abstraction mechanism based on a novel metric between Markov models. Our approach is based purely on observing output labels of the underlying dynamics, thus opening the road for a fully data-driven approach to construct abstractions. Another feature of the proposed approach is the use of memory to better represent the dynamics in a given region of the state space. We show through numerical examples the usefulness of the proposed methodology.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Distributionally Robust Aggregation of Electric Vehicle Flexibility
Authors:
Karan Mukhi,
Chengrui Qu,
Pengcheng You,
Alessandro Abate
Abstract:
We address the problem of characterising the aggregate flexibility in populations of electric vehicles (EVs) with uncertain charging requirements. Building on previous results that provide exact characterisations of the aggregate flexibility in populations with known charging requirements, in this paper we extend the aggregation methods so that charging requirements are uncertain, but sampled from…
▽ More
We address the problem of characterising the aggregate flexibility in populations of electric vehicles (EVs) with uncertain charging requirements. Building on previous results that provide exact characterisations of the aggregate flexibility in populations with known charging requirements, in this paper we extend the aggregation methods so that charging requirements are uncertain, but sampled from a given distribution. In particular, we construct \textit{distributionally robust aggregate flexibility sets}, sets of aggregate charging profiles over which we can provide probabilistic guarantees that actual realised populations will be able to track. By leveraging measure concentration results that establish powerful finite sample guarantees, we are able to give tight bounds on these robust flexibility sets, even in low sample regimes that are well suited for aggregating small populations of EVs. We detail explicit methods of calculating these sets and provide numerical results that validate the theory developed here.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Authors:
David "davidad" Dalrymple,
Joar Skalse,
Yoshua Bengio,
Stuart Russell,
Max Tegmark,
Sanjit Seshia,
Steve Omohundro,
Christian Szegedy,
Ben Goldhaber,
Nora Ammann,
Alessandro Abate,
Joe Halpern,
Clark Barrett,
Ding Zhao,
Tan Zhi-Xuan,
Jeannette Wing,
Joshua Tenenbaum
Abstract:
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these appro…
▽ More
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
△ Less
Submitted 17 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Safe Reach Set Computation via Neural Barrier Certificates
Authors:
Alessandro Abate,
Sergiy Bogomolov,
Alec Edwards,
Kostiantyn Potomkin,
Sadegh Soudjani,
Paolo Zuliani
Abstract:
We present a novel technique for online safety verification of autonomous systems, which performs reachability analysis efficiently for both bounded and unbounded horizons by employing neural barrier certificates. Our approach uses barrier certificates given by parameterized neural networks that depend on a given initial set, unsafe sets, and time horizon. Such networks are trained efficiently off…
▽ More
We present a novel technique for online safety verification of autonomous systems, which performs reachability analysis efficiently for both bounded and unbounded horizons by employing neural barrier certificates. Our approach uses barrier certificates given by parameterized neural networks that depend on a given initial set, unsafe sets, and time horizon. Such networks are trained efficiently offline using system simulations sampled from regions of the state space. We then employ a meta-neural network to generalize the barrier certificates to state space regions that are outside the training set. These certificates are generated and validated online as sound over-approximations of the reachable states, thus either ensuring system safety or activating appropriate alternative actions in unsafe scenarios. We demonstrate our technique on case studies from linear models to nonlinear control-dependent models for online autonomous driving scenarios.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Data-driven Interval MDP for Robust Control Synthesis
Authors:
Rudi Coppola,
Andrea Peruffo,
Licio Romao,
Alessandro Abate,
Manuel Mazo Jr
Abstract:
The abstraction of dynamical systems is a powerful tool that enables the design of feedback controllers using a correct-by-design framework. We investigate a novel scheme to obtain data-driven abstractions of discrete-time stochastic processes in terms of richer discrete stochastic models, whose actions lead to nondeterministic transitions over the space of probability measures. The data-driven co…
▽ More
The abstraction of dynamical systems is a powerful tool that enables the design of feedback controllers using a correct-by-design framework. We investigate a novel scheme to obtain data-driven abstractions of discrete-time stochastic processes in terms of richer discrete stochastic models, whose actions lead to nondeterministic transitions over the space of probability measures. The data-driven component of the proposed methodology lies in the fact that we only assume samples from an unknown probability distribution. We also rely on the model of the underlying dynamics to build our abstraction through backward reachability computations. The nondeterminism in the probability space is captured by a collection of Markov Processes, and we identify how this model can improve upon existing abstraction techniques in terms of satisfying temporal properties, such as safety or reach-avoid. The connection between the discrete and the underlying dynamics is made formal through the use of the scenario approach theory. Numerical experiments illustrate the advantages and main limitations of the proposed techniques with respect to existing approaches.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Learning to Bid in Forward Electricity Markets Using a No-Regret Algorithm
Authors:
Arega Getaneh Abate,
Dorsa Majdi,
Jalal Kazempour,
Maryam Kamgarpour
Abstract:
It is a common practice in the current literature of electricity markets to use game-theoretic approaches for strategic price bidding. However, they generally rely on the assumption that the strategic bidders have prior knowledge of rival bids, either perfectly or with some uncertainty. This is not necessarily a realistic assumption. This paper takes a different approach by relaxing such an assump…
▽ More
It is a common practice in the current literature of electricity markets to use game-theoretic approaches for strategic price bidding. However, they generally rely on the assumption that the strategic bidders have prior knowledge of rival bids, either perfectly or with some uncertainty. This is not necessarily a realistic assumption. This paper takes a different approach by relaxing such an assumption and exploits a no-regret learning algorithm for repeated games. In particular, by using the \emph{a posteriori} information about rivals' bids, a learner can implement a no-regret algorithm to optimize her/his decision making. Given this information, we utilize a multiplicative weight-update algorithm, adapting bidding strategies over multiple rounds of an auction to minimize her/his regret. Our numerical results show that when the proposed learning approach is used the social cost and the market-clearing prices can be higher than those corresponding to the classical game-theoretic approaches. The takeaway for market regulators is that electricity markets might be exposed to greater market power of suppliers than what classical analysis shows.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
A Stability-Based Abstraction Framework for Reach-Avoid Control of Stochastic Dynamical Systems with Unknown Noise Distributions
Authors:
Thom Badings,
Licio Romao,
Alessandro Abate,
Nils Jansen
Abstract:
Finite-state abstractions are widely studied for the automated synthesis of correct-by-construction controllers for stochastic dynamical systems. However, existing abstraction methods often lead to prohibitively large finite-state models. To address this issue, we propose a novel abstraction scheme for stochastic linear systems that exploits the system's stability to obtain significantly smaller a…
▽ More
Finite-state abstractions are widely studied for the automated synthesis of correct-by-construction controllers for stochastic dynamical systems. However, existing abstraction methods often lead to prohibitively large finite-state models. To address this issue, we propose a novel abstraction scheme for stochastic linear systems that exploits the system's stability to obtain significantly smaller abstract models. As a unique feature, we first stabilize the open-loop dynamics using a linear feedback gain. We then use a model-based approach to abstract a known part of the stabilized dynamics while using a data-driven method to account for the stochastic uncertainty. We formalize abstractions as Markov decision processes (MDPs) with intervals of transition probabilities. By stabilizing the dynamics, we can further constrain the control input modeled in the abstraction, which leads to smaller abstract models while retaining the correctness of controllers. Moreover, when the stabilizing feedback controller is aligned with the property of interest, then a good trade-off is achieved between the reduction in the abstraction size and the performance loss. The experiments show that our approach can reduce the size of the graph of abstractions by up to 90% with negligible performance loss.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
An International and Multidisciplinary Teaching Experience with Real Industrial Team Project Development
Authors:
Martin Mellado,
Eduardo Vendrell,
Filomena Ferrucci,
Andrea Abate,
Detlef Zuhlke,
Bernard Riera
Abstract:
This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to enc…
▽ More
This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to encourage efficient and multinational teaching of specialist topics, which might otherwise not be taught at all. This project lasted for 6 years, covering two different editions, each one with three year duration. This project lasted for 6 years, covering two different editions, each one with three year duration. The first edition, named SAVRO (Simulation and Virtual Reality in Robotics for Industrial Assembly Processes) was held in the period 2008-2010, with the participation of three Universities, namely the Universitat Politecnica de Valencia (Spain), acting as IP coordinator, the Technische Universitat Kaiserslautern (Germany), and the Universita degli Studi di Salerno (Italy). The Universite de Reims Champagne-Ardenne (France) participated as a new partner in the subsequent edition (2011-2013) of the IP, renamed as HUMAIN (Human-Machine Interaction). Both editions of the teaching project were characterized by the same objectives and organizational aspects, aiming to provide educational initiatives based on active teaching through collaborative works between international institutions, involving industrial partners too. The aim of the paper is to illustrate the best practices that characterized the organization of our experience as well as to present some general recommendations and suggestions on how to devise computing academic curricula.
△ Less
Submitted 17 February, 2024;
originally announced March 2024.
-
Enhanced Electron Extraction in Co-Doped TiO2 Quantified by Drift-Diffusion Simulation for Stable CsPbI3 Solar Cells
Authors:
Thomas W. Gries,
Davide Regaldo,
Hans Koebler,
Titan Noor Hartono Putri,
Gennaro V. Sannino,
Emilio Gutierrez Partida,
Roberto Felix,
Elif Huesam,
Ahmed Saleh,
Regan G. Wilks,
Zafar Iqbal,
Zahra Loghman Nia,
Florian Ruske,
Martin Stolterfoht,
Dieter Neher,
Marcus Baer,
Stefan A. Weber,
Paola Delli Veneri,
Philip Schulz,
Jean-Baptiste Puel,
Jean-Paul Kleider,
Qiong Wang,
Eva Unger,
Artem Musiienko,
Antonio Abate
Abstract:
Solar cells based on inorganic perovskite CsPbI3 are promising candidates to resolve the challenge of operational stability in the field of perovskite photovoltaics. For stable operation, however, it is crucial to thoroughly understand the extractive and recombinative processes occurring at the interfaces of perovskite and the charge-selective layers. In this study, we focus on the electronic prop…
▽ More
Solar cells based on inorganic perovskite CsPbI3 are promising candidates to resolve the challenge of operational stability in the field of perovskite photovoltaics. For stable operation, however, it is crucial to thoroughly understand the extractive and recombinative processes occurring at the interfaces of perovskite and the charge-selective layers. In this study, we focus on the electronic properties of (doped) TiO2 as an electron-selective contact. We show via KPFM that co-do** of TiO2 with Nb(V) and Sn(IV) reduces the materials work function by 270 meV, giving it stronger n-type characteristics compared to Nb(V) mono-doped TiO2. The altered electronic alignment with CsPbI3 translates to enhanced electron extraction, as demonstrated with ssPL, trPL and trSPV in triad. Importantly, we extract crucial parameters, such as the concentration of extracted electrons and the interface hole recombination velocity, from the SPV transients via 2D drift-diffusion simulations. When implementing the co-doped TiO2 into full n-i-p solar cells, the operational stability is enhanced to 32000 h of projected TS80 lifetime. This study provides fundamental understanding of interfacial charge extraction and its correlation with operational stability of perovskite solar cells, which can be transferred to other charge-selective contacts.
△ Less
Submitted 24 April, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
Authors:
Joar Skalse,
Alessandro Abate
Abstract:
Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $π$). To do this, we need a behavioural model of how $π$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship bet…
▽ More
Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $π$). To do this, we need a behavioural model of how $π$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as e.g.\ the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
Authors:
Alexandros E. Tzikas,
Licio Romao,
Mert Pilanci,
Alessandro Abate,
Mykel J. Kochenderfer
Abstract:
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literatur…
▽ More
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literature due to its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show convergence of the distribution associated with these iterates to the underlying target distribution in Wasserstein distance. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
Authors:
Joar Skalse,
Alessandro Abate
Abstract:
In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed usi…
▽ More
In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed using a scalar, Markovian reward. Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes. We thereby contribute to a more complete understanding of what standard reward functions can and cannot express. In addition to this, we also call attention to modal problems as a new class of problems, since they have so far not been given any systematic treatment in the RL literature. We also briefly outline some approaches for solving some of the problems we discuss, by means of bespoke RL algorithms.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
Policy Evaluation in Distributional LQR (Extended Version)
Authors:
Zifan Wang,
Yulong Gao,
Siyi Wang,
Michael M. Zavlanos,
Alessandro Abate,
Karl H. Johansson
Abstract:
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard reinforcement learning. Meanwhile, a challenge in DRL is that the policy evaluation typically relies on the representation of the return distribution, which needs to be c…
▽ More
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard reinforcement learning. Meanwhile, a challenge in DRL is that the policy evaluation typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for the special class of DRL problems that rely on a discounted linear quadratic regulator (LQR), which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression for the distribution of the random return, which is applicable for all types of exogenous disturbance as long as it is independent and identically distributed (i.i.d.). We show that the variance of the random return is bounded if the fourth moment of the exogenous disturbance is bounded. Furthermore, we investigate the sensitivity of the return distribution to model perturbations. While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be well approximated by a finite number of random variables. The associated approximation error can be analytically bounded under mild assumptions. When the model is unknown, we propose a model-free approach for estimating the return distribution, supported by sample complexity guarantees. Finally, we extend our approach to partially observable linear systems. Numerical experiments are provided to illustrate the theoretical results.
△ Less
Submitted 23 March, 2024; v1 submitted 28 November, 2023;
originally announced January 2024.
-
Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis
Authors:
Rohan Mitta,
Hosein Hasanbeig,
Jun Wang,
Daniel Kroening,
Yiannis Kantaros,
Alessandro Abate
Abstract:
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the agent is particularly important, e.g. autonomous platforms or robots that work in proximity of humans. As enforcing safety during training might severely limit th…
▽ More
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the agent is particularly important, e.g. autonomous platforms or robots that work in proximity of humans. As enforcing safety during training might severely limit the agent's exploration, we propose here a new architecture that handles the trade-off between efficient progress and safety during exploration. As the exploration progresses, we update via Bayesian inference Dirichlet-Categorical models of the transition probabilities of the Markov decision process that describes the environment dynamics. This paper proposes a way to approximate moments of belief about the risk associated to the action selection policy. We construct those approximations, and prove the convergence results. We propose a novel method for leveraging the expectation approximations to derive an approximate bound on the confidence that the risk is below a certain level. This approach can be easily interleaved with RL and we present experimental results to showcase the performance of the overall architecture.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Stability Analysis of Switched Linear Systems with Neural Lyapunov Functions
Authors:
Virginie Debauche,
Alec Edwards,
Raphael M. Jungers,
Alessandro Abate
Abstract:
Neural-based, data-driven analysis and control of dynamical systems have been recently investigated and have shown great promise, e.g. for safety verification or stability analysis. Indeed, not only do neural networks allow for an entirely model-free, data-driven approach, but also for handling arbitrary complex functions via their power of representation (as opposed to, e.g. algebraic optimizatio…
▽ More
Neural-based, data-driven analysis and control of dynamical systems have been recently investigated and have shown great promise, e.g. for safety verification or stability analysis. Indeed, not only do neural networks allow for an entirely model-free, data-driven approach, but also for handling arbitrary complex functions via their power of representation (as opposed to, e.g. algebraic optimization techniques that are restricted to polynomial functions). Whilst classical Lyapunov techniques allow to provide a formal and robust guarantee of stability of a switched dynamical system, very little is yet known about correctness guarantees for Neural Lyapunov functions, nor about their performance (amount of data needed for a certain accuracy). We thus formally introduce neural Lyapunov functions for the stability analysis of switched linear systems: we benchmark them on this paradigmatic problem, which is notoriously difficult (and in general Turing-undecidable), but which admits recently-developed technologies and theoretical results. Inspired by switched systems theory, we provide theoretical guarantees on the representative power of neural networks, leveraging recent results from the ML community. We additionally experimentally display how neural Lyapunov functions compete with state-of-the-art results and techniques, while admitting a wide range of improvement, both in theory and in practice. This study intends to improve our understanding of the opportunities and current limitations of neural-based data-driven analysis and control of complex dynamical systems.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
Learning Robust Policies for Uncertain Parametric Markov Decision Processes
Authors:
Luke Rickard,
Alessandro Abate,
Kostas Margellos
Abstract:
Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state mode…
▽ More
Synthesising verifiably correct controllers for dynamical systems is crucial for safety-critical problems. To achieve this, it is important to account for uncertainty in a robust manner, while at the same time it is often of interest to avoid being overly conservative with the view of achieving a better cost. We propose a method for verifiably safe policy synthesis for a class of finite state models, under the presence of structural uncertainty. In particular, we consider uncertain parametric Markov decision processes (upMDPs), a special class of Markov decision processes, with parameterised transition functions, where such parameters are drawn from a (potentially) unknown distribution. Our framework leverages recent advancements in the so-called scenario approach theory, where we represent the uncertainty by means of scenarios, and provide guarantees on synthesised policies satisfying probabilistic computation tree logic (PCTL) formulae. We consider several common benchmarks/problems and compare our work to recent developments for verifying upMDPs.
△ Less
Submitted 15 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Fossil 2.0: Formal Certificate Synthesis for the Verification and Control of Dynamical Models
Authors:
Alec Edwards,
Andrea Peruffo,
Alessandro Abate
Abstract:
This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibili…
▽ More
This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibility. We present these new features as part of this tool paper. Fossil implements a counterexample-guided inductive synthesis (CEGIS) loop ensuring the soundness of the method. Our tool uses neural networks as templates to generate candidate functions, which are then formally proven by an SMT solver acting as an assertion verifier. Improvements with respect to the first release include a wider range of certificates, synthesis of control laws, and support for discrete-time models.
△ Less
Submitted 16 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Correct-by-Construction Control for Stochastic and Uncertain Dynamical Models via Formal Abstractions
Authors:
Thom Badings,
Nils Jansen,
Licio Romao,
Alessandro Abate
Abstract:
Automated synthesis of correct-by-construction controllers for autonomous systems is crucial for their deployment in safety-critical scenarios. Such autonomous systems are naturally modeled as stochastic dynamical models. The general problem is to compute a controller that provably satisfies a given task, represented as a probabilistic temporal logic specification. However, factors such as stochas…
▽ More
Automated synthesis of correct-by-construction controllers for autonomous systems is crucial for their deployment in safety-critical scenarios. Such autonomous systems are naturally modeled as stochastic dynamical models. The general problem is to compute a controller that provably satisfies a given task, represented as a probabilistic temporal logic specification. However, factors such as stochastic uncertainty, imprecisely known parameters, and hybrid features make this problem challenging. We have developed an abstraction framework that can be used to solve this problem under various modeling assumptions. Our approach is based on a robust finite-state abstraction of the stochastic dynamical model in the form of a Markov decision process with intervals of probabilities (iMDP). We use state-of-the-art verification techniques to compute an optimal policy on the iMDP with guarantees for satisfying the given specification. We then show that, by construction, we can refine this policy into a feedback controller for which these guarantees carry over to the dynamical model. In this short paper, we survey our recent research in this area and highlight two challenges (related to scalability and dealing with nonlinear dynamics) that we aim to address with our ongoing research.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Probabilistic Reach-Avoid for Bayesian Neural Networks
Authors:
Matthew Wicker,
Luca Laurenti,
Andrea Patane,
Nicola Paoletti,
Alessandro Abate,
Marta Kwiatkowska
Abstract:
Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: f…
▽ More
Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a "target" state, while avoiding a set of "unsafe" states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
STARC: A General Framework For Quantifying Differences Between Reward Functions
Authors:
Joar Skalse,
Lucy Farnik,
Sumeet Ramesh Motwani,
Erik Jenner,
Adam Gleave,
Alessandro Abate
Abstract:
In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward fun…
▽ More
In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward function from data. However, the theoretical foundations of reward learning are not yet well-developed. In particular, it is typically not known when a given reward learning algorithm with high probability will learn a reward function that is safe to optimise. This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance. One of the roadblocks to deriving better theoretical guarantees is the lack of good methods for quantifying the difference between reward functions. In this paper we provide a solution to this problem, in the form of a class of pseudometrics on the space of all reward functions that we call STARC (STAndardised Reward Comparison) metrics. We show that STARC metrics induce both an upper and a lower bound on worst-case regret, which implies that our metrics are tight, and that any metric with the same properties must be bilipschitz equivalent to ours. Moreover, we also identify a number of issues with reward metrics proposed by earlier works. Finally, we evaluate our metrics empirically, to demonstrate their practical efficacy. STARC metrics can be used to make both theoretical and empirical analysis of reward learning algorithms both easier and more principled.
△ Less
Submitted 11 March, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
A General Verification Framework for Dynamical and Control Models via Certificate Synthesis
Authors:
Alec Edwards,
Andrea Peruffo,
Alessandro Abate
Abstract:
An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert c…
▽ More
An emerging branch of control theory specialises in certificate learning, concerning the specification of a desired (possibly complex) system behaviour for an autonomous or control model, which is then analytically verified by means of a function-based proof. However, the synthesis of controllers abiding by these complex requirements is in general a non-trivial task and may elude the most expert control engineers. This results in a need for automatic techniques that are able to design controllers and to analyse a wide range of elaborate specifications. In this paper, we provide a general framework to encode system specifications and define corresponding certificates, and we present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks to provide candidate control and certificate functions, whilst using SMT-solvers to offer a formal guarantee of correctness. We test our framework by develo** a prototype software tool, and assess its efficacy at verification via control and certificate synthesis over a large and varied suite of benchmarks.
△ Less
Submitted 1 July, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Photovoltaic potential of tin perovskites revealed through layer-by-layer investigation of optoelectronic and charge transport properties
Authors:
Mahmoud H. Aldamasy,
Artem Musiienko,
Marin Rusu,
Davide Regaldo,
Shengnan Zho,
Hannes Hampel,
Chiara Frasca,
Zafar Iqbal,
Thomas W. Gries,
Guixiang Li,
Ece Aktas,
Giuseppe Nasti,
Meng Li,
Jorge Pascual,
Noor Titan Putri Hartono,
Qiong Wang,
Thomas Unold,
Antonio Abate
Abstract:
Tin perovskites are the most promising environmentally friendly alternative to lead perovskites. Among tin perovskites, FASnI3 (CH4N2SnI3) shows optimum band gap, and easy processability. However, the performance of FASnI3 based solar cells is incomparable to lead perovskites for several reasons, including energy band mismatch between the perovskite absorber film and the charge transporting layers…
▽ More
Tin perovskites are the most promising environmentally friendly alternative to lead perovskites. Among tin perovskites, FASnI3 (CH4N2SnI3) shows optimum band gap, and easy processability. However, the performance of FASnI3 based solar cells is incomparable to lead perovskites for several reasons, including energy band mismatch between the perovskite absorber film and the charge transporting layers (CTLs) for both types of carriers, i.e., for electrons (ETLs) and holes (HTLs). However, the band diagrams in the literature are inconsistent, and the charge extraction dynamics are poorly understood. In this paper, we study the energy band positions of FASnI3 based perovskites using Kelvin probe (KP) and photoelectron yield spectroscopy (PYS) to provide a precise band diagram of the most used device stack. In addition, we analyze the defects within the current energetic landscape of tin perovskites. We uncover the role of bathocuproine (BCP) in enhancing the electron extraction at the fullerene C60/BCP interface. Furthermore, we used transient surface photovoltage (tr-SPV) for the first time for tin perovskites to understand the charge extraction dynamics of the most reported HTLs such as NiOx and PEDOT, and ETLs such as C60, ICBA, and PCBM. Finally, we used Hall effect, KP, and time-resolved photoluminescence (TRPL) to estimate an accurate value of the p-do** concentration in FASnI3 and showed a consistent result of 1.5 * 1017 cm-3. Our findings prove that the energetic system of tin halide perovskites is deformed and should be redesigned independently from lead perovskites to unlock the full potential of tin perovskites.
△ Less
Submitted 7 October, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Formal Analysis and Verification of Max-Plus Linear Systems
Authors:
Muhammad Syifa'ul Mufid,
Andrea Micheli,
Alessandro Abate,
Alessandro Cimatti
Abstract:
Max-Plus Linear (MPL) systems are an algebraic formalism with practical applications in transportation networks, manufacturing and biological systems. In this paper, we investigate the problem of automatically analyzing the properties of MPL, taking into account both structural properties such as transient and cyclicity, and the open problem of user-defined temporal properties. We propose Time-Dif…
▽ More
Max-Plus Linear (MPL) systems are an algebraic formalism with practical applications in transportation networks, manufacturing and biological systems. In this paper, we investigate the problem of automatically analyzing the properties of MPL, taking into account both structural properties such as transient and cyclicity, and the open problem of user-defined temporal properties. We propose Time-Difference LTL (TDLTL), a logic that encompasses the delays between the discrete time events governed by an MPL system, and characterize the problem of model checking TDLTL over MPL. We first consider a framework based on the verification of infinite-state transition systems, and propose an approach based on an encoding into model checking. Then, we leverage the specific features of MPL systems to devise a highly optimized, combinational approach based on Satisfiability Modulo Theory (SMT). We experimentally evaluate the features of the proposed approaches on a large set of benchmarks. The results show that the proposed approach substantially outperforms the state of the art competitors in expressiveness and effectiveness, and demonstrate the superiority of the combinational approach over the reduction to model checking.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
On the Trade-off Between Efficiency and Precision of Neural Abstraction
Authors:
Alec Edwards,
Mirco Giacobbe,
Alessandro Abate
Abstract:
Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neura…
▽ More
Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neural ODE models that have piecewise affine dynamics, and which can be equivalently interpreted as linear hybrid automata. In this work, we observe that the utility of an abstraction depends on its use: some scenarios might require coarse abstractions that are easier to analyse, whereas others might require more complex, refined abstractions. We therefore consider neural abstractions of alternative shapes, namely either piecewise constant or nonlinear non-polynomial (specifically, obtained via sigmoidal activations). We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with these semantics. Empirically, we demonstrate the trade-off that these different neural abstraction templates have vis-a-vis their precision and synthesis time, as well as the time required for their safety verification (done via reachability computation). We improve existing synthesis techniques to enable abstraction of higher-dimensional models, and additionally discuss the abstraction of complex neural ODEs to improve the efficiency of reachability analysis for these models.
△ Less
Submitted 2 October, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Interface Modification for Energy Levels Alignment and Charge Extraction in CsPbI$_3$ Perovskite Solar Cells
Authors:
Zafar Iqbal,
Fengshuo Zu,
Artem Musiienko,
Emilio Gutierrez Partida,
Hans Kobler,
Thomas W. Gries,
Gennaro V. Sannino,
Laura Canil,
Norbert Koch,
Martin Stolterfoht,
Dieter Neher,
Michele Pavone,
Ana Belen Munoz-Garcia,
Antonio Abate,
Qiong Wang
Abstract:
In perovskite solar cells (PSCs) energy levels alignment and charge extraction at the interfaces are the essential factors directly affecting the device performance. In this work, we present a modified interface between all-inorganic CsPbI$_3$ perovskite and its hole selective contact (Spiro-OMeTAD), realized by a dipole molecule trioctylphosphine oxide (TOPO), to align the energy levels. On a pas…
▽ More
In perovskite solar cells (PSCs) energy levels alignment and charge extraction at the interfaces are the essential factors directly affecting the device performance. In this work, we present a modified interface between all-inorganic CsPbI$_3$ perovskite and its hole selective contact (Spiro-OMeTAD), realized by a dipole molecule trioctylphosphine oxide (TOPO), to align the energy levels. On a passivated perovskite film, by n-Octyl ammonium Iodide (OAI), we created an upward surface band-bending at the interface by TOPO treatment. This improved interface by the dipole molecule induces a better energy level alignment and enhances the charge extraction of holes from the perovskite layer to the hole transport material. Consequently, a Voc of 1.2 V and high-power conversion efficiency (PCE) of over 19% were achieved for inorganic CsPbI$_3$ perovskite solar cells. Further, to demonstrate the effect of the TOPO dipole molecule, we present a layer-by-layer charge extraction study by transient surface photovoltage technique (trSPV) accomplished by charge transport simulation.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
On Imperfect Recall in Multi-Agent Influence Diagrams
Authors:
James Fox,
Matt MacDermott,
Lewis Hammond,
Paul Harrenstein,
Alessandro Abate,
Michael Wooldridge
Abstract:
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfec…
▽ More
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. We overcome this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. We also analyse the computational complexity of key decision problems in MAIDs, and explore tractable cases. Finally, we describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
An Exact Characterisation of Flexibility in Populations of Electric Vehicles
Authors:
Karan Mukhi,
Alessandro Abate
Abstract:
Increasing penetrations of electric vehicles (EVs) presents a large source of flexibility, which can be used to assist balancing the power grid. The flexibility of an individual EV can be quantified as a convex polytope and the flexibility of a population of EVs is the Minkowski sum of these polytopes. In general computing the exact Minkowski sum is intractable. However, exploiting symmetry in a r…
▽ More
Increasing penetrations of electric vehicles (EVs) presents a large source of flexibility, which can be used to assist balancing the power grid. The flexibility of an individual EV can be quantified as a convex polytope and the flexibility of a population of EVs is the Minkowski sum of these polytopes. In general computing the exact Minkowski sum is intractable. However, exploiting symmetry in a restricted but significant case, enables an efficient computation of the aggregate flexibility. This results in a polytope with exponentially many vertices and facets with respect to the time horizon. We show how to use a lifting procedure to provide a representation of this polytope with a reduced number of facets, which makes optimising over more tractable. Finally, a disaggregation procedure that takes an aggregate signal and computes dispatch instructions for each EV in the population is presented. The complexity of the algorithms presented is independent of the size of the population and polynomial in the length of the time horizon. We evaluate this work against existing methods in the literature, and show how this method guarantees optimality with lower computational burden than existing methods.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
Networked Communication for Decentralised Agents in Mean-Field Games
Authors:
Patrick Benjamin,
Alessandro Abate
Abstract:
We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We d…
▽ More
We introduce networked communication to the mean-field game framework, in particular to oracle-free settings where $N$ decentralised agents learn along a single, non-episodic run of the empirical system. We prove that our architecture, with only a few reasonable assumptions about network structure, has sample guarantees bounded between those of the centralised- and independent-learning cases. We discuss how the sample guarantees of the three theoretical algorithms do not actually result in practical convergence. We therefore show that in practical settings where the theoretical parameters are not observed (leading to poor estimation of the Q-function), our communication scheme significantly accelerates convergence over the independent case (and often even the centralised case), without relying on the assumption of a centralised learner. We contribute further practical enhancements to all three theoretical algorithms, allowing us to present their first empirical demonstrations. Our experiments confirm that we can remove several of the theoretical assumptions of the algorithms, and display the empirical convergence benefits brought by our new networked communication. We additionally show that the networked approach has significant advantages, over both the centralised and independent alternatives, in terms of robustness to unexpected learning failures and to changes in population size.
△ Less
Submitted 28 June, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Abstracting Linear Stochastic Systems via Knowledge Filtering
Authors:
Maico Hendrikus Wilhelmus Engelaar,
Licio Romao,
Yulong Gao,
Mircea Lazar,
Alessandro Abate,
Sofie Haesaert
Abstract:
In this paper, we propose a new model reduction technique for linear stochastic systems that builds upon knowledge filtering and utilizes optimal Kalman filtering techniques. This new technique will reduce the dimension of the noise disturbance and will allow any controller designed for the reduced model to be refined into a controller for the original stochastic system, while preserving any speci…
▽ More
In this paper, we propose a new model reduction technique for linear stochastic systems that builds upon knowledge filtering and utilizes optimal Kalman filtering techniques. This new technique will reduce the dimension of the noise disturbance and will allow any controller designed for the reduced model to be refined into a controller for the original stochastic system, while preserving any specification on the output. Although initially the reduced model will be time-varying, a method will be provided with which the reduced model can become time-invariant if it satisfies some minor technical conditions. We present our theoretical findings with an example that supports the proposed framework and illustrates how model reduction and controller refinement of stochastic systems can be achieved. We finish the paper by considering specific examples to analyze both completeness with respect to controller synthesis and model order reduction with respect to the state.
△ Less
Submitted 15 September, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Learning-based Rigid Tube Model Predictive Control
Authors:
Yulong Gao,
Shuhao Yan,
Jian Zhou,
Mark Cannon,
Alessandro Abate,
Karl H. Johansson
Abstract:
This paper is concerned with model predictive control (MPC) of discrete-time linear systems subject to bounded additive disturbance and mixed constraints on the state and input, whereas the true disturbance set is unknown. Unlike most existing work on robust MPC, we propose an algorithm incorporating online learning that builds on prior knowledge of the disturbance, i.e., a known but conservative…
▽ More
This paper is concerned with model predictive control (MPC) of discrete-time linear systems subject to bounded additive disturbance and mixed constraints on the state and input, whereas the true disturbance set is unknown. Unlike most existing work on robust MPC, we propose an algorithm incorporating online learning that builds on prior knowledge of the disturbance, i.e., a known but conservative disturbance set. We approximate the true disturbance set at each time step with a parameterised set, which is referred to as a quantified disturbance set, using disturbance realisations. A key novelty is that the parameterisation of these quantified disturbance sets enjoys desirable properties such that the quantified disturbance set and its corresponding rigid tube bounding disturbance propagation can be efficiently updated online. We provide statistical gaps between the true and quantified disturbance sets, based on which, probabilistic recursive feasibility of MPC optimisation problems is discussed. Numerical simulations are provided to demonstrate the effectiveness of our proposed algorithm and compare with conventional robust MPC algorithms.
△ Less
Submitted 21 May, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Inner approximations of stochastic programs for data-driven stochastic barrier function design
Authors:
Frederik Baymler Mathiesen,
Licio Romao,
Simeon C. Calvert,
Alessandro Abate,
Luca Laurenti
Abstract:
This paper proposes a new framework to compute finite-horizon safety guarantees for discrete-time piece-wise affine systems with stochastic noise of unknown distributions. The approach is based on a novel approach to synthesise a stochastic barrier function (SBF) from noisy data and rely on the scenario optimization theory. In particular, we show that the stochastic program to synthesize a SBF can…
▽ More
This paper proposes a new framework to compute finite-horizon safety guarantees for discrete-time piece-wise affine systems with stochastic noise of unknown distributions. The approach is based on a novel approach to synthesise a stochastic barrier function (SBF) from noisy data and rely on the scenario optimization theory. In particular, we show that the stochastic program to synthesize a SBF can be relaxed into a chance-constrained optimisation problem on which scenario approach theory applies. We further show that the resulting program can be reduced to a linear programming problem, thus guaranteeing efficiency. In contrast to existing approaches, this method is data efficient as it only requires the number of data to be proportional to the logarithm in the negative inverse of the confidence level and is computationally efficient due to its reduction to linear programming. The efficacy of the method is empirically evaluated on various verification benchmarks. Experiments show a significant improvement with respect to state-of-the-art, obtaining tighter certificates with a confidence that is several orders of magnitude higher.
△ Less
Submitted 10 September, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Distributionally Robust Optimal and Safe Control of Stochastic Systems via Kernel Conditional Mean Embedding
Authors:
Licio Romao,
Ashish R. Hota,
Alessandro Abate
Abstract:
We present a novel distributionally robust framework for dynamic programming that uses kernel methods to design feedback control policies. Specifically, we leverage kernel mean embedding to map the transition probabilities governing the state evolution into an associated repreducing kernel Hilbert space. Our key idea lies in combining conditional mean embedding with the maximum mean discrepancy di…
▽ More
We present a novel distributionally robust framework for dynamic programming that uses kernel methods to design feedback control policies. Specifically, we leverage kernel mean embedding to map the transition probabilities governing the state evolution into an associated repreducing kernel Hilbert space. Our key idea lies in combining conditional mean embedding with the maximum mean discrepancy distance to construct an ambiguity set, and then design a robust control policy using techniques from distributionally robust optimization. The main theoretical contribution of this paper is to leverage functional analytic tools to prove that optimal policies for this infinite-dimensional min-max problem are Markovian and deterministic. Additionally, we discuss approximation schemes based on state and input discretization to make the approach computationally tractable. To validate the theoretical findings, we conduct an experiment on safe control for thermostatically controlled loads (TCL).
△ Less
Submitted 18 December, 2023; v1 submitted 2 April, 2023;
originally announced April 2023.
-
Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]
Authors:
Adrien Banse,
Licio Romao,
Alessandro Abate,
Raphaël M. Jungers
Abstract:
We introduce an adaptive refinement procedure for smart, and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we u…
▽ More
We introduce an adaptive refinement procedure for smart, and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique is prone to data-driven frameworks, but not restricted to.
We also study properties of the above mentioned metric between Markov chains, which we believe could be of application for wider purpose. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.
△ Less
Submitted 30 October, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Policy Evaluation in Distributional LQR
Authors:
Zifan Wang,
Yulong Gao,
Siyi Wang,
Michael M. Zavlanos,
Alessandro Abate,
Karl H. Johansson
Abstract:
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefu…
▽ More
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
△ Less
Submitted 23 March, 2023;
originally announced March 2023.
-
k-Prize Weighted Voting Games
Authors:
Wei-Chen Lee,
David Hyland,
Alessandro Abate,
Edith Elkind,
Jiarui Gan,
Julian Gutierrez,
Paul Harrenstein,
Michael Wooldridge
Abstract:
We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts…
▽ More
We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts to analyse the games in this class, and characterise the existence of stable outcomes in games with three players and two prizes, and in games with uniform prizes. We then explore the efficiency of stable outcomes in terms of Pareto optimality and utilitarian social welfare. Finally, we study the computational complexity of finding stable outcomes.
△ Less
Submitted 2 March, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Neural Abstractions
Authors:
Alessandro Abate,
Alec Edwards,
Mirco Giacobbe
Abstract:
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics. Neural networks have extensively been used before as approximators; in this work, we make a step further and use them for the first time as abstractions. For a given dynamical model, our method synthesises a neural network that overapproximates…
▽ More
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics. Neural networks have extensively been used before as approximators; in this work, we make a step further and use them for the first time as abstractions. For a given dynamical model, our method synthesises a neural network that overapproximates its dynamics by ensuring an arbitrarily tight, formally certified bound on the approximation error. For this purpose, we employ a counterexample-guided inductive synthesis procedure. We show that this produces a neural ODE with non-deterministic disturbances that constitutes a formal abstraction of the concrete model under analysis. This guarantees a fundamental property: if the abstract model is safe, i.e., free from any initialised trajectory that reaches an undesirable state, then the concrete model is also safe. By using neural ODEs with ReLU activation functions as abstractions, we cast the safety verification problem for nonlinear dynamical models into that of hybrid automata with affine dynamics, which we verify using SpaceEx. We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models. We additionally demonstrate and that it is effective on models that do not exhibit local Lipschitz continuity, which are out of reach to the existing technologies.
△ Less
Submitted 27 January, 2023;
originally announced January 2023.
-
Quantitative Verification with Neural Networks
Authors:
Alessandro Abate,
Alec Edwards,
Mirco Giacobbe,
Hashan Punchihewa,
Diptarko Roy
Abstract:
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analys…
▽ More
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.
△ Less
Submitted 11 March, 2024; v1 submitted 15 January, 2023;
originally announced January 2023.
-
Reasoning about Causality in Games
Authors:
Lewis Hammond,
James Fox,
Tom Everitt,
Ryan Carey,
Alessandro Abate,
Michael Wooldridge
Abstract:
Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's caus…
▽ More
Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence diagrams to the causal domain. We then consider three key questions: i) How can the (causal) dependencies in games - either between variables, or between strategies - be modelled in a uniform, principled manner? ii) How may causal queries be computed in causal games, and what assumptions does this require? iii) How do causal games compare to existing formalisms? To address question i), we introduce mechanised games, which encode dependencies between agents' decision rules and the distributions governing the game. In response to question ii), we present definitions of predictions, interventions, and counterfactuals, and discuss the assumptions required for each. Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support. Finally, we highlight possible applications of causal games, aided by an extensive open-source Python library.
△ Less
Submitted 17 April, 2023; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions
Authors:
Thom Badings,
Licio Romao,
Alessandro Abate,
David Parker,
Hasan A. Poonawala,
Marielle Stoelinga,
Nils Jansen
Abstract:
Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distri…
▽ More
Controllers for dynamical systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modeled as process noise in a dynamical system, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target, while also avoiding unsafe regions of the state space. First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is, with a user-specified confidence probability, robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP and compute a controller for which these guarantees carry over to the original control system. In addition, we develop a tailored computational scheme that reduces the complexity of the synthesis of these guarantees on the iMDP. Benchmarks on realistic control systems show the practical applicability of our method, even when the iMDP has hundreds of millions of transitions.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
Lexicographic Multi-Objective Reinforcement Learning
Authors:
Joar Skalse,
Lewis Hammond,
Charlie Griffin,
Alessandro Abate
Abstract:
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorit…
▽ More
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
△ Less
Submitted 28 December, 2022;
originally announced December 2022.
-
Misspecification in Inverse Reinforcement Learning
Authors:
Joar Skalse,
Alessandro Abate
Abstract:
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $π$. To do this, we need a model of how $π$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationsh…
▽ More
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $π$. To do this, we need a model of how $π$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function $R$. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.
△ Less
Submitted 24 March, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Data-driven memory-dependent abstractions of dynamical systems
Authors:
Adrien Banse,
Licio Romao,
Alessandro Abate,
Raphaël M. Jungers
Abstract:
We propose a sample-based, sequential method to abstract a (potentially black-box) dynamical system with a sequence of memory-dependent Markov chains of increasing size. We show that this approximation allows to alleviating a correlation bias that has been observed in sample-based abstractions. We further propose a methodology to detect on the fly the memory length resulting in an abstraction with…
▽ More
We propose a sample-based, sequential method to abstract a (potentially black-box) dynamical system with a sequence of memory-dependent Markov chains of increasing size. We show that this approximation allows to alleviating a correlation bias that has been observed in sample-based abstractions. We further propose a methodology to detect on the fly the memory length resulting in an abstraction with sufficient accuracy. We prove that under reasonable assumptions, the method converges to a sound abstraction in some precise sense, and we showcase it on two case studies.
△ Less
Submitted 4 December, 2022;
originally announced December 2022.
-
Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics
Authors:
Luke Rickard,
Thom Badings,
Licio Romao,
Alessandro Abate
Abstract:
Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiabl…
▽ More
Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiably satisfy probabilistic computation tree logic (PCTL) formulae. An MJLS consists of a finite set of stochastic linear dynamics and discrete jumps between these dynamics that are governed by a Markov decision process (MDP). We consider the cases where the transition probabilities of this MDP are either known up to an interval or completely unknown. Our approach is based on a finite-state abstraction that captures both the discrete (mode-jum**) and continuous (stochastic linear) behaviour of the MJLS. We formalise this abstraction as an interval MDP (iMDP) for which we compute intervals of transition probabilities using sampling techniques from the so-called 'scenario approach', resulting in a probabilistically sound approximation. We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
△ Less
Submitted 4 August, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty
Authors:
Thom Badings,
Licio Romao,
Alessandro Abate,
Nils Jansen
Abstract:
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models…
▽ More
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise and uncertain parameters. By sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We synthesize an optimal policy on this iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values.
△ Less
Submitted 7 December, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Bounded Robustness in Reinforcement Learning via Lexicographic Objectives
Authors:
Daniel Jarne Ornia,
Licio Romao,
Lewis Hammond,
Manuel Mazo Jr.,
Alessandro Abate
Abstract:
Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator in…
▽ More
Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.
△ Less
Submitted 11 December, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning
Authors:
Hosein Hasanbeig,
Daniel Kroening,
Alessandro Abate
Abstract:
LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Buchi Automata (LDBA) to express a given linear temporal specification. A…
▽ More
LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Buchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and LCRL model-free architecture, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on a GitHub page that accompanies the LCRL distribution www.github.com/grockious/lcrl.
△ Less
Submitted 21 September, 2022;
originally announced September 2022.