Search | arXiv e-print repository

Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

Authors: Alexander W. Goodall, Francesco Belardinelli

Abstract: Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted as an Extended Abstract at AAMAS 2024

arXiv:2308.14920

Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions

Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson

Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.

Submitted 4 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 31 pages, 18 figures, 4 tables

arXiv:2308.00707

doi 10.3233/FAIA230357

Approximate Model-Based Shielding for Safe Reinforcement Learning

Authors: Alexander W. Goodall, Francesco Belardinelli

Abstract: Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real-world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.

Submitted 27 July, 2023; originally announced August 2023.

Comments: Accepted at ECAI 2023 (main technical track)

arXiv:2305.03107

Homomorphisms between graphs embedded on surfaces

Authors: Delia Garijo, Andrew Goodall, Lluís Vena

Abstract: We extend the notion of graph homomorphism to cellularly embedded graphs (maps) by designing operations on vertices and edges that respect the surface topology; we thus obtain the first definition of map homomorphism that preserves both the combinatorial structure (as a graph homomorphism) and the topological structure of the surface (in particular, orientability and genus). Notions such as the core of a graph and the homomorphism order on cores are then extended to maps. We also develop a purely combinatorial framework for various topological features of a map such as the contractibility of closed walks, which in particular allows us to characterize map cores. We then show that the poset of map cores ordered by the existence of a homomorphism is connected and, in contrast to graph homomorphisms, does not contain any dense interval (so it is not universal for countable posets). Finally, we give examples of a pair of cores with an infinite number of cores between them, an infinite chain of gaps, and arbitrarily large antichains with a common homomorphic image.

Submitted 4 May, 2023; originally announced May 2023.

Comments: 46 pages, 11 figures

arXiv:2304.11104

Approximate Shielding of Atari Agents for Safe Exploration

Authors: Alexander W. Goodall, Francesco Belardinelli

Abstract: Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.

Submitted 21 April, 2023; originally announced April 2023.

Comments: Accepted for presentation at the ALA workshop as part of AAMAS 2023

arXiv:2302.08436

Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow

Authors: Victor Picheny, Joel Berkeley, Henry B. Moss, Hrvoje Stojic, Uri Granta, Sebastian W. Ober, Artem Artemev, Khurram Ghani, Alexander Goodall, Andrei Paleyes, Sattar Vakili, Sergio Pascual-Diaz, Stratis Markou, Jixiang Qing, Nasrulloh R. B. S Loka, Ivo Couckuyt

Abstract: We present Trieste, an open-source Python package for Bayesian optimization and active learning benefiting from the scalability and efficiency of TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based models within sequential decision-making loops, e.g. Gaussian processes from GPflow or GPflux, or neural networks from Keras. This modular mindset is central to the package and extends to our acquisition functions and the internal dynamics of the decision-making loop, both of which can be tailored and extended by researchers or engineers when tackling custom use cases. Trieste is a research-friendly and production-ready toolkit backed by a comprehensive test suite, extensive documentation, and available at https://github.com/secondmind-labs/trieste.

Submitted 16 February, 2023; originally announced February 2023.

arXiv:1910.00617

doi 10.1038/s41467-020-19964-7

Predicting materials properties without crystal structure: Deep representation learning from stoichiometry

Authors: Rhys E. A. Goodall, Alpha A. Lee

Abstract: Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure -- therefore only applicable to materials with already characterised structures -- or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.

Submitted 23 September, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: A working implementation of our model is available at https://github.com/CompRhys/roost

arXiv:1701.06639

On the complexity of generalized chromatic polynomials

Authors: A. Goodall, M. Hermann, T. Kotek, J. A. Makowsky, S. D. Noble

Abstract: J. Makowsky and B. Zilber (2004) showed that many variations of graph colorings, called CP-colorings in the sequel, give rise to graph polynomials. This is true in particular for harmonious colorings, convex colorings, mcc_t-colorings, and rainbow colorings, and many more. N. Linial (1986) showed that the chromatic polynomial $χ(G;X)$ is #P-hard to evaluate for all but three values X=0,1,2, where evaluation is in P. This dichotomy includes evaluation at real or complex values, and has the further property that the set of points for which evaluation is in P is finite. We investigate how the complexity of evaluating univariate graph polynomials that arise from CP-colorings varies for different evaluation points. We show that for some CP-colorings (harmonious, convex) the complexity of evaluation follows a similar pattern to the chromatic polynomial. However, in other cases (proper edge colorings, mcc_t-colorings, H-free colorings) we could only obtain a dichotomy for evaluations at non-negative integer points. We also discuss some CP-colorings where we only have very partial results.

Submitted 23 January, 2017; originally announced January 2017.

Comments: 33 pages, 2 figures, 3 tables

MSC Class: 05C15; 05C31; 05C85; 68Q17; 68W05

arXiv:0810.2042

Counting cocircuits and convex two-colourings is #P-complete

Authors: Andrew J. Goodall, Steven D. Noble

Abstract: We prove that the problem of counting the number of colourings of the vertices of a graph with at most two colours, such that the colour classes induce connected subgraphs is #P-complete. We also show that the closely related problem of counting the number of cocircuits of a graph is #P-complete.

Submitted 11 October, 2008; originally announced October 2008.

Comments: 5 pages

MSC Class: 05C15; 68R10; 68Q17

Showing 1–9 of 9 results for author: Goodall, A