-
Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.
Submitted 1 February, 2024;
originally announced February 2024.
-
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions
Authors:
Janosh Riebesell,
Rhys E. A. Goodall,
Philipp Benner,
Yuan Chiang,
Bowen Deng,
Alpha A. Lee,
Anubhav Jain,
Kristin A. Persson
Abstract:
Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.
Submitted 4 February, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
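The gap the Matbench Discovery abstract describes between regression and classification metrics can be illustrated with a small sketch. It is not the paper's Python package. The function name `stability_metrics`, the toy energies, and the definition of DAF as precision divided by the fraction of truly stable materials are assumptions made for illustration. A material is treated as stable when its energy above the convex hull is at most 0 eV/atom.

```python
def stability_metrics(e_true, e_pred, threshold=0.0):
    """Classify materials as stable (energy above hull <= threshold) and
    return regression and classification metrics side by side.

    Hypothetical helper for illustration; not part of any published package.
    """
    n = len(e_true)
    tp = fp = fn = tn = 0
    abs_err = 0.0
    for t, p in zip(e_true, e_pred):
        abs_err += abs(t - p)
        true_stable = t <= threshold
        pred_stable = p <= threshold
        if pred_stable and true_stable:
            tp += 1
        elif pred_stable and not true_stable:
            fp += 1
        elif not pred_stable and true_stable:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Assumed definition: DAF = precision / prevalence of stable materials.
    prevalence = (tp + fn) / n if n else 0.0
    daf = precision / prevalence if prevalence else float("nan")
    return {"mae": abs_err / n if n else float("nan"),
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "daf": daf}


# Toy data (eV/atom above hull): every prediction is within 0.03 eV/atom of
# the truth, yet two unstable materials are predicted stable because they
# sit close to the 0 eV/atom decision boundary.
e_true = [0.02, 0.01, -0.01, 0.03]
e_pred = [-0.01, -0.02, -0.03, 0.05]
metrics = stability_metrics(e_true, e_pred)
```

In this toy case the mean absolute error is small while two of the three predicted-stable materials are false positives, which is the kind of disconnect the abstract points to.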
-
Approximate Model-Based Shielding for Safe Reinforcement Learning
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
Submitted 27 July, 2023;
originally announced August 2023.
-
Homomorphisms between graphs embedded on surfaces
Authors:
Delia Garijo,
Andrew Goodall,
Lluís Vena
Abstract:
We extend the notion of graph homomorphism to cellularly embedded graphs (maps) by designing operations on vertices and edges that respect the surface topology; we thus obtain the first definition of map homomorphism that preserves both the combinatorial structure (as a graph homomorphism) and the topological structure of the surface (in particular, orientability and genus). Notions such as the core of a graph and the homomorphism order on cores are then extended to maps. We also develop a purely combinatorial framework for various topological features of a map such as the contractibility of closed walks, which in particular allows us to characterize map cores. We then show that the poset of map cores ordered by the existence of a homomorphism is connected and, in contrast to graph homomorphisms, does not contain any dense interval (so it is not universal for countable posets). Finally, we give examples of a pair of cores with an infinite number of cores between them, an infinite chain of gaps, and arbitrarily large antichains with a common homomorphic image.
Submitted 4 May, 2023;
originally announced May 2023.
-
Approximate Shielding of Atari Agents for Safe Exploration
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding, another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state-dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.
Submitted 21 April, 2023;
originally announced April 2023.
-
Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow
Authors:
Victor Picheny,
Joel Berkeley,
Henry B. Moss,
Hrvoje Stojic,
Uri Granta,
Sebastian W. Ober,
Artem Artemev,
Khurram Ghani,
Alexander Goodall,
Andrei Paleyes,
Sattar Vakili,
Sergio Pascual-Diaz,
Stratis Markou,
Jixiang Qing,
Nasrulloh R. B. S Loka,
Ivo Couckuyt
Abstract:
We present Trieste, an open-source Python package for Bayesian optimization and active learning benefiting from the scalability and efficiency of TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based models within sequential decision-making loops, e.g. Gaussian processes from GPflow or GPflux, or neural networks from Keras. This modular mindset is central to the package and extends to our acquisition functions and the internal dynamics of the decision-making loop, both of which can be tailored and extended by researchers or engineers when tackling custom use cases. Trieste is a research-friendly and production-ready toolkit backed by a comprehensive test suite, extensive documentation, and available at https://github.com/secondmind-labs/trieste.
Submitted 16 February, 2023;
originally announced February 2023.
-
Predicting materials properties without crystal structure: Deep representation learning from stoichiometry
Authors:
Rhys E. A. Goodall,
Alpha A. Lee
Abstract:
Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure -- therefore only applicable to materials with already characterised structures -- or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.
Submitted 23 September, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
On the complexity of generalized chromatic polynomials
Authors:
A. Goodall,
M. Hermann,
T. Kotek,
J. A. Makowsky,
S. D. Noble
Abstract:
J. Makowsky and B. Zilber (2004) showed that many variations of graph colorings, called CP-colorings in the sequel, give rise to graph polynomials. This is true in particular for harmonious colorings, convex colorings, mcc_t-colorings, and rainbow colorings, and many more. N. Linial (1986) showed that the chromatic polynomial $χ(G;X)$ is #P-hard to evaluate for all but three values X=0,1,2, where evaluation is in P. This dichotomy includes evaluation at real or complex values, and has the further property that the set of points for which evaluation is in P is finite. We investigate how the complexity of evaluating univariate graph polynomials that arise from CP-colorings varies for different evaluation points. We show that for some CP-colorings (harmonious, convex) the complexity of evaluation follows a similar pattern to the chromatic polynomial. However, in other cases (proper edge colorings, mcc_t-colorings, H-free colorings) we could only obtain a dichotomy for evaluations at non-negative integer points. We also discuss some CP-colorings where we only have very partial results.
Submitted 23 January, 2017;
originally announced January 2017.
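To make "evaluating the chromatic polynomial at a point" concrete, the sketch below computes $χ(G;X)$ at a given value using the standard deletion-contraction recurrence, χ(G) = χ(G − e) − χ(G / e). It is an illustration only, not a method from the paper. The function names and the edge-list representation are assumptions, and the running time is exponential in the number of edges, in line with the hardness result quoted in the abstract.

```python
def chromatic_polynomial(n, edges, x):
    """Evaluate the chromatic polynomial of a graph at x.

    n     : number of vertices (labelled 0..n-1)
    edges : iterable of (u, v) pairs; parallel edges are merged
    x     : evaluation point (int, float or complex)

    Illustrative sketch using deletion-contraction; exponential time.
    """
    edge_set = set()
    for u, v in edges:
        if u == v:
            return 0  # a loop admits no proper colouring
        edge_set.add(frozenset((u, v)))
    return _evaluate(n, frozenset(edge_set), x)


def _evaluate(n, edges, x):
    # Base case: an edgeless graph on n vertices has x**n colourings.
    if not edges:
        return x ** n
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contract e by relabelling v as u; using a set merges parallel edges.
    contracted = frozenset(
        frozenset(u if w == v else w for w in f) for f in deleted
    )
    return _evaluate(n, deleted, x) - _evaluate(n - 1, contracted, x)


triangle = [(0, 1), (1, 2), (0, 2)]
path3 = [(0, 1), (1, 2)]
```

For the triangle the polynomial is X(X−1)(X−2), so `chromatic_polynomial(3, triangle, 3)` counts the 6 proper 3-colourings, and evaluation at 2 gives 0 because a triangle is not bipartite.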
-
Counting cocircuits and convex two-colourings is #P-complete
Authors:
Andrew J. Goodall,
Steven D. Noble
Abstract:
We prove that the problem of counting the number of colourings of the vertices of a graph with at most two colours, such that the colour classes induce connected subgraphs, is #P-complete. We also show that the closely related problem of counting the number of cocircuits of a graph is #P-complete.
Submitted 11 October, 2008;
originally announced October 2008.
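The counting problem in the abstract above can be stated as a brute-force sketch that enumerates all 2^n assignments, which is feasible only for very small graphs. This is an illustration of the problem, not an algorithm from the paper. The function names are hypothetical, and the convention that colourings are labelled (so swapping the two colours gives a different colouring) and that an empty colour class counts as connected is an assumption.

```python
from itertools import product


def is_connected(vertices, adj):
    """Return True if `vertices` induces a connected subgraph (or is empty)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def count_convex_two_colourings(n, edges):
    """Count maps from vertices 0..n-1 to {0, 1} in which each colour class
    induces a connected subgraph. Exponential-time illustration only."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    count = 0
    for colouring in product((0, 1), repeat=n):
        classes = [[v for v in range(n) if colouring[v] == c] for c in (0, 1)]
        if all(is_connected(cls, adj) for cls in classes):
            count += 1
    return count


path3 = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
```

On the path with three vertices, the two colourings that give the end vertices one colour and the middle vertex the other are excluded, leaving 6 of the 8 assignments; on the triangle all 8 qualify.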