-
Structured Q-learning For Antibody Design
Authors:
Alexander I. Cowen-Rivers,
Philip John Gorinski,
Aivar Sootla,
Asif Khan,
Liu Furui,
Jun Wang,
Jan Peters,
Haitham Bou Ammar
Abstract:
Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.
Submitted 13 September, 2022; v1 submitted 10 September, 2022;
originally announced September 2022.
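The search-space figure quoted in the Structured Q-learning abstract above can be checked with a short calculation. This is a minimal sketch, assuming each of the eleven sequence positions may hold any of the 20 standard amino acids; the constant names are illustrative and do not come from the paper.

```python
# Assumption: every position in the sequence can take any of the
# 20 standard amino acids, independently of the other positions.
NUM_AMINO_ACIDS = 20
SEQUENCE_LENGTH = 11  # sequence length quoted in the abstract

# Number of distinct sequences is 20 ** 11.
search_space_size = NUM_AMINO_ACIDS ** SEQUENCE_LENGTH

print(f"{search_space_size:.2e}")  # prints 2.05e+14
```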
-
Effects of Safety State Augmentation on Safe Exploration
Authors:
Aivar Sootla,
Alexander I. Cowen-Rivers,
Jun Wang,
Haitham Bou Ammar
Abstract:
Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
Submitted 12 October, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
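The safety-state idea described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it assumes an undiscounted cumulative-cost constraint, and the names `SafetyBudgetTracker` and `augment_observation` are hypothetical. The tracker starts at the safety budget, subtracts each incurred cost, and so stays nonnegative exactly while the accumulated cost is within the budget.

```python
class SafetyBudgetTracker:
    """Tracks the remaining safety budget (hypothetical helper).

    Assumption: the constraint is "sum of costs <= budget", undiscounted.
    """

    def __init__(self, budget):
        self.budget = budget
        self.value = budget  # initial value is the available safety budget

    def reset(self):
        # Restore the full budget at the start of an episode.
        self.value = self.budget
        return self.value

    def step(self, cost):
        # The remaining value is the distance to constraint violation.
        self.value -= cost
        return self.value

    def is_safe(self):
        # Nonnegative if and only if the constraint is still satisfied.
        return self.value >= 0.0


def augment_observation(observation, safety_value):
    # Append the safety state to the environment observation.
    return (*observation, safety_value)


tracker = SafetyBudgetTracker(budget=1.0)
history = []
for cost in [0.25, 0.5, 0.5]:
    remaining = tracker.step(cost)
    history.append((remaining, tracker.is_safe()))

augmented = augment_observation((0.1, -0.3), tracker.value)
```

In this run the third cost pushes the accumulated cost past the budget, so the safety state becomes negative and `is_safe()` returns `False`.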
-
Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
Authors:
Aivar Sootla,
Alexander I. Cowen-Rivers,
Taher Jafferjee,
Ziyan Wang,
David Mguni,
Jun Wang,
Haitham Bou-Ammar
Abstract:
Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
Submitted 22 June, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Authors:
Asif Khan,
Alexander I. Cowen-Rivers,
Antoine Grosnit,
Derrick-Goh-Xin Deik,
Philippe A. Robert,
Victor Greiff,
Eva Smorodina,
Puneet Rawat,
Kamil Dreczkowski,
Rahmad Akbar,
Rasul Tutunov,
Dany Bou-Ammar,
Jun Wang,
Amos Storkey,
Haitham Bou-Ammar
Abstract:
Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies. However, the combinatorial nature of CDRH3 sequence space makes it impossible to search for an optimal binding sequence exhaustively and efficiently using computational approaches. Here, we present AntBO: a combinatorial Bayesian optimisation framework enabling efficient in silico design of the CDRH3 region. Ideally, antibodies are expected to have high target specificity and developability. We introduce a CDRH3 trust region that restricts the search to sequences with favourable developability scores to achieve this goal. For benchmarking, AntBO uses the Absolut! software suite as a black-box oracle to score the target specificity and affinity of designed antibodies in silico in an unconstrained fashion \citep{robert2021one}. The experiments performed for 159 discretised antigens used in Absolut! demonstrate the benefit of AntBO in designing CDRH3 regions with diverse biophysical properties. In under 200 calls to the black-box oracle, AntBO can suggest antibody sequences that outperform the best binding sequence drawn from 6.9 million experimentally obtained CDRH3s and a commonly used genetic algorithm baseline. Additionally, AntBO finds very-high affinity CDRH3 sequences in only 38 protein designs whilst requiring no domain knowledge. We conclude AntBO brings automated antibody design methods closer to what is practically viable for in vitro experimentation.
Submitted 14 October, 2022; v1 submitted 29 January, 2022;
originally announced January 2022.
-
Learning Geometric Constraints in Task and Motion Planning
Authors:
Tianyu Ren,
Alexander Imani Cowen-Rivers,
Haitham Bou Ammar,
Jan Peters
Abstract:
Searching for bindings of geometric parameters in task and motion planning (TAMP) is a finite-horizon stochastic planning problem with high-dimensional decision spaces. A robot manipulator can only move in a subspace of its whole range that is subject to many geometric constraints. A TAMP solver usually takes many explorations before finding a feasible binding set for each task. It is favorable to learn those constraints once and then transfer them over different tasks within the same workspace. We address this problem by representing constraint knowledge with transferable primitives and using Bayesian optimization (BO) based on these primitives to guide binding search in further tasks. Via semantic and geometric backtracking in TAMP, we construct constraint primitives to encode the geometric constraints respectively in a reusable form. Then we devise a BO approach to efficiently utilize the accumulated constraints for guiding node expansion of an MCTS-based binding planner. We further compose a transfer mechanism to enable free knowledge flow between TAMP tasks. Results indicate that our approach reduces the expensive exploration calls in binding search by 43.60% to 71.69% when compared to the baseline unguided planner.
Submitted 24 January, 2022;
originally announced January 2022.
-
High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning
Authors:
Antoine Grosnit,
Rasul Tutunov,
Alexandre Max Maraval,
Ryan-Rhys Griffiths,
Alexander I. Cowen-Rivers,
Lin Yang,
Lin Zhu,
Wenlong Lyu,
Zhitang Chen,
Jun Wang,
Jan Peters,
Haitham Bou-Ammar
Abstract:
We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only few labelled data points are available. We run experiments on three real-world tasks, achieving state-of-the-art results on the penalised logP molecule generation benchmark using just 3% of the labelled data required by previous approaches. As a theoretical contribution, we present a proof of vanishing regret for VAE BO.
Submitted 1 November, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?
Authors:
Antoine Grosnit,
Alexander I. Cowen-Rivers,
Rasul Tutunov,
Ryan-Rhys Griffiths,
Jun Wang,
Haitham Bou-Ammar
Abstract:
Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.
Submitted 17 December, 2020; v1 submitted 15 December, 2020;
originally announced December 2020.
-
HEBO Pushing The Limits of Sample-Efficient Hyperparameter Optimisation
Authors:
Alexander I. Cowen-Rivers,
Wenlong Lyu,
Rasul Tutunov,
Zhi Wang,
Antoine Grosnit,
Ryan Rhys Griffiths,
Alexandre Max Maraval,
Hao Jianye,
Jun Wang,
Jan Peters,
Haitham Bou Ammar
Abstract:
In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multi-objective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation. All code is made available at https://github.com/huawei-noah/HEBO.
Submitted 25 May, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
SAMBA: Safe Model-Based & Active Reinforcement Learning
Authors:
Alexander I. Cowen-Rivers,
Daniel Palenicek,
Vincent Moens,
Mohammed Abdullah,
Aivar Sootla,
Jun Wang,
Haitham Ammar
Abstract:
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
Submitted 12 June, 2020;
originally announced June 2020.
-
Emergent Communication with World Models
Authors:
Alexander I. Cowen-Rivers,
Jason Naradowsky
Abstract:
We introduce Language World Models, a class of language-conditional generative models which interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field-of-view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model beliefs to gain insight into how the communication channel is utilized.
Submitted 21 February, 2020;
originally announced February 2020.
-
Compositional ADAM: An Adaptive Compositional Solver
Authors:
Rasul Tutunov,
Minne Li,
Alexander I. Cowen-Rivers,
Jun Wang,
Haitham Bou-Ammar
Abstract:
In this paper, we present C-ADAM, the first adaptive solver for compositional problems involving a non-linear functional nesting of expected values. We prove that C-ADAM converges to a stationary point in $\mathcal{O}(δ^{-2.25})$ with $δ$ being a precision parameter. Moreover, we demonstrate the importance of our results by bridging, for the first time, model-agnostic meta-learning (MAML) and compositional optimisation, showing the fastest known rates for deep network adaptation to date. Finally, we validate our findings in a set of experiments from portfolio optimisation and meta-learning. Our results manifest significant sample complexity reductions compared to both standard and compositional solvers.
Submitted 24 April, 2020; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Neural Variational Inference For Estimating Uncertainty in Knowledge Graph Embeddings
Authors:
Alexander I. Cowen-Rivers,
Pasquale Minervini,
Tim Rocktaschel,
Matko Bosnjak,
Sebastian Riedel,
Jun Wang
Abstract:
Recent advances in Neural Variational Inference allowed for a renaissance in latent variable models in a variety of domains involving high-dimensional data. While traditional variational methods derive an analytical approximation for the intractable distribution over the latent variables, here we construct an inference network conditioned on the symbolic representation of entities and relation types in the Knowledge Graph, to provide the variational distributions. The new framework results in a highly-scalable method. Under a Bernoulli sampling framework, we provide an alternative justification for commonly used techniques in large-scale stochastic variational inference, which drastically reduce training time at a cost of an additional approximation to the variational lower bound. We introduce two models from this highly scalable probabilistic framework, namely the Latent Information and Latent Fact models, for reasoning over knowledge graph-based representations. Our Latent Information and Latent Fact models improve upon baseline performance under certain conditions. We use the learnt embedding variance to estimate predictive uncertainty during link prediction, and discuss the quality of these learnt uncertainty estimates. Our source code and datasets are publicly available online at https://github.com/alexanderimanicowenrivers/Neural-Variational-Knowledge-Graphs.
Submitted 18 August, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Infer Your Enemies and Know Yourself, Learning in Real-Time Bidding with Partially Observable Opponents
Authors:
Manxing Du,
Alexander I. Cowen-Rivers,
Ying Wen,
Phu Sakulwongtana,
Jun Wang,
Mats Brorsson,
Radu State
Abstract:
Real-time bidding, as one of the most popular mechanisms for selling online ad slots, helps advertisers reach their potential customers. The goal of bidding optimization is to maximize the advertisers' return on investment (ROI) under a certain budget setting. A straightforward solution is to model the bidding function in an explicit form. However, the static functional solutions lack generality in practice and are insensitive to the stochastic behaviour of other bidders in the environment. In this paper, we propose a general multi-agent framework with actor-critic solutions for playing imperfect-information games. We first introduce a novel Deep Attentive Survival Analysis (DASA) model to infer the censored data in the second price auctions, which outperforms state-of-the-art survival analysis. Furthermore, our approach introduces the DASA model as the opponent model into the policy learning process for each agent and develops a mean field equilibrium analysis of the second price auctions. The experiments have shown that with the inference of the market, the market converges to the equilibrium much faster while playing against both fixed strategy agents and dynamic learning agents.
Submitted 28 February, 2019;
originally announced February 2019.
-
Towards Incremental Cylindrical Algebraic Decomposition in Maple
Authors:
Alexander Imani Cowen-Rivers,
Matthew England
Abstract:
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems for polynomial systems over the reals. It has long been studied by the Symbolic Computation community and has found recent interest in the Satisfiability Checking community. The present report describes a proof of concept implementation of an Incremental CAD algorithm in Maple, where CADs are built and then refined as additional polynomial constraints are added. The aim is to make CAD suitable for use as a theory solver for SMT tools that search for solutions by continually reformulating logical formulae and querying whether a logical solution is admissible. We describe experiments for the proof of concept, which clearly display the computational advantages compared to iterated re-computation. In addition, the project implemented this work under the recently verified Lazard projection scheme (with corresponding Lazard valuation).
Submitted 24 May, 2018;
originally announced May 2018.
-
Summer Research Report: Towards Incremental Lazard Cylindrical Algebraic Decomposition
Authors:
Alexander I. Cowen-Rivers,
Matthew England
Abstract:
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems to do with polynomial systems over the reals, but known to have worst-case computational complexity doubly exponential in the number of variables. It has long been studied by the Symbolic Computation community and is implemented in a variety of computer algebra systems; however, it has also found recent interest in the Satisfiability Checking community for use with SMT-solvers. The SCSC Project seeks to build bridges between these communities.
The present report describes progress made during a Research Internship in Summer 2017 funded by the EU H2020 SCSC CSA. We describe a proof of concept implementation of an Incremental CAD algorithm in Maple, where CADs are built and refined incrementally by polynomial constraint, in contrast to the usual approach of a single computation from a single input. This advance would make CAD of use to SMT-solvers that search for solutions by constantly reformulating logical formulae and querying solvers like CAD for whether a logical solution is admissible. We describe experiments for the proof of concept, which clearly display the computational advantages when compared to iterated re-computation. In addition, the project implemented this work under the recently verified Lazard projection scheme (with corresponding Lazard evaluation). That is the minimal complete CAD method in theory, and this is the first documented implementation.
Submitted 23 April, 2018;
originally announced April 2018.