Search | arXiv e-print repository

Structured Q-learning For Antibody Design

Authors: Alexander I. Cowen-Rivers, Philip John Gorinski, Aivar Sootla, Asif Khan, Liu Furui, Jun Wang, Jan Peters, Haitham Bou Ammar

Abstract: Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.

Submitted 13 September, 2022; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2206.02675 [pdf, other]

Effects of Safety State Augmentation on Safe Exploration

Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Jun Wang, Haitham Bou Ammar

Abstract: Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.

Submitted 12 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Published in NeurIPS 2022

arXiv:2205.15953 [pdf, other]

Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

Authors: David Mguni, Aivar Sootla, Juliusz Ziomek, Oliver Slumbers, Zipeng Dai, Kun Shao, Jun Wang

Abstract: Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs being common examples. In these settings, performing actions at each time step quickly accumulates costs leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and ultimately, damage. Determining when to act is crucial for achieving successful outcomes and yet, the challenge of efficiently learning to behave optimally when actions incur minimally bounded costs remains unresolved. In this paper, we introduce a reinforcement learning (RL) framework named Learnable Impulse Control Reinforcement Algorithm (LICRA), for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL and a form of policy known as impulse control which learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most $k<\infty$ actions and more generally, faces a budget constraint. We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely. We demonstrate empirically LICRA's superior performance against benchmark RL methods in OpenAI gym's Lunar Lander and in Highway environments and a variant of the Merton portfolio problem within finance.

Submitted 4 June, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

arXiv:2205.15064 [pdf, other]

SEREN: Knowing When to Explore and When to Exploit

Authors: Changmin Yu, David Mguni, Dong Li, Aivar Sootla, Jun Wang, Neil Burgess

Abstract: Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce SElective Reinforcement Exploration Network (SEREN) that poses the exploration-exploitation trade-off as a game between an RL agent -- Exploiter, which purely exploits known rewards, and another RL agent -- Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher is able to determine the best set of states to switch to the exploration policy while Exploiter is free to execute its actions everywhere else. We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation. Through extensive empirical studies in both discrete (MiniGrid) and continuous (MuJoCo) control benchmarks, we show that SEREN can be readily combined with existing RL algorithms to yield significant improvement in performance relative to state-of-the-art algorithms.

Submitted 30 May, 2022; originally announced May 2022.

Comments: arXiv admin note: text overlap with arXiv:2112.02618, arXiv:2103.09159, arXiv:2110.14468

arXiv:2202.06558 [pdf, other]

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

Authors: Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar

Abstract: Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.

Submitted 22 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: ICML 2022

arXiv:2202.06557 [pdf, other]

Reinforcement Learning in Presence of Discrete Markovian Context Evolution

Authors: Hang Ren, Aivar Sootla, Taher Jafferjee, Junxiao Shen, Jun Wang, Haitham Bou-Ammar

Abstract: We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows us to infer the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using gym environments cart-pole swing-up, drone, intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail and elaborate on the reasons for such failures.

Submitted 14 February, 2022; originally announced February 2022.

Comments: Accepted to ICLR 2022

arXiv:2110.14468 [pdf, other]

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

Authors: David Mguni, Usman Islam, Yaqi Sun, Xiuling Zhang, Joel Jennings, Aivar Sootla, Changmin Yu, Ziyan Wang, Jun Wang, Yaodong Yang

Abstract: Reinforcement learning (RL) involves performing exploratory actions in an unknown system. This can place a learning agent in dangerous and potentially catastrophic system states. Current approaches for tackling safe learning in RL simultaneously trade-off safe exploration and task fulfillment. In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy. Our approach introduces a novel two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA). The core of DESTA is a game between two adaptive agents: Safety Agent that is delegated the task of minimising safety violations and Task Agent whose goal is to maximise the environment reward. Specifically, Safety Agent can selectively take control of the system at any given point to prevent safety violations while Task Agent is free to execute its policy at any other states. This framework enables Safety Agent to learn to take actions at certain states that minimise future safety violations, both during training and testing time, while Task Agent performs actions that maximise the task performance everywhere else. Theoretically, we prove that DESTA converges to stable points enabling safety violations of pretrained policies to be minimised. Empirically, we show DESTA's ability to augment the safety of existing policies and secondly, construct safe RL policies when the Task Agent and Safety Agent are trained concurrently. We demonstrate DESTA's superior performance against leading RL methods in Lunar Lander and Frozen Lake from OpenAI gym.

Submitted 1 March, 2023; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:2103.09159

arXiv:2107.02474 [pdf, other]

Viscos Flows: Variational Schur Conditional Sampling With Normalizing Flows

Authors: Vincent Moens, Aivar Sootla, Haitham Bou Ammar, Jun Wang

Abstract: We present a method for conditional sampling for pre-trained normalizing flows when only part of an observation is available. We derive a lower bound to the conditioning variable log-probability using Schur complement properties in the spirit of Gaussian conditional sampling. Our derivation relies on partitioning the flow's domain in such a way that the flow restrictions to subdomains remain bijective, which is crucial for the Schur complement application. Simulation from the variational conditional flow then amounts to solving an equality constraint. Our contribution is three-fold: a) we provide detailed insights on the choice of variational distributions; b) we discuss how to partition the input space of the flow to preserve the bijectivity property; c) we propose a set of methods to optimise the variational distribution. Our numerical results indicate that our sampling method can be successfully applied to invertible residual networks for inference and classification.

Submitted 15 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2010.05099 [pdf, other]

doi 10.1109/TPAMI.2022.3160350

Diagnosing and Preventing Instabilities in Recurrent Video Processing

Authors: Thomas Tanay, Aivar Sootla, Matteo Maggioni, Puneet K. Dokania, Philip Torr, Ales Leonardis, Gregory Slabaugh

Abstract: Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool which produces input sequences optimized to trigger instabilities and that can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.

Submitted 11 March, 2023; v1 submitted 10 October, 2020; originally announced October 2020.

Journal ref: in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 1594-1605, 1 Feb. 2023

arXiv:2006.09436 [pdf, other]

SAMBA: Safe Model-Based & Active Reinforcement Learning

Authors: Alexander I. Cowen-Rivers, Daniel Palenicek, Vincent Moens, Mohammed Abdullah, Aivar Sootla, Jun Wang, Haitham Ammar

Abstract: In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.

Submitted 12 June, 2020; originally announced June 2020.

arXiv:1910.02469 [pdf, other]

doi 10.1109/TAC.2019.2948194

On the Existence of Block-Diagonal Solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati Inequalities

Authors: Aivar Sootla, Yang Zheng, Antonis Papachristodoulou

Abstract: In this paper, we describe sufficient conditions when block-diagonal solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati inequalities exist. In order to derive our results, we define a new type of comparison systems, which are positive and are computed using the state-space matrices of the original (possibly nonpositive) systems. Computing the comparison system involves only the calculation of $\mathcal{H}_{\infty}$ norms of its subsystems. We show that the stability of this comparison system implies the existence of block-diagonal solutions to Lyapunov and Riccati inequalities. Furthermore, our proof is constructive and the overall framework allows the computation of block-diagonal solutions to these matrix inequalities with linear algebra and linear programming. Numerical examples illustrate our theoretical results.

Submitted 6 October, 2019; originally announced October 2019.

Comments: This is an extended technical report. The main results have been accepted for publication as a technical note in the IEEE Transactions on Automatic Control

arXiv:1909.11076 [pdf, ps, other]

doi 10.1109/TAC.2022.3151187

Block Factor-width-two Matrices and Their Applications to Semidefinite and Sum-of-squares Optimization

Authors: Yang Zheng, Aivar Sootla, Antonis Papachristodoulou

Abstract: Semidefinite and sum-of-squares (SOS) optimization are fundamental computational tools in many areas, including linear and nonlinear systems theory. However, the scale of problems that can be addressed reliably and efficiently is still limited. In this paper, we introduce a new notion of block factor-width-two matrices and build a new hierarchy of inner and outer approximations of the cone of positive semidefinite (PSD) matrices. This notion is a block extension of the standard factor-width-two matrices, and allows for an improved inner-approximation of the PSD cone. In the context of SOS optimization, this leads to a block extension of the scaled diagonally dominant sum-of-squares (SDSOS) polynomials. By varying a matrix partition, the notion of block factor-width-two matrices can balance a trade-off between the computation scalability and solution quality for solving semidefinite and SOS optimization problems. Numerical experiments on a range of large-scale instances confirm our theoretical findings.

Submitted 8 February, 2022; v1 submitted 24 September, 2019; originally announced September 2019.

Comments: Accepted for publication as a regular paper at IEEE TAC. Code is available through https://github.com/zhengy09/SDPfw

arXiv:1903.04938 [pdf, ps, other]

Block Factor-Width-Two Matrices in Semidefinite Programming

Authors: Aivar Sootla, Yang Zheng, Antonis Papachristodoulou

Abstract: In this paper, we introduce a set of block factor-width-two matrices, which is a generalisation of factor-width-two matrices and is a subset of positive semidefinite matrices. The set of block factor-width-two matrices is a proper cone and we compute a closed-form expression for its dual cone. We use these cones to build hierarchies of inner and outer approximations of the cone of positive semidefinite matrices. The main feature of these cones is that they enable a decomposition of a large semidefinite constraint into a number of smaller semidefinite constraints. As the main application of these classes of matrices, we envision large-scale semidefinite feasibility optimisation programs including sum-of-squares (SOS) programs. We present numerical examples from SOS optimisation showcasing the properties of this decomposition.

Submitted 12 March, 2019; originally announced March 2019.

Comments: To appear in European Control Conference, 2019

arXiv:1803.05996 [pdf, ps, other]

Scalable analysis of linear networked systems via chordal decomposition

Authors: Yang Zheng, Maryam Kamgarpour, Aivar Sootla, Antonis Papachristodoulou

Abstract: This paper introduces a chordal decomposition approach for scalable analysis of linear networked systems, including stability, $\mathcal{H}_2$ and $\mathcal{H}_{\infty}$ performance. Our main strategy is to exploit any sparsity within these analysis problems and use chordal decomposition. We first show that Grone's and Agler's theorems can be generalized to block matrices with any partition. This facilitates networked systems analysis, allowing one to solely focus on the physical connections of networked systems to exploit scalability. Then, by choosing Lyapunov functions with appropriate sparsity patterns, we decompose large positive semidefinite constraints in all of the analysis problems into multiple smaller ones depending on the maximal cliques of the system graph. This makes the solutions more computationally efficient via a recent first-order algorithm. Numerical experiments demonstrate the efficiency and scalability of the proposed method.

Submitted 15 March, 2018; originally announced March 2018.

Comments: 6 pages; to appear at ECC2018

MSC Class: 93D05; 93D25; 93C05

arXiv:1709.06809 [pdf, ps, other]

Block-Diagonal Solutions to Lyapunov Inequalities and Generalisations of Diagonal Dominance

Authors: Aivar Sootla, Yang Zheng, Antonis Papachristodoulou

Abstract: Diagonally dominant matrices have many applications in systems and control theory. Linear dynamical systems with scaled diagonally dominant drift matrices, which include stable positive systems, allow for scalable stability analysis. For example, it is known that Lyapunov inequalities for this class of systems admit diagonal solutions. In this paper, we present an extension of scaled diagonal dominance to block partitioned matrices. We show that our definition describes matrices admitting block-diagonal solutions to Lyapunov inequalities and that these solutions can be computed using linear algebraic tools. We also show how in some cases the Lyapunov inequalities can be decoupled into a set of lower dimensional linear matrix inequalities, thus leading to improved scalability. We conclude by illustrating some advantages and limitations of our results with numerical examples.

Submitted 20 September, 2017; originally announced September 2017.

Comments: 6 pages, to appear in Proceedings of the Conference on Decision and Control 2017

arXiv:1709.00695 [pdf, ps, other]

doi 10.1109/TCNS.2019.2935618

Distributed Design for Decentralized Control using Chordal Decomposition and ADMM

Authors: Yang Zheng, Maryam Kamgarpour, Aivar Sootla, Antonis Papachristodoulou

Abstract: We propose a distributed design method for decentralized control by exploiting the underlying sparsity properties of the problem. Our method is based on chordal decomposition of sparse block matrices and the alternating direction method of multipliers (ADMM). We first apply a classical parameterization technique to restrict the optimal decentralized control into a convex problem that inherits the sparsity pattern of the original problem. The parameterization relies on a notion of strongly decentralized stabilization, and sufficient conditions are discussed to guarantee this notion. Then, chordal decomposition allows us to decompose the convex restriction into a problem with partially coupled constraints, and the framework of ADMM enables us to solve the decomposed problem in a distributed fashion. Consequently, the subsystems only need to share their model data with their direct neighbours, not needing a central computation. Numerical experiments demonstrate the effectiveness of the proposed method.

Submitted 3 August, 2019; v1 submitted 3 September, 2017; originally announced September 2017.

Comments: 11 pages, 8 figures. Accepted for publication in the IEEE Transactions on Control of Network Systems

Journal ref: IEEE Transactions on Control of Network Systems (Volume: 7, Issue: 2, June 2020)

arXiv:1708.00232 [pdf, other]

Pulse-Based Control Using Koopman Operator Under Parametric Uncertainty

Authors: Aivar Sootla, Damien Ernst

Abstract: In applications, such as biomedicine and systems/synthetic biology, technical limitations in actuation complicate implementation of time-varying control signals. In order to alleviate some of these limitations, it may be desirable to derive simple control policies, such as step functions with fixed magnitude and length (or temporal pulses). In this technical note, we further develop a recently proposed pulse-based solution to the convergence problem, i.e., minimizing the convergence time to the target exponentially stable equilibrium, for monotone systems. In particular, we extend this solution to monotone systems with parametric uncertainty. Our solutions also provide worst-case estimates on convergence times. Furthermore, we indicate how our tools can be used for a class of non-monotone systems, and more importantly how these tools can be extended to other control problems. We illustrate our approach on switching under parametric uncertainty and regulation around a saddle point problems in a genetic toggle switch system.

Submitted 1 August, 2017; originally announced August 2017.

arXiv:1707.08462 [pdf, other]

doi 10.1016/j.automatica.2018.01.036

An Optimal Control Formulation of Pulse-Based Control Using Koopman Operator

Authors: Aivar Sootla, Alexandre Mauroy, Damien Ernst

Abstract: In many applications, and in systems/synthetic biology, in particular, it is desirable to compute control policies that force the trajectory of a bistable system from one equilibrium (the initial point) to another equilibrium (the target point), or in other words to solve the switching problem. It was recently shown that, for monotone bistable systems, this problem admits easy-to-implement open-loop solutions in terms of temporal pulses (i.e., step functions of fixed length and fixed magnitude). In this paper, we develop this idea further and formulate a problem of convergence to an equilibrium from an arbitrary initial point. We show that this problem can be solved using a static optimization problem in the case of monotone systems. Changing the initial point to an arbitrary state allows building closed-loop, event-based or open-loop policies for the switching/convergence problems. In our derivations we exploit the Koopman operator, which offers a linear infinite-dimensional representation of an autonomous nonlinear system. One of the main advantages of using the Koopman operator is the powerful computational tools developed for this framework. Besides the presence of numerical solutions, the switching/convergence problem can also serve as a building block for solving more complicated control problems and can potentially be applied to non-monotone systems. We illustrate this argument on the problem of synchronizing cardiac cells by defibrillation. Potentially, our approach can be extended to problems with different parametrizations of control signals since the only fundamental limitation is the finite time application of the control signal.

Submitted 28 June, 2018; v1 submitted 26 July, 2017; originally announced July 2017.

Comments: corrected typos

Journal ref: Automatica Volume 91, May 2018, Pages 217-224

arXiv:1705.02853 [pdf, other]

Geometric Properties of Isostables and Basins of Attraction of Monotone Systems

Authors: Aivar Sootla, Alexandre Mauroy

Abstract: In this paper, we study geometric properties of basins of attraction of monotone systems. Our results are based on a combination of monotone systems theory and spectral operator theory. We exploit the framework of the Koopman operator, which provides a linear infinite-dimensional description of nonlinear dynamical systems and spectral operator-theoretic notions such as eigenvalues and eigenfunctions. The sublevel sets of the dominant eigenfunction form a family of nested forward-invariant sets and the basin of attraction is the largest of these sets. The boundaries of these sets, called isostables, allow studying temporal properties of the system. Our first observation is that the dominant eigenfunction is increasing in every variable in the case of monotone systems. This is a strong geometric property which simplifies the computation of isostables. We also show how variations in basins of attraction can be bounded under parametric uncertainty in the vector field of monotone systems. Finally, we study the properties of the parameter set for which a monotone system is multistable. Our results are illustrated on several systems of two to four dimensions.

Submitted 8 May, 2017; originally announced May 2017.

Comments: 12 pages, to appear in IEEE Transactions on Automatic Control
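
The monotonicity property of the dominant eigenfunction described in the abstract above can be illustrated in the linear special case. The following is a minimal sketch, not taken from the paper: for dx/dt = Ax, the function phi(x) = w^T x is a Koopman eigenfunction whenever w is a left eigenvector of A, and for the Metzler matrix chosen here the dominant left eigenvector has positive entries, so phi is increasing in every variable. The matrix `A`, the vector `w`, and the helper names are illustrative assumptions.

```python
# Hypothetical linear example (not from the paper): dx/dt = A x with a Metzler matrix A.
A = [[-2.0, 1.0],
     [1.0, -2.0]]

# Left eigenvector of A for the dominant eigenvalue lam = -1, worked out by hand.
w = [1.0, 1.0]
lam = -1.0


def left_multiply(vec, mat):
    """Return the row vector vec^T mat as a list."""
    n = len(mat)
    return [sum(vec[i] * mat[i][j] for i in range(n)) for j in range(n)]


def phi(x):
    """Candidate dominant Koopman eigenfunction phi(x) = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


# If w^T A = lam * w^T, then d/dt phi(x(t)) = lam * phi(x(t)) along trajectories.
residual = [wa - lam * wi for wa, wi in zip(left_multiply(w, A), w)]
```

In this sketch the sublevel sets of |phi| are strips bounded by the lines x1 + x2 = const, which play the role of isostables for the linear system.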

arXiv:1605.06252 [pdf, other]

Shaping Pulses to Control Bistable Monotone Systems Using Koopman Operator

Authors: Aivar Sootla, Alexandre Mauroy, Jorge Goncalves

Abstract: In this paper, we further develop a recently proposed control method to switch a bistable system between its steady states using temporal pulses. The motivation for using pulses comes from biomedical and biological applications (e.g. synthetic biology), where it is generally difficult to build feedback control systems due to technical limitations in sensing and actuation. The original framework was derived for monotone systems and all the extensions relied on monotone systems theory. In contrast, we introduce the concept of a switching function, which is related to eigenfunctions of the so-called Koopman operator subject to a fixed control pulse. Using the level sets of the switching function we can (i) compute the set of all pulses that drive the system toward the steady state in a synchronous way and (ii) estimate the time needed by the flow to reach an epsilon neighborhood of the target steady state. Additionally, we show that for monotone systems the switching function is also monotone in some sense, a property that can yield efficient algorithms to compute it. This observation recovers and further extends the results of the original framework, which we illustrate on numerical examples inspired by biological applications.

Submitted 20 May, 2016; originally announced May 2016.

Comments: 7 pages

arXiv:1603.07686 [pdf, other]

On Existence of Solutions to Structured Lyapunov Inequalities

Authors: Aivar Sootla, James Anderson

Abstract: In this paper, we derive sufficient conditions on drift matrices under which block-diagonal solutions to Lyapunov inequalities exist. The motivation for the problem comes from a recently proposed basis pursuit algorithm. In particular, this algorithm can provide approximate solutions to optimisation programmes with constraints involving Lyapunov inequalities using linear or second order cone programming. This algorithm requires an initial feasible point, which we aim to provide in this paper. Our existence conditions are based on the so-called $\mathcal{H}$ matrices. We also establish a link between $\mathcal{H}$ matrices and an application of a small gain theorem to the drift matrix. We finally show how to construct these solutions in some cases without solving the full Lyapunov inequality.

Submitted 24 March, 2016; originally announced March 2016.

Comments: To appear in the Proceedings of the 2016 American Control Conference

arXiv:1510.05784 [pdf, other]

Structured Projection-Based Model Reduction with Application to Stochastic Biochemical Networks

Authors: Aivar Sootla, James Anderson

Abstract: The Chemical Master Equation (CME) is well known to provide the highest resolution models of a biochemical reaction network. Unfortunately, even simulating the CME can be a challenging task. For this reason simpler approximations to the CME have been proposed. In this work we focus on one such model, the Linear Noise Approximation (LNA). Specifically, we consider implications of a recently proposed LNA time-scale separation method. We show that the reduced order LNA converges to the full order model in the mean square sense. Using this as motivation we derive a network structure preserving reduction algorithm based on structured projections. We present convex optimisation algorithms that describe how such projections can be computed and we discuss when structured solutions exist. We also show that for a certain class of systems, structured projections can be found using basic linear algebra and no optimisation is necessary. The algorithms are then applied to a linearised stochastic LNA model of the yeast glycolysis pathway.

Submitted 20 October, 2015; originally announced October 2015.

Comments: 13 pages; 7 figures; submitted to IEEE Transactions on Automatic Control

arXiv:1510.01153 [pdf, other]

Properties of Isostables and Basins of Attraction of Monotone Systems

Authors: Aivar Sootla, Alexandre Mauroy

Abstract: In this paper, we investigate geometric properties of monotone systems by studying their isostables and basins of attraction. Isostables are boundaries of specific forward-invariant sets defined by the so-called Koopman operator, which provides a linear infinite-dimensional description of a nonlinear system. First, we study the spectral properties of the Koopman operator and the associated semigroup in the context of monotone systems. Our results generalize the celebrated Perron-Frobenius theorem to the nonlinear case and allow us to derive geometric properties of isostables and basins of attraction. Additionally, we show that under certain conditions we can characterize the bounds on the basins of attraction under parametric uncertainty in the vector field. We discuss computational approaches to estimate isostables and basins of attraction and illustrate the results on two and four state monotone systems.

Submitted 22 March, 2016; v1 submitted 5 October, 2015; originally announced October 2015.

Comments: 8 pages, 3 figures, contains material to appear in Proceedings of American Control Conference 2016

arXiv:1510.01149 [pdf, other]

Operator-Theoretic Characterization of Eventually Monotone Systems

Authors: Aivar Sootla, Alexandre Mauroy

Abstract: Monotone systems are dynamical systems whose solutions preserve a partial order in the initial condition for all positive times. It stands to reason that some systems may preserve a partial order only after some initial transient. These systems are usually called eventually monotone. While monotone systems have a characterization in terms of their vector fields (i.e. Kamke-Muller condition), eventually monotone systems have not been characterized in such an explicit manner. In order to provide a characterization, we drew inspiration from the results for linear systems, where eventually monotone (positive) systems are studied using the spectral properties of the system (i.e. Perron-Frobenius property). In the case of nonlinear systems, this spectral characterization is not straightforward, a fact that explains why the class of eventually monotone systems has received little attention to date. In this paper, we show that a spectral characterization of nonlinear eventually monotone systems can be obtained through the Koopman operator framework. We consider a number of biologically inspired examples to illustrate the potential applicability of eventual monotonicity.

Submitted 26 July, 2017; v1 submitted 5 October, 2015; originally announced October 2015.

Comments: 13 pages
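
The vector-field characterization of monotone systems mentioned in the abstract above can be made concrete for the order induced by the nonnegative orthant: for a differentiable vector field on a convex domain, the Kamke-Muller condition amounts to the Jacobian having nonnegative off-diagonal entries everywhere. The following is a minimal sketch under that assumption. The two example Jacobians and the sample points are hypothetical and not taken from the paper, and checking a finite set of points is only a necessary test, not a proof of monotonicity.

```python
def is_metzler(jacobian, tol=0.0):
    """Return True if every off-diagonal entry of the matrix is >= -tol."""
    n = len(jacobian)
    return all(
        jacobian[i][j] >= -tol
        for i in range(n)
        for j in range(n)
        if i != j
    )


def jacobian_activation(x):
    """Jacobian of the hypothetical field f(x) = (-x1 + x2/(1 + x2), x1 - x2)."""
    _, x2 = x
    return [[-1.0, 1.0 / (1.0 + x2) ** 2],
            [1.0, -1.0]]


def jacobian_rotation(x):
    """Jacobian of the hypothetical field f(x) = (-x1 - x2, x1 - x2)."""
    return [[-1.0, -1.0],
            [1.0, -1.0]]


# Sample points in the nonnegative orthant (illustrative choice).
sample_points = [(0.0, 0.0), (0.5, 2.0), (3.0, 1.0)]

activation_ok = all(is_metzler(jacobian_activation(p)) for p in sample_points)
rotation_ok = all(is_metzler(jacobian_rotation(p)) for p in sample_points)
```

Here the first field passes the test at every sampled point, while the second fails because one off-diagonal Jacobian entry is negative.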

arXiv:1509.08392 [pdf, ps, other]

Properties of Eventually Positive Linear Input-Output Systems

Authors: Aivar Sootla

Abstract: In this paper, we consider systems whose trajectories originate in the nonnegative orthant and become nonnegative after some finite time transient. First we consider dynamical systems (i.e., fully observable systems with no inputs), which we call eventually positive. We compute forward-invariant cones and Lyapunov functions for these systems. We then extend the notion of eventually positive systems to the input-output system case. Our extension is performed in such a manner that some valuable properties of classical internally positive input-output systems are preserved. For example, their induced norms can be computed using linear programming and the energy functions have nonnegative derivatives.

Submitted 19 May, 2024; v1 submitted 28 September, 2015; originally announced September 2015.

arXiv:1503.02557 [pdf, ps, other]

On Monotonicity and Propagation of Order Properties

Authors: Aivar Sootla

Abstract: In this paper, a link between monotonicity of deterministic dynamical systems and propagation of order by Markov processes is established. Order propagation has received considerable attention in the literature; however, this notion is still not fully understood. The main contribution of this paper is a study of order propagation in the deterministic setting, which can potentially provide new techniques for analysis in the stochastic one. We take a close look at the propagation of the so-called increasing and increasing convex orders. Infinitesimal characterisations of these orders are derived, which resemble the well-known Kamke conditions for monotonicity. It is shown that increasing order is equivalent to the standard monotonicity, while the class of systems propagating the increasing convex order is equivalent to the class of monotone systems with convex vector fields. The paper is concluded by deriving a novel result on order propagating diffusion processes and an application of this result to biological processes.

Submitted 9 March, 2015; originally announced March 2015.

Comments: Part of the paper is to appear in American Control Conference 2015

arXiv:1409.6150 [pdf, other]

Shaping Pulses to Control Bistable Biological Systems

Authors: Aivar Sootla, Diego Oyarzun, David Angeli, Guy-Bart Stan

Abstract: In this paper we study how to shape temporal pulses to switch a bistable system between its stable steady states. Our motivation for pulse-based control comes from applications in synthetic biology, where it is generally difficult to implement real-time feedback control systems due to technical limitations in sensors and actuators. We show that for monotone bistable systems, the estimation of the set of all pulses that switch the system reduces to the computation of one non-increasing curve. We provide an efficient algorithm to compute this curve and illustrate the results with a genetic bistable system commonly used in synthetic biology. We also extend these results to models with parametric uncertainty and provide a number of examples and counterexamples that demonstrate the power and limitations of the current theory. In order to show the full potential of the framework, we consider the problem of inducing oscillations in a monotone biochemical system using a combination of temporal pulses and event-based control. Our results provide an insight into the dynamics of bistable systems under external inputs and open up numerous directions for future investigation.

Submitted 2 October, 2015; v1 submitted 22 September, 2014; originally announced September 2014.

Comments: 14 pages, contains material from the paper in Proc Amer Control Conf 2015, (pp. 3138-3143) and "Shaping pulses to control bistable systems: analysis, computation and counterexamples", which is due to appear in Automatica
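
The claim in the abstract above, that the set of switching pulses is delimited by a single non-increasing curve in the (duration, magnitude) plane, can be explored numerically on a toy model. The following is a minimal sketch, not the authors' algorithm: it uses a hypothetical scalar bistable system dx/dt = x - x^3 + u (stable equilibria at -1 and 1 when u = 0), forward Euler integration, and bisection over the pulse magnitude. The function names, step size, and search bounds are all assumptions made for illustration.

```python
def simulate(mu, tau, x0=-1.0, dt=0.01, t_relax=20.0):
    """Euler-integrate dx/dt = x - x**3 + u(t), where u = mu during the
    first tau time units and u = 0 for a further t_relax time units."""
    x = x0
    n_pulse = int(round(tau / dt))
    n_relax = int(round(t_relax / dt))
    for k in range(n_pulse + n_relax):
        u = mu if k < n_pulse else 0.0
        x += dt * (x - x ** 3 + u)
    return x


def switches(mu, tau):
    """True if the pulse (mu, tau) moves the state from -1 into the basin of +1."""
    return simulate(mu, tau) > 0.0


def min_magnitude(tau, lo=0.0, hi=5.0, n_iter=30):
    """Bisect for the smallest switching magnitude at pulse duration tau.
    Assumes lo does not switch and hi does, which holds for the durations below."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if switches(mid, tau):
            hi = mid
        else:
            lo = mid
    return hi


# Points on the boundary curve for a few pulse durations (illustrative choice).
durations = [1.0, 2.0, 4.0]
thresholds = [min_magnitude(tau) for tau in durations]
```

In this toy model, longer pulses need smaller magnitudes to switch the state, so the computed thresholds decrease as the duration grows.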

arXiv:1403.7429 [pdf, other]

Distributed Reconstruction of Nonlinear Networks: An ADMM Approach

Authors: Wei Pan, Aivar Sootla, Guy-Bart Stan

Abstract: In this paper, we present a distributed algorithm for the reconstruction of large-scale nonlinear networks. In particular, we focus on the identification from time-series data of the nonlinear functional forms and associated parameters of large-scale nonlinear networks. Recently, a nonlinear network reconstruction problem was formulated as a nonconvex optimisation problem based on the combination of a marginal likelihood maximisation procedure with sparsity inducing priors. Using a convex-concave procedure (CCCP), an iterative reweighted lasso algorithm was derived to solve the initial nonconvex optimisation problem. By exploiting the structure of the objective function of this reweighted lasso algorithm, a distributed algorithm can be designed. To this end, we apply the alternating direction method of multipliers (ADMM) to decompose the original problem into several subproblems. To illustrate the effectiveness of the proposed methods, we use our approach to identify a network of interconnected Kuramoto oscillators with different network sizes (from 500 to 100,000 nodes).

Submitted 28 March, 2014; originally announced March 2014.

Comments: To appear in the Preprints of 19th IFAC World Congress 2014

arXiv:1403.5971 [pdf, other]

On Projection-Based Model Reduction of Biochemical Networks-- Part II: The Stochastic Case

Authors: Aivar Sootla, James Anderson

Abstract: In this paper, we consider the problem of model order reduction of stochastic biochemical networks. In particular, we reduce the order of (the number of equations in) the Linear Noise Approximation of the Chemical Master Equation, which is often used to describe biochemical networks. In contrast to other biochemical network reduction methods, the presented one is projection-based. Projection-based methods are powerful tools, but the cost of their use is the loss of physical interpretation of the nodes in the network. In order to alleviate this drawback, we employ structured projectors, which means that some nodes in the network will keep their physical interpretation. For many models in engineering, finding structured projectors is not always feasible; however, in the context of biochemical networks it is much more likely as the networks are often (almost) monotonic. To summarise, the method can serve as a trade-off between approximation quality and physical interpretation, which is illustrated on numerical examples.

Submitted 24 March, 2014; originally announced March 2014.

Comments: Submitted to the 53rd CDC

arXiv:1403.3579 [pdf, other]

On Projection-Based Model Reduction of Biochemical Networks-- Part I: The Deterministic Case

Authors: Aivar Sootla, James Anderson

Abstract: This paper addresses the problem of model reduction for dynamical system models that describe biochemical reaction networks. Inherent in such models are properties such as stability, positivity and network structure. Ideally these properties should be preserved by model reduction procedures, although traditional projection based approaches struggle to do this. We propose a projection based model reduction algorithm which uses generalised block diagonal Gramians to preserve structure and positivity. Two algorithms are presented: one provides more accurate reduced order models, the other provides easier to simulate reduced order models. The results are illustrated through numerical examples.

Submitted 14 March, 2014; originally announced March 2014.

Comments: Submitted to 53rd IEEE CDC

arXiv:1303.3183 [pdf, ps, other]

Toggling a Genetic Switch Using Reinforcement Learning

Authors: Aivar Sootla, Natalja Strelkowa, Damien Ernst, Mauricio Barahona, Guy-Bart Stan

Abstract: In this paper, we consider the problem of optimal exogenous control of gene regulatory networks. Our approach consists in adapting an established reinforcement learning algorithm called the fitted Q iteration. This algorithm infers the control law directly from the measurements of the system's response to external control inputs without the use of a mathematical model of the system. The measurement data set can either be collected from wet-lab experiments or artificially created by computer simulations of dynamical models of the system. The algorithm is applicable to a wide range of biological systems due to its ability to deal with nonlinear and stochastic system dynamics. To illustrate the application of the algorithm to a gene regulatory network, the regulation of the toggle switch system is considered. The control objective of this problem is to drive the concentrations of two specific proteins to a target region in the state space.

Submitted 25 February, 2015; v1 submitted 12 March, 2013; originally announced March 2013.

Comments: 12 pages, presented at the 9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium), May 12-13, 2014
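
The abstract above describes fitted Q iteration as inferring a control law from recorded transitions alone, without a model of the system. The following is a minimal sketch of that batch-mode idea on a hypothetical three-state chain, not the gene regulatory network studied in the paper. The regressor is swapped for a simple per-pair average (a lookup table) so the example stays self-contained; the batch, the discount factor, and all names are assumptions.

```python
GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (0, 1)     # 0 = move left, 1 = move right (hypothetical)

# Hypothetical batch of one-step transitions: (state, action, reward, next_state).
# State 2 is absorbing; reaching it from state 1 yields reward 1.
BATCH = [
    (0, 0, 0.0, 0), (0, 1, 0.0, 1),
    (1, 0, 0.0, 0), (1, 1, 1.0, 2),
    (2, 0, 0.0, 2), (2, 1, 0.0, 2),
]


def fit_table(inputs, targets):
    """Stand-in regressor: average the targets seen for each (state, action)."""
    sums, counts = {}, {}
    for key, y in zip(inputs, targets):
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def fitted_q_iteration(batch, n_iterations=50):
    """Repeatedly regress Q onto r + GAMMA * max_b Q(s', b), starting from Q = 0."""
    q = {}
    inputs = [(s, a) for s, a, _r, _s_next in batch]
    for _ in range(n_iterations):
        targets = [
            r + GAMMA * max(q.get((s_next, b), 0.0) for b in ACTIONS)
            for _s, _a, r, s_next in batch
        ]
        q = fit_table(inputs, targets)
    return q


def greedy_action(q, state):
    """Control law extracted from the learned Q-function."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


q = fitted_q_iteration(BATCH)
```

With this batch, the greedy policy moves right from states 0 and 1. Replacing `fit_table` with a function approximator would be the natural change for continuous state spaces, at the cost of losing the exactness of the tabular case.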

arXiv:1303.2987 [pdf, other]

On Periodic Reference Tracking Using Batch-Mode Reinforcement Learning with Application to Gene Regulatory Network Control

Authors: Aivar Sootla, Natalja Strelkowa, Damien Ernst, Mauricio Barahona, Guy-Bart Stan

Abstract: In this paper, we consider the periodic reference tracking problem in the framework of batch-mode reinforcement learning, which studies methods for solving optimal control problems from the sole knowledge of a set of trajectories. In particular, we extend an existing batch-mode reinforcement learning algorithm, known as Fitted Q Iteration, to the periodic reference tracking problem. The presented periodic reference tracking algorithm explicitly exploits a priori knowledge of the future values of the reference trajectory and its periodicity. We discuss the properties of our approach and illustrate it on the problem of reference tracking for a synthetic biology gene regulatory network known as the generalised repressilator. This system can produce decaying but long-lived oscillations, which makes it an interesting system for the tracking problem. In our companion paper we also take a look at the regulation problem of the toggle switch system, where the main goal is to drive the system's states to a specific bounded region in the state space.

Submitted 12 March, 2013; originally announced March 2013.
